Introduction
In this article, I am going to demonstrate how to use the dplyr package in R with the planes dataset. We will use several functions provided by dplyr to manipulate and transform the data and to create subsets of it. The functions that we will be using are filter(), arrange(), and select().
Loading package and dataset
We will be using a predefined dataset named planes that belongs to a package named nycflights13. Therefore we need to load that package first.
Once the package is loaded, the planes dataset is available by name, and typing its name prints it. This is the dataset we will transform and manipulate.
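The two steps above can be sketched as follows. This is a minimal example that assumes nycflights13 is already installed (otherwise install.packages("nycflights13") would be needed first).

```r
# Load the package that ships the planes dataset.
library(nycflights13)

# Typing the name prints the dataset. It is a tibble, so only the
# first rows and the columns that fit on screen are shown.
planes

# List the column names, which the later examples refer to.
names(planes)
```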
Now we need to load the dplyr package with library() so that we can use its functions.
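A minimal sketch of this step, assuming dplyr is already installed (otherwise install.packages("dplyr") would be needed first):

```r
# Attach dplyr so that filter(), arrange(), and select() can be called directly.
library(dplyr)
```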
Loading dplyr prints a startup message listing the functions from the stats and base packages that it masks, such as filter() and lag().
filter() function to filter data on the basis of variable values
The filter() function selects the observations (rows) that satisfy the conditions passed to it, creating a subset of the data frame. The first argument is the data frame. Each further argument is a logical condition on the variables of that data frame, such as a comparison of a column with a value, and only the rows for which every condition is TRUE are kept.
Let us discuss this function with the help of an example. If we pass year == 1999 as the condition, a new dataset containing all the observations whose year is 1999 will be displayed.
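The call described above might look like this. It is a sketch that assumes both packages are installed.

```r
library(nycflights13)
library(dplyr)

# Keep only the rows whose year is 1999.
# Note the double equals sign: == compares, while a single = would
# be read as a named argument and makes filter() raise an error.
filter(planes, year == 1999)
```

Rows where year is missing (NA) are dropped, because the condition is not TRUE for them.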
If we pass both year == 1999 and seats == 55 as conditions, dplyr keeps only the rows that satisfy both of them, so a new dataset containing the observations whose year is 1999 and whose seats value is 55 will be displayed.
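A sketch of the two-condition filter described above, assuming both packages are installed:

```r
library(nycflights13)
library(dplyr)

# Conditions separated by commas are combined with a logical AND,
# so a row must match both to be kept.
filter(planes, year == 1999, seats == 55)

# The same filter written with an explicit & operator.
filter(planes, year == 1999 & seats == 55)
```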
Using assignment operator
The subset that filter() returns can also be saved in another variable using the assignment operator (<-).
If we want to save the subset in a variable and print the result at the same time, we can wrap the entire assignment in parentheses.
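For instance, the filtered rows could be stored like this. The variable name planes_1999 is chosen here only for illustration.

```r
library(nycflights13)
library(dplyr)

# Save the subset; nothing is printed by the assignment itself.
planes_1999 <- filter(planes, year == 1999)

# Print the saved subset explicitly.
planes_1999
```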
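A sketch of the parenthesized form, again using the illustrative name planes_1999:

```r
library(nycflights13)
library(dplyr)

# The outer parentheses make R print the value that was just assigned.
(planes_1999 <- filter(planes, year == 1999))
```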
arrange() function to arrange data in different orders
The arrange() function reorders the rows of a data frame according to the variables passed to it as arguments. It takes the data frame followed by the names of the variables to sort by, and it can sort in either ascending order (the default) or descending order.
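As an illustration of ascending order, the rows could be sorted as below. The choice of year and seats as sort columns is an assumption made for this sketch.

```r
library(nycflights13)
library(dplyr)

# Sort rows by year, smallest first (ascending is the default).
arrange(planes, year)

# Sort by year, then by seats within each year.
arrange(planes, year, seats)
```

arrange() keeps every row and only changes their order. Rows with a missing sort value are placed at the end.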
We can also wrap a variable in the desc() function inside arrange() to reorder the rows of the new data frame in descending order of that variable.
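A sketch of descending order, with year again chosen as the sort column for illustration:

```r
library(nycflights13)
library(dplyr)

# Sort rows by year, largest first.
arrange(planes, desc(year))
```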
select() function to select columns of a table
The select() function returns a subset of a dataset containing only the columns named in its arguments. Columns can also be chosen by rules rather than listed one by one, for example a range of adjacent columns or the exclusion of some columns. These rules apply to the variables (columns) of the dataset loaded at the beginning, not to the values in its rows.
To display specific columns, we pass their names as arguments after the data frame.
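Selecting by name could look like the sketch below. The three columns picked here are an assumption for illustration; any columns of planes would work the same way.

```r
library(nycflights13)
library(dplyr)

# Keep only the listed columns, in the order given.
select(planes, tailnum, year, seats)
```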
To display all the columns from type to engines (inclusive), we can put the colon operator between the two column names.
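A sketch of the range selection described above:

```r
library(nycflights13)
library(dplyr)

# type:engines selects type, engines, and every column that sits
# between them in the dataset.
select(planes, type:engines)
```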
To display all the columns except those from type to engines (inclusive), we can negate the same range.
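A sketch of the exclusion described above:

```r
library(nycflights13)
library(dplyr)

# The minus sign drops the whole range; the parentheses make sure
# it applies to type:engines as a unit.
select(planes, -(type:engines))
```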
Summary
In this article, I demonstrated how to use the dplyr package in R with the planes dataset. We used the filter(), arrange(), and select() functions provided by dplyr to manipulate and transform the data and to create subsets of it. Code snippets and outputs are also provided.