More Selecting and Transforming with dplyr
Posted by Dr. Darrin on educational research techniques
In this post, we are going to learn some more advanced ways to work with functions in the dplyr package. Let's load our libraries.
library(dplyr)
library(gapminder)
Our dataset is the gapminder dataset, which provides information about countries and continents related to GDP, life expectancy, and population. Here is what the data looks like as a refresher.
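One way to take that quick look is sketched below. The choice of glimpse() is an assumption on my part; head() or simply printing the tibble would serve the same purpose.

```r
library(dplyr)
library(gapminder)

# Compact overview: one line per column showing its type and first few values.
# gapminder has six columns: country, continent, year, lifeExp, pop, gdpPercap.
glimpse(gapminder)
```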
You can see that by using the colon we were able to select the last three columns.
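For reference, a colon selection of this kind might look like the sketch below. The exact range lifeExp:gdpPercap is an assumption based on the column order of gapminder, where those are the last three columns. The name last_three is only for illustration.

```r
library(dplyr)
library(gapminder)

# The colon selects a contiguous range of columns, here from lifeExp
# through gdpPercap (assumed range; these are the last three columns).
last_three <- gapminder %>% select(lifeExp:gdpPercap)
last_three
```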
There are also functions called "select helpers." Select helpers help you find columns in really large data sets. For example, let's say we want columns that contain the string "life" in them. To find them, we would use the contains helper inside select.
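A minimal sketch of that call, written in the same pipe style as the rest of this post. The name life_cols is only for illustration.

```r
library(dplyr)
library(gapminder)

# contains() matches column names that include the given string.
# Matching ignores case by default, so "life" finds lifeExp.
life_cols <- gapminder %>% select(contains("life"))
life_cols
```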
Only the column that contains the string "life" is selected. There are other select helpers that you can try on your own, such as starts_with, ends_with, and more.
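As a starting point for trying those helpers, here is a small sketch. The search strings "co" and "p" are chosen only for illustration, as are the names co_cols and p_cols.

```r
library(dplyr)
library(gapminder)

# starts_with() keeps columns whose names begin with the given prefix
co_cols <- gapminder %>% select(starts_with("co"))

# ends_with() keeps columns whose names end with the given suffix
p_cols <- gapminder %>% select(ends_with("p"))

names(co_cols)
names(p_cols)
```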
To remove a variable from a dataset you simply need to put a minus sign in front of it as shown below.
gapminder %>% select(-lifeExp, -gdpPercap)
## # A tibble: 1,704 x 4
##    country     continent  year      pop
##    <fct>       <fct>     <int>    <int>
##  1 Afghanistan Asia       1952  8425333
##  2 Afghanistan Asia       1957  9240934
##  3 Afghanistan Asia       1962 10267083
##  4 Afghanistan Asia       1967 11537966
##  5 Afghanistan Asia       1972 13079460
##  6 Afghanistan Asia       1977 14880372
##  7 Afghanistan Asia       1982 12881816
##  8 Afghanistan Asia       1987 13867957
##  9 Afghanistan Asia       1992 16317921
## 10 Afghanistan Asia       1997 22227415
## # … with 1,694 more rows
In the output above, you can see that life expectancy and per capita GDP have been removed.
rename
Another function is rename, which allows you to rename a variable. In this example, the variable "pop" is renamed "population."
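A minimal sketch of that renaming step. The name renamed is only for illustration.

```r
library(dplyr)
library(gapminder)

# rename() takes new_name = old_name and keeps all other columns unchanged
renamed <- gapminder %>% rename(population = pop)
renamed
```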
You can see that the "pop" variable has been renamed. Remember that the new name goes on the left of the equal sign while the old name goes on the right of the equal sign.
There is a shortcut to this and it involves renaming variables inside the select function. In the example below, we rename the pop variable population inside the select function.
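The shortcut could be sketched as follows. Which columns to keep alongside the renamed one is an assumption here; country and year are used only for illustration, as is the name selected.

```r
library(dplyr)
library(gapminder)

# Inside select(), new_name = old_name renames a column while selecting it.
# Only the listed columns are kept (country and year are an assumed choice).
selected <- gapminder %>% select(country, year, population = pop)
selected
```

Unlike rename, this drops every column that is not listed, so it fits best when you are already narrowing the data down.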
transmute
The transmute function allows you to select and mutate variables at the same time. For example, let's say that we want to know total GDP. We could find this by multiplying the population by GDP per capita. This is done with the transmute function below.
gapminder %>% transmute(country, year, total_gdp = pop * gdpPercap)
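To see what "select and mutate at the same time" means, the sketch below compares the transmute call with a two-step version that uses mutate followed by select. The two-step pipeline is my own illustration and the names via_transmute and via_two_steps are hypothetical.

```r
library(dplyr)
library(gapminder)

# One step: keep country and year, and compute total_gdp
via_transmute <- gapminder %>%
  transmute(country, year, total_gdp = pop * gdpPercap)

# Two steps: add the new column first, then keep only the columns we want
via_two_steps <- gapminder %>%
  mutate(total_gdp = pop * gdpPercap) %>%
  select(country, year, total_gdp)

names(via_transmute)
names(via_two_steps)
```

Both versions should give the same three columns; transmute simply saves the extra select step.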
With these basic tools, it is now a little easier to do some data analysis when using R. There is so much more that can be learned, but this will have to wait for the future.