RelationDigest

Thursday, 26 May 2022

More Selecting and Transforming with dplyr

Dr. Darrin

May 27

In this post, we are going to learn some more advanced ways to work with functions in the dplyr package. Let's load our libraries.

library(dplyr)
library(gapminder)

Our dataset is the gapminder dataset, which provides information about countries and continents related to GDP, life expectancy, and population. Here is what the data looks like as a refresher.

glimpse(gapminder)
## Rows: 1,704
## Columns: 6
## $ country   <fct> "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", …
## $ continent <fct> Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, …
## $ year      <int> 1952, 1957, 1962, 1967, 1972, 1977, 1982, 1987, 1992, 1997, …
## $ lifeExp   <dbl> 28.801, 30.332, 31.997, 34.020, 36.088, 38.438, 39.854, 40.8…
## $ pop       <int> 8425333, 9240934, 10267083, 11537966, 13079460, 14880372, 12…
## $ gdpPercap <dbl> 779.4453, 820.8530, 853.1007, 836.1971, 739.9811, 786.1134, …

select

You can use the colon symbol to select multiple columns at once. Doing this is a great way to save time when selecting variables.

gapminder %>%
  select(lifeExp:gdpPercap)
## # A tibble: 1,704 x 3
##    lifeExp      pop gdpPercap
##      <dbl>    <int>     <dbl>
##  1    28.8  8425333      779.
##  2    30.3  9240934      821.
##  3    32.0 10267083      853.
##  4    34.0 11537966      836.
##  5    36.1 13079460      740.
##  6    38.4 14880372      786.
##  7    39.9 12881816      978.
##  8    40.8 13867957      852.
##  9    41.7 16317921      649.
## 10    41.8 22227415      635.
## # … with 1,694 more rows

You can see that by using the colon we were able to select the last three columns.
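The colon works on any run of adjacent columns, not just the last three. As a small sketch, assuming the dplyr and gapminder packages are installed, the same range can be written with column names or with column positions (the object names first_three and by_position are only for illustration):

```r
library(dplyr)
library(gapminder)

# Range by name: every column from country through year
first_three <- gapminder %>%
  select(country:year)

# The same range written with column positions
by_position <- gapminder %>%
  select(1:3)

names(first_three)
identical(names(first_three), names(by_position))
```

Selecting by name is usually the safer habit, because positions silently change if columns are added or reordered.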

There are also functions called "select helpers." Select helpers make it easier to find columns in really large data sets. For example, let's say we want the columns that contain the string "life" in their names. To find them, we use the contains helper as shown below.

gapminder %>%
  select(contains('life'))
## # A tibble: 1,704 x 1
##    lifeExp
##      <dbl>
##  1    28.8
##  2    30.3
##  3    32.0
##  4    34.0
##  5    36.1
##  6    38.4
##  7    39.9
##  8    40.8
##  9    41.7
## 10    41.8
## # … with 1,694 more rows

Only the column that contains the string "life" is selected. There are other select helpers that you can try on your own, such as starts_with, ends_with, and more.
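Here is a short sketch of two of those other helpers, assuming the same packages as before. The search strings "co" and "p" and the object names are just illustrations chosen to match the gapminder column names.

```r
library(dplyr)
library(gapminder)

# Columns whose names begin with "co": country and continent
co_cols <- gapminder %>%
  select(starts_with("co"))

# Columns whose names end with "p": lifeExp, pop, and gdpPercap
p_cols <- gapminder %>%
  select(ends_with("p"))

names(co_cols)
names(p_cols)
```

Both helpers ignore case by default, which matters in data sets where column names mix upper and lower case.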

To remove a variable from a dataset you simply need to put a minus sign in front of it as shown below.

gapminder %>%
  select(-lifeExp, -gdpPercap)
## # A tibble: 1,704 x 4
##    country     continent  year      pop
##    <fct>       <fct>     <int>    <int>
##  1 Afghanistan Asia       1952  8425333
##  2 Afghanistan Asia       1957  9240934
##  3 Afghanistan Asia       1962 10267083
##  4 Afghanistan Asia       1967 11537966
##  5 Afghanistan Asia       1972 13079460
##  6 Afghanistan Asia       1977 14880372
##  7 Afghanistan Asia       1982 12881816
##  8 Afghanistan Asia       1987 13867957
##  9 Afghanistan Asia       1992 16317921
## 10 Afghanistan Asia       1997 22227415
## # … with 1,694 more rows

In the output above you can see that life expectancy and per capita GDP have been removed.
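When several variables need to go, the minus sign can also be applied once to a group of names wrapped in c(). The sketch below, which assumes the same packages, checks that both spellings keep the same columns (the object names are only for illustration):

```r
library(dplyr)
library(gapminder)

# One minus sign per column
dropped_each <- gapminder %>%
  select(-lifeExp, -gdpPercap)

# One minus sign in front of a vector of column names
dropped_group <- gapminder %>%
  select(-c(lifeExp, gdpPercap))

names(dropped_group)
identical(names(dropped_each), names(dropped_group))
```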

rename

Another function is the rename function which allows you to rename a variable. Below is an example in which the variable "pop" is renamed "population."

gapminder %>%
  select(country, year, pop) %>%
  rename(population = pop)
## # A tibble: 1,704 x 3
##    country      year population
##    <fct>       <int>      <int>
##  1 Afghanistan  1952    8425333
##  2 Afghanistan  1957    9240934
##  3 Afghanistan  1962   10267083
##  4 Afghanistan  1967   11537966
##  5 Afghanistan  1972   13079460
##  6 Afghanistan  1977   14880372
##  7 Afghanistan  1982   12881816
##  8 Afghanistan  1987   13867957
##  9 Afghanistan  1992   16317921
## 10 Afghanistan  1997   22227415
## # … with 1,694 more rows

You can see that the "pop" variable has been renamed. Remember that the new name goes on the left of the equal sign while the old name goes on the right of the equal sign.

There is a shortcut to this and it involves renaming variables inside the select function. In the example below, we rename the pop variable population inside the select function.

gapminder %>%
  select(country, year, population = pop)
## # A tibble: 1,704 x 3
##    country      year population
##    <fct>       <int>      <int>
##  1 Afghanistan  1952    8425333
##  2 Afghanistan  1957    9240934
##  3 Afghanistan  1962   10267083
##  4 Afghanistan  1967   11537966
##  5 Afghanistan  1972   13079460
##  6 Afghanistan  1977   14880372
##  7 Afghanistan  1982   12881816
##  8 Afghanistan  1987   13867957
##  9 Afghanistan  1992   16317921
## 10 Afghanistan  1997   22227415
## # … with 1,694 more rows
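One difference between the two approaches is worth keeping in mind: rename keeps every column in the data, while select keeps only the columns you list. The following sketch, assuming the same packages, applies each function to the full gapminder data so the difference shows up in the column names (the object names are only for illustration).

```r
library(dplyr)
library(gapminder)

# rename() changes the name and leaves the other five columns in place
renamed <- gapminder %>%
  rename(population = pop)

# select() renames too, but drops every column that is not listed
selected <- gapminder %>%
  select(population = pop)

names(renamed)
names(selected)
```

So the shortcut inside select is handy when you are already narrowing down the columns, and rename is the better fit when you want to keep everything else untouched.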

transmute

The transmute function allows you to select and mutate variables at the same time. For example, let's say that we want to know total GDP. We could find this by multiplying the population by GDP per capita. This is done with the transmute function below.

gapminder %>%
  transmute(country, year, total_gdp = pop * gdpPercap)
## # A tibble: 1,704 x 3
##    country      year    total_gdp
##    <fct>       <int>        <dbl>
##  1 Afghanistan  1952  6567086330.
##  2 Afghanistan  1957  7585448670.
##  3 Afghanistan  1962  8758855797.
##  4 Afghanistan  1967  9648014150.
##  5 Afghanistan  1972  9678553274.
##  6 Afghanistan  1977 11697659231.
##  7 Afghanistan  1982 12598563401.
##  8 Afghanistan  1987 11820990309.
##  9 Afghanistan  1992 10595901589.
## 10 Afghanistan  1997 14121995875.
## # … with 1,694 more rows
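To see what "select and mutate at the same time" means, here is a small sketch that builds the same table in two steps with mutate followed by select, and then compares it with the one-step transmute result. It assumes the same packages, and the object names are only for illustration.

```r
library(dplyr)
library(gapminder)

# One step: create total_gdp and keep only the listed columns
via_transmute <- gapminder %>%
  transmute(country, year, total_gdp = pop * gdpPercap)

# Two steps: add total_gdp to all six columns, then pick three
via_mutate <- gapminder %>%
  mutate(total_gdp = pop * gdpPercap) %>%
  select(country, year, total_gdp)

all.equal(via_transmute, via_mutate)
```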

Conclusion

With these basic tools it is now a little easier to do some data analysis when using R. There is so much more that can be learned, but this will have to wait for the future.
