0

I am trying to gather untidy data from wide to long format. I have 748 variables, that need to be condensed to approximately 30.

In this post, I asked: how to tidy my wide data? The answer: use gather().

However, I am still struggling to gather multiple columns and was hoping you could pinpoint where I'm going wrong.

Reproducible example:

tb1 <- tribble(~x1,~x2,~x3,~y1,~y2,~y3,
       1,NA,NA,NA,1,NA,
       NA,1,NA,NA,NA,1,
       NA,NA,1,NA,NA,1)

# A tibble: 3 x 6
#     x1    x2    x3 y1       y2    y3
#  <dbl> <dbl> <dbl> <lgl> <dbl> <dbl>
#1     1    NA    NA NA        1    NA
#2    NA     1    NA NA       NA     1
#3    NA    NA     1 NA       NA     1

with x1-y3 having the following characteristics:

1 x1    Green 
2 x2    Yellow
3 x3    Orange
4 y1    Yes   
5 y2    No    
6 y3    Maybe 

I tried this:

tb1 %>%
  rename("Green" =x1,
         "Yellow"=x2,
         "Orange"=x3,
         "Yes"=y1,
         "No"=y2,
         "Maybe"=y3) %>%
  gather(X,val,-Green,-Yellow,-Orange) %>%
  gather(Y,val,-X) %>%
  select(-val)

I did get an output that I wanted for these variables, but I can't imagine how to do this for 700+ variables?! Is there a more effective way?

tb1 %>%
  rename("Green" =x1,
         "Yellow"=x2,
         "Orange"=x3,
         "Yes"=y1,
         "No"=y2,
         "Maybe"=y3) %>%
  gather(X,val,-Green,-Yellow,-Orange) %>%
  filter(!is.na(val)) %>%
  select(-val) %>%
  gather(Y,val,-X) %>%
  filter(!is.na(val)) %>%
  select(-val)

# A tibble: 3 x 2
  X     Y     
  <chr> <chr> 
1 No    Green 
2 Maybe Yellow
3 Maybe Orange

I think I might be just not acquainted enough with gather() so this is probably a stupid question - would appreciate the help. Thanks!

Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
maria118code
  • 153
  • 1
  • 14

1 Answers1

1

I’m assuming the issue here is with manually specify all the different variable names. Luckily, tidyverse has the ?select_helpers which make it easier to select columns based on different rules.

Instead of renaming the variables at the beginning, we can rename them at the end. This lets us use starts_with to get all columns starting with x or y and gather them together in one step. Then we can use ends_with to select the value columns from those gather steps and filter and drop them.

Finally, we replace all values of x1, y1 etc. with their true values in one step using mutate_all and a lookup table

# Make lookup table to match X and Y variables with Values
  # the initial values should be the `names` (first) and the values to change them to
  # should be the `values` (after the =)
lookup <- c('x1' = 'Green',
            'x2' = 'Yellow',
            'x3' = 'Orange',
            'y1' = 'Yes',
            'y2' = 'No',
            'y3' = 'Maybe')

tb1 %>%
    gather(X, Xval, starts_with('x')) %>%    # Gather all variables that start with ‘x'
    gather(Y, Yval, starts_with('y')) %>%    # Gather all variables that start with ‘y'
    filter_at(vars(ends_with('val')),        # Looking in columns ending with ‘val'
              all_vars(!is.na(.))) %>% %>%    # Drop rows if ANY of these cols are NA
    select(-ends_with('val')) %>%            # Drop columns ending in ‘val'
    mutate_all(~lookup[.])                   # Replace value from lookup table in all cols

# A tibble: 3 x 2
  X      Y 
  <chr>  <chr>
1 Green  No   
2 Yellow Maybe
3 Orange Maybe

One tricky thing with select_helpers is knowing when you an use them alone and when you need to “register” them with vars. In gather and select, you can use them as is. In mutate, filter, summarize, etc. you need to surround them with vars

divibisan
  • 11,659
  • 11
  • 40
  • 58
  • Thank you for your help, this saves time. I figured out my gathering is still a little messy - apologies for the bad question, it is difficult to ask a question sometimes, especially when you don't know what is going wrong! Thanks a lot for your help : ) – maria118code May 08 '19 at 08:37