1

I'm trying to convert from wide to long format using tidyr to gather multiple columns with different types. Essentially the same thing as the question asked here. I'm new to R and not familiar with the syntax so may be making some obvious mistake.

My data looks like this:

ID    X_1_abc  X_1_xyz    X_2_abc    X_2_xyz   X_3_abc   X_3_xyz
1       1        1          2          2         2         1
2       1        2          1          0         1         NA 
3       1        2          1          1         NA        0

I've tried the following code:

df %<>% gather(var, val, X_1_abc:X_3_xyz) %>%   
  separate(var, c('var', 'X_number'), sep = 'X_*_', convert = TRUE) %>% 
  spread(var, val, convert = TRUE) 

but this gives me just two columns: one listing the variables and one listing the values.

ID  X_num `<NA>`
1   1_abc  1     
1   1_xyz  1     
1   2_abc  2     
1   2_xyz  2     
1   3_abc  2     
1   3_xyz  1 

I'm looking for the following:

ID   X_num   abc   xyz
1     1       1     1    
1     2       2     2
1     3       2     1
AstraOK
  • 49
  • 9
  • Take your output, then `mutate(ID = parse_number(X_num), col = str_sub(X_num, end = -3)) %>% spread(col, ``) – Jon Spring Apr 09 '19 at 23:42
  • I don't think this will work because for the full data set not all of the variables have a 3 letter suffix. Right now if I try, I get the following error: `Error: `var` must evaluate to a single number or a column name, not a function`. – AstraOK Apr 10 '19 at 02:04
  • Good point. There's probably a better way with regex, but Mauritz' answer with `tidyr::separate` is cleaner. – Jon Spring Apr 10 '19 at 02:51

1 Answers1

2

I assume your expected output is incomplete as I don't see any entries for ID = 2 and ID = 3.

You could do the following

df %>%
    gather(k, v, -ID) %>%
    separate(k, into = c("tmp", "X_num", "ss"), sep = "_") %>%
    select(-tmp) %>%
    spread(ss, v)
#  ID X_num abc xyz
#1  1     1   1   1
#2  1     2   2   2
#3  1     3   2   1
#4  2     1   1   2
#5  2     2   1   0
#6  2     3   1  NA
#7  3     1   1   2
#8  3     2   1   1
#9  3     3  NA   0
Maurits Evers
  • 49,617
  • 4
  • 47
  • 68
  • 1
    If you want to drop a part of the split variable, you can assign it to `NA`: doing `into = c(NA, "X_num", "ss")` will save you the need to drop `tmp` laterr – divibisan Apr 10 '19 at 00:28
  • 1
    Yes, the output was just to give the idea, but this worked perfectly. I took @divibisan 's suggestion as well. Thank you! – AstraOK Apr 10 '19 at 02:49