-3

I have a data frame with 200 columns: A_1, ..., A_100, B_1, ..., B_100. The entries of A are integers from 1 to 5 or NA, while the entries of B are -1, 0, 1, NA.

I want to append 100 more columns: C_1, ..., C_100 where C_i = A_i + B_i, except when it would yield 0 or 6, in which case it should stay as is.

What would be the best way to do this in R, in terms of clarity and computational complexity? There has to be a better way than a for loop or something like that, perhaps there are functions for this in some library? I'm going to have to do similar operations a lot so I'd like a streamlined method.

Spine Feast
  • 235
  • 1
  • 11
  • 2
    Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610). This will make it much easier for others to help you. – Jaap Mar 05 '18 at 14:52

1 Answers1

1

You can try:

library(tidyverse)
# some data
d <- data.frame(A_1=1:10, 
           A_2=1:10, 
           A_3=1:10, 
           B_1=11:20, 
           B_2=21:30, 
           B_3=31:40) 
d %>% 
  gather(key, value) %>% 
     separate(key, into = c("a","b")) %>% 
     group_by(b, a) %>% 
     mutate(n=row_number())  %>% 
     unite(a2,b, n) %>% 
     spread(a, value) %>%  
     mutate(Sum=A+B) %>% 
     separate(a2, into = c("a", "b"), remove = T) %>% 
     select(-A,-B) %>% 
     mutate(a=paste0("C_",a)) %>% 
     spread(a, Sum) %>% 
     arrange(as.numeric(b)) %>% 
     left_join(d %>% rownames_to_column(), by=c("b"="rowname"))
# A tibble: 10 x 10
       b   C_1   C_2   C_3   A_1   A_2   A_3   B_1   B_2   B_3
   <chr> <int> <int> <int> <int> <int> <int> <int> <int> <int>
 1     1    12    22    32     1     1     1    11    21    31
 2     2    14    24    34     2     2     2    12    22    32
 3     3    16    26    36     3     3     3    13    23    33
 4     4    18    28    38     4     4     4    14    24    34
 5     5    20    30    40     5     5     5    15    25    35
 6     6    22    32    42     6     6     6    16    26    36
 7     7    24    34    44     7     7     7    17    27    37
 8     8    26    36    46     8     8     8    18    28    38
 9     9    28    38    48     9     9     9    19    29    39
10    10    30    40    50    10    10    10    20    30    40

The idea is to use tidyr's gather and spread to get the columns A and B side by side. Then you can calculate the sum and transform it back to the expected data.frame. As long your data.frame has the same number of A and B columns, it is working.

Roman
  • 17,008
  • 3
  • 36
  • 49