Quickly combine two similar variables in R

Question

Apologies for the noob question.

I have a large number of variables (124) and have been given an additional dataset of different values of similar variables (again, 124).

Each variable in the first set has a pair in the second set with the same name with the appendage 'N' at the end

I.E Wildfirefuture49 & Wildfirefuture49N

I need to combine the two sets of data so that each variable from the second set is added to it's pair from the first set and due to the size of the dataset, excel isn't up to the challenge... and I should probably learn how to in R anyway. Is there an easy way to do this?

I have data that looks like this

I need it to look like this, with all the 'N' variables added to the existing variables.

Thank you.

@jared_mamrot I don't think so. I think they want to sum the 2 dataframes: `Var1 = Var1 + Var1N; Var2 = Var2+Var2N; etc.` — divibisan, Jan 18 '22 at 23:45
Thanks for editing your question to clarify your expected outcome @kiwi_coding_economist, but in future please don't use images to illustrate your data/code (see: [Why not upload images of code/errors when asking a question?](https://meta.stackoverflow.com/questions/285551/why-not-upload-images-of-code-errors-when-asking-a-question)). [How to make a great R reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) has more tips on what to do/what not to do in order to increase your chances of getting an answer on stackoverflow — jared_mamrot, Jan 19 '22 at 00:19
Hi @jared_mamrot. Thanks for that, I'll keep that in mind :) — kiwi_coding_economist, Jan 26 '22 at 03:31

divibisan · Answer 1 · 2022-01-19T00:44:13.263

Here are 2 solutions:

a tidyverse solution which looks a bit more complex than simply adding the 2 dataframes together, but is safer since it will matches Var name and ID before adding.
a base R solution which is simple, but requires the order and variable names to match precisely

library(tidyverse)
set.seed(1)

df <- data.frame(ID = seq.int(1,10),
                 V1=runif(10, 0, 100),
                 V2=runif(10, 0, 100),
                 V1N=runif(10, 0, 100),
                 V2N=runif(10, 0, 100))

   ID        V1       V2       V1N      V2N
1   1 26.550866 20.59746 93.470523 48.20801
2   2 37.212390 17.65568 21.214252 59.95658
3   3 57.285336 68.70228 65.167377 49.35413
4   4 90.820779 38.41037 12.555510 18.62176
5   5 20.168193 76.98414 26.722067 82.73733
6   6 89.838968 49.76992 38.611409 66.84667
7   7 94.467527 71.76185  1.339033 79.42399
8   8 66.079779 99.19061 38.238796 10.79436
9   9 62.911404 38.00352 86.969085 72.37109
10 10  6.178627 77.74452 34.034900 41.12744

df %>%
    pivot_longer(-ID) %>%                     # Pivot data to long form
    mutate(name = gsub('N$', '', name)) %>%   # Remove 'N' so matching Vars will combine
    group_by(ID, name) %>%                    # Group identical Vars and IDs
    summarise(value = sum(value)) %>%         # Sum up these matching values
    pivot_wider()                             # Pivot wide with 1 column per Var

      ID    V1    V2
   <int> <dbl> <dbl>
 1     1 120.   68.8
 2     2  58.4  77.6
 3     3 122.  118. 
 4     4 103.   57.0
 5     5  46.9 160. 
 6     6 128.  117. 
 7     7  95.8 151. 
 8     8 104.  110. 
 9     9 150.  110. 
10    10  40.2 119.

If you can ensure that the dimensions and ordering of the 2 tables are identical, you can do this simply in baseR with +:

# Split the tables if necessary:
rownames(df) <- df$ID   # Preserve ID as rownames
df1 <- df[,2:3]         # Select first DF columns (dropping ID)
df2 <- df[,4:5]         # Select second DF columns (dropping ID)

# Now, you can just add the data.frames together
df1 + df2
          V1        V2
1  120.02139  68.80547
2   58.42664  77.61226
3  122.45271 118.05642
4  103.37629  57.03213
5   46.89026 159.72147
6  128.45038 116.61660
7   95.80656 151.18584
8  104.31857 109.98497
9  149.88049 110.37461
10  40.21353 118.87197

Base R can be safe and concise at the same time... `nms <- names(df)[endsWith(names(df), "N")]; data.frame(df["ID"], df[sub("N$", "", nms)] + df[nms])` — Mikael Jagan, Jan 19 '22 at 00:39
@MikaelJagan Hm, not sure what's up with that, but you're right. Updated the numbers — divibisan, Jan 19 '22 at 00:44
Hi. Thank you @divibisan. After splitting the data into 10 and runnign a loop to avoid memory constraint issues this worked perfectly. Thank you — kiwi_coding_economist, Jan 26 '22 at 03:30

Quickly combine two similar variables in R

1 Answers1