0

I have two data sets (csv files i.e. 1000 columns and 200 columns) and I want to dynamically add the some of column names of the second data set to my first data set, based on the the exactness of two column elements in both data set. (Confused: Look at the expected and actual result expected as below)

I tried putting in two for loops to compare the element by element. As soon as column element are matched (matching criteria), pick and drop that element into a vector. Repeat the process for all the elements in the inner for loop --> once the inner for loop is completed, transpose the vector and convert it to column vector--> add the column to that particular row based on the outer for loop row value

     df_1$Res_1 <- NA
     df_1$Res_1 <- as.character(df_1$Res_1)
     df_1_1 <- rep(NA,10)
     df_1_1 <- as.character(df_1_1)  
     c = 1
     for (p in 1:nrow(df_1)){
          for(q in 1:nrow(df_2)){
              if((df_1$Mat_1[p] == df_2$Mat_2[q]) {  
                 df_1_1[c] <- df_2$Res[q]
                 c = c+1
                                                  }
                 df_1_1 <- t(df_1_1)
                 df_1$Res_1[p,] <- df_1_1[]  #not sure how to add the remaining            #columns like Res_2 and Res_3 etc 
                                }
                 c = 1
             }

(link to drive.google.com)

Error is

df_1$Res_1[p,] <- df_1_1[]:
incorrect number of subscripts on matrix

(link to docs.google.com/spreadsheets)

r2evans
  • 141,215
  • 6
  • 77
  • 149
harshbits
  • 3
  • 2
  • 5
    Pictures of data are hard to work with, please post copy/pasteable sample data in valid R syntax, either using `dput()` or sharing code to simulate the data. – Gregor Thomas Jul 15 '19 at 20:22
  • Also, assuming I've guessed correctly, I'd ask you to try to be more specific in your next question. *"I want to dynamically add the some of column names of the second data set to my first data set, based on the the exactness of two column elements in both data set."* was very hard for me to understand. A clearer rephrasing: *"I want to add the values from `df2$Res` to `df1` when `df2$Mat_2` values match the `df1$Mat_1` values. I want the matching `df2$Res` from multiple rows to be added in a single row in `df1`, creating as many new columns as needed.*" Using specific names is much better. – Gregor Thomas Jul 15 '19 at 21:28
  • Hi @Gregor, Thanks a lot for the solution. I was stuck on this from like a month. I want to add column corresponding to Res to the output file. I have updated the google like to explain better. Please look at the updated output file and kindly suggest modification to the code. [link](https://docs.google.com/spreadsheets/d/1L92Rp4UzZv__rNt-wr8iCmXcKUNW2wFNv12tshfNr1A/edit?usp=sharing) I am thinking to make this change: select(Mat_2, Res, col_1, group_no) – harshbits Jul 16 '19 at 01:45
  • Stack Overflow is about helping lots of people - not just one. To that end, we ask that questions are as self-contained as possible. We want future readers to be able to read the question, see if it pertains to them, and if so be able to look at the answer. So rather than linking out to google spreadsheets that both hide the problem behind a link and also might become dead links at any time, please share your data in the question itself - like I started to at the bottom of my answer. – Gregor Thomas Jul 16 '19 at 13:02
  • That said, since I've already answered the question as originally asked, I'd further suggest that you start a new question when you introduce new requirements. I think you might be able to work out the changes yourself with [this question as a reference](https://stackoverflow.com/q/12466493/903061). The big change you need to make is modifying my `melt` command to work on multiple sets of columns. The `data.table::melt` version does that, as does `tidyr::pivot_longer`, I think. – Gregor Thomas Jul 16 '19 at 13:05

1 Answers1

0

Near as I can tell, this is what you want:

library(reshape2)
library(dplyr)

df3 = df2 %>%
  group_by(Mat_2) %>%
  mutate(group_no = row_number()) %>%
  select(Mat_2, Res, group_no) %>%
  melt(id.vars = c("Mat_2", "group_no")) %>%
  mutate(variable = paste(variable, group_no, sep = "_")) %>%
  select(-group_no) %>%
  dcast(Mat_2 ~ variable, data = .)


left_join(df1, df3, by = c("Mat_1" = "Mat_2"))
#    Mat_1 Col_1 Col_2 Col_3 Res_1 Res_2 Res_3
# 1      A     d     d     d    QQ    TT    YY
# 2      B     o     o     o    RR    TT  <NA>
# 3      C     e     e     e    QQ  <NA>  <NA>
# 4      A     n     n     n    QQ    TT    YY
# 5      E     o     o     o    FF  <NA>  <NA>
# 6      C     t     t     t    QQ  <NA>  <NA>
# 7      G     m     m     m    QQ  <NA>  <NA>
# 8      B     a     a     a    RR    TT  <NA>
# 9      A     t     t     t    QQ    TT    YY
# 10     X     t     t     t  <NA>  <NA>  <NA>
# 11     R     e     e     e    YY  <NA>  <NA>
# 12     A     r     r     r    QQ    TT    YY
# Warning message:
# Column `Mat_1`/`Mat_2` joining factors with different levels, coercing to character vector 

Using this data:

df1 = read.table(text = 'Mat_1  Col_1   Col_2   Col_3
A   d   d   d
B   o   o   o
C   e   e   e
A   n   n   n
E   o   o   o
C   t   t   t
G   m   m   m
B   a   a   a
A   t   t   t
X   t   t   t
R   e   e   e
A   r   r   r', header = T)

df2 = read.table(text = 'ID Mat_2   Res Col_1   Col_2   Col_3
1   A   QQ  i   i   i
2   C   QQ  t   t   t
3   V   EE  r   r   r
4   B   RR  e   e   e
5   G   QQ  a   a   a
6   A   TT  l   l   l
7   E   FF  l   l   l
8   B   TT  y   y   y
9   A   YY  d   d   d
10  P   RR  s   o   o
11  O   EE  e   e   e
12  O   TT  n   n   n
13  R   YY  o   o   o
14  T   UU  t   t   t
15  M   OO  m   m   m
16  M   ZZ  a   a   a
17  N   VV  t   t   t
18  J   AA  t   t   t
19  K   SS  e   e   e
20  L   EE  r   r   r', header = T)
Gregor Thomas
  • 136,190
  • 20
  • 167
  • 294
  • Hi @Gregor, Thanks a lot for the solution. I was stuck on this from like a month. I want to add column corresponding to Res to the output file. I have updated the google like to explain better. Please look at the updated output file and kindly suggest modification to the code. [link](https://docs.google.com/spreadsheets/d/1L92Rp4UzZv__rNt-wr8iCmXcKUNW2wFNv12tshfNr1A/edit?usp=sharing) I am thinking to make this change: select(Mat_2, Res, col_1, group_no) – harshbits Jul 16 '19 at 01:50