I have two data frames with the same number of rows (639) but different numbers of columns (DF1: 5, DF2: 2500). The rows in DF1 correspond to the rows in DF2.

Some rows in DF2 will be removed because they contain several NAs, but I have no information on which ones are removed. After that, cbind() no longer lets me bind the DFs together because the row counts differ. I also need the rows to stay matched, so if row 47 is removed from DF2, it must also be removed from DF1 when merging. I assume there is a workaround using row.names, but I am not sure how to implement it. Help would be appreciated. Examples of the DFs are below:

DF1:

    pp trialNo  item trialTarget trial
1 pp01       1  M012      script     1
2 pp01       2 BS016      script     2
3 pp01       3  M007      script     3
4 pp01       4 BS010      script     4
5 pp01       5  M006      script     5
6 pp01       6 BS018      script     6 

DF2:

    V1  V2  V3  V4  V5  V6
1: 764 764 763 763 762 763
2: 714 714 711 708 705 704
3: 872 871 869 867 867 871
4: 730 728 727 724 722 719
5: 789 786 788 790 792 790
6: 922 923 928 933 938 938

And assuming row 3 in DF2 is removed, I would expect this after binding:

    pp trialNo  item trialTarget trial  V1  V2  V3  V4  V5  V6
1 pp01       1  M012      script     1 764 764 763 763 762 763
2 pp01       2 BS016      script     2 714 714 711 708 705 704
4 pp01       4 BS010      script     4 730 728 727 724 722 719
5 pp01       5  M006      script     5 789 786 788 790 792 790
6 pp01       6 BS018      script     6 922 923 928 933 938 938

Thanks in advance.

NickB

1 Answer

You could create a row index in each of the data frames.

df1$row <- 1:nrow(df1)
df2$row <- 1:nrow(df2)

Then remove the 3rd row in df2.

df2 <- df2[-3, ]
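In practice the rows would be dropped by a cleaning rule rather than by position. The sketch below assumes a rule of the form "drop rows with more than a given number of NAs"; the toy data frame `toy` and the threshold `max_na` are hypothetical and only illustrate that the index survives the cleaning step as long as it is added first.

```r
# Toy stand-in for df2 (hypothetical values); the index is added BEFORE cleaning
toy <- data.frame(V1 = c(764L, NA, 872L, 730L),
                  V2 = c(764L, NA, 871L, 728L),
                  V3 = c(763L, 711L, NA, 727L))
toy$row <- seq_len(nrow(toy))

# Hypothetical cleaning rule: drop rows with more than max_na missing values
max_na <- 1
value_cols <- setdiff(names(toy), "row")      # do not count the index column
na_count <- rowSums(is.na(toy[value_cols]))   # number of NAs per row
toy_clean <- toy[na_count <= max_na, ]

toy_clean$row  # original positions of the surviving rows: 1 3 4
```

Whatever rule is actually used, the `row` column records which original rows are left.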

You can then merge the two data frames by the row column.

merge(df1, df2, by = 'row')

#  row   pp trialNo  item trialTarget trial  V1  V2  V3  V4  V5  V6
#1   1 pp01       1  M012      script     1 764 764 763 763 762 763
#2   2 pp01       2 BS016      script     2 714 714 711 708 705 704
#3   4 pp01       4 BS010      script     4 730 728 727 724 722 719
#4   5 pp01       5  M006      script     5 789 786 788 790 792 790
#5   6 pp01       6 BS018      script     6 922 923 928 933 938 938
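If adding an index column is not convenient, the row names may be enough, as the question suggests. This is a sketch under two assumptions: both objects are base data.frames (data.table and tibble objects do not keep row names in this way), and the rows are dropped with ordinary subsetting or `na.omit()`, which preserve the original row names. The objects `a` and `b` are hypothetical toy data.

```r
# Hypothetical toy data: a plays the role of df1, b the role of df2
a <- data.frame(id = c("x1", "x2", "x3", "x4"), stringsAsFactors = FALSE)
b <- data.frame(V1 = c(10, NA, 30, 40), V2 = c(1, 2, 3, NA))

# na.omit() drops rows 2 and 4 but keeps the original row names "1" and "3"
b_clean <- na.omit(b)

# Recover the surviving positions from the row names and subset a to match
kept <- as.integer(rownames(b_clean))
combined <- cbind(a[kept, , drop = FALSE], b_clean)

combined
```

The explicit index column is the more robust of the two approaches, since it does not depend on the cleaning step preserving row names.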

data

df1 <- structure(list(pp = c("pp01", "pp01", "pp01", "pp01", "pp01", 
"pp01"), trialNo = 1:6, item = c("M012", "BS016", "M007", "BS010", 
"M006", "BS018"), trialTarget = c("script", "script", "script", 
"script", "script", "script"), trial = 1:6, row = 1:6), row.names = c(NA, 
-6L), class = "data.frame")

df2 <- structure(list(V1 = c(764L, 714L, 872L, 730L, 789L, 922L), V2 = c(764L, 
714L, 871L, 728L, 786L, 923L), V3 = c(763L, 711L, 869L, 727L, 
788L, 928L), V4 = c(763L, 708L, 867L, 724L, 790L, 933L), V5 = c(762L, 
705L, 867L, 722L, 792L, 938L), V6 = c(763L, 704L, 871L, 719L, 
790L, 938L)), class = "data.frame", row.names = c(NA, -6L))
Ronak Shah
  • Thank you for your response, Ronak. I've clarified my question by adding the fact that I do not have any record of which rows are removed from DF2. I used row 3 as an example, but in reality several of the 639 rows are removed. – NickB May 29 '20 at 09:14
  • Can you not modify the two data frames by adding a row index, as I have done, before the rows are removed? – Ronak Shah May 29 '20 at 09:18
  • I was adding the row index to the wrong data frame in the cleaning process, your solution has worked for me. 8 rows were removed in the cleaning process and the merged dataframe now has 631 rows, with the correct correspondence. Thank you! – NickB May 29 '20 at 09:26