-1

I am new using R. I have two data frames (as below) and I would like to add the information from df2 in df1. The only column in common between both of data frames is "Sample". So I tried to use this column to merge both data frames.

df1

structure(list(Segment = c(3L, 3L, 3L, 4L, 5L, 6L, 6L, 6L, 7L, 
7L), Position = c(838L, 891L, 1204L, 732L, 1550L, 688L, 1167L, 
1446L, 950L, 981L), `AA-REF` = structure(c(2L, 5L, 7L, 6L, 1L, 
8L, 8L, 1L, 3L, 4L), .Label = c("", "D", "E", "H", "K", "L", 
"Q", "T"), class = "factor"), `AA-ALT` = structure(c(4L, 2L, 
2L, 3L, NA, 5L, 3L, NA, 1L, 4L), .Label = c("E", "K", "M", "N", 
"T"), class = "factor"), SYN = structure(c(2L, 3L, 2L, 2L, 1L, 
3L, 2L, 1L, 3L, 2L), .Label = c("     ", "N     ", "Y     "), class = "factor"), 
    Sample = c("AO103", "AO103", "AO103", "AO103", "AO103", "AO103", 
    "AO103", "AO103", "AO103", "AO103")), row.names = c(NA, 10L
), class = "data.frame")
  Segment Position AA-REF AA-ALT    SYN Sample
1         3      838      D      N N       AO103
2         3      891      K      K Y       AO103
3         3     1204      Q      K N       AO103
4         4      732      L      M N       AO103
5         5     1550          <NA>         AO103
6         6      688      T      T Y       AO103
7         6     1167      T      M N       AO103
8         6     1446          <NA>         AO103
9         7      950      E      E Y       AO103
10        7      981      H      N N       AO103
11        8      199      T      N N       AO103
12        1      341      T      K N       AO104
13        1      934      T      A N       AO104
14        1     1327      L      F N       AO104
15        1     1349      D      G N       AO104

df2

structure(list(Sample = c("AO208 ", "AO209 ", "AO210 ", "AO211 ", 
"AO212 ", "AO213 ", "AO100 ", "AO101 ", "AO102 ", "AO103 "), 
    Quail = c(7, 8, 9, 10, 11, 12, 7, 8, 9, 10), day = c(3, 3, 
    3, 3, 3, 3, 5, 5, 5, 5), Expo = structure(c(1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L), .Label = " DC ", class = "factor"), 
    Group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L
    ), .Label = " var", class = "factor")), row.names = c(NA, 
10L), class = "data.frame")
 Sample Quail day Expo Group
1  AO208      7   3  DC    var
2  AO209      8   3  DC    var
3  AO210      9   3  DC    var
4  AO211     10   3  DC    var
5  AO212     11   3  DC    var
6  AO213     12   3  DC    var
7  AO100      7   5  DC    var
8  AO101      8   5  DC    var
9  AO102      9   5  DC    var
10 AO103     10   5  DC    var
11 AO104     11   5  DC    var

NOTE: Not all entries in df2$Sample are present in df1$Sample

I would like to get something like the following:

  Segment Position AA-REF AA-ALT    SYN Sample    Quail   day    Expo    Group
1         3      838      D      N N       AO103    10   5  DC    var
2         3      891      K      K Y       AO103    10   5  DC    var
3         3     1204      Q      K N       AO103    10   5  DC    var
4         4      732      L      M N       AO103    10   5  DC    var
5         5     1550          <NA>         AO103    10   5  DC    var
6         6      688      T      T Y       AO103    10   5  DC    var
7         6     1167      T      M N       AO103    10   5  DC    var
8         6     1446          <NA>         AO103    10   5  DC    var
9         7      950      E      E Y       AO103    10   5  DC    var
10        7      981      H      N N       AO103    10   5  DC    var
11        8      199      T      N N       AO103    10   5  DC    var
12        1      341      T      K N       AO104    11   5  DC    var
13        1      934      T      A N       AO104    11   5  DC    var
14        1     1327      L      F N       AO104    11   5  DC    var
15        1     1349      D      G N       AO104    11   5  DC    var

I tried:

x <- merge(df1, df2, by = "Sample", all = TRUE)

Even though this is adding the columns, everything from df2 is placed at the end of the df1.

I also tried using dplyr's left_join (among others) as:

x <- df1 %>%
  left_join(df2, by = "Sample")

This adds empty columns from df2 and no information at all.

I have been looking at many merging posts but none of those seem to address my problem.

I also tried match without success.

pppery
  • 3,731
  • 22
  • 33
  • 46
genferreri
  • 123
  • 7
  • 2
    Just do `x<-left_join(df1,df2)` as long as the column name "Sample" is the same it will match them for you. – Tanner33 Feb 07 '20 at 21:32
  • 2
    It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. It it possible the sample values are formatted differently in df1 and df2? What does `sum(df2$Sample %in% df1$Sample)` return? – MrFlick Feb 07 '20 at 21:33
  • Thanks @Tanner33. This does not work either. This is the first time that I have problems trying to merge something. I cannot see the error. – genferreri Feb 07 '20 at 23:11
  • 1
    Thanks @MrFlick. I hope that my update gives you something better to work with. Values are in both as.characters; `sum(df2$Sample %in% df1$Sample)` returns 0. – genferreri Feb 07 '20 at 23:14
  • Does `df1[15,"Sample"]==df2[11,"Sample"]` return true or false? They are equal in your examples so they should return true, but based off of @MrFlick question r isn't recognizing them as the same thing. Also try `str(df1[15,"Sample"])` and `str(df2[11,"Sample"])` to see if the structure is different for some reason. – Tanner33 Feb 07 '20 at 23:24
  • I think there is a space after all of the Sample names in df2. Run this `df2$Sample<-substr(df2$Sample,1,nchar(df2$Sample)-1)` to remove the last character and then try to merge it. – Tanner33 Feb 07 '20 at 23:27
  • Thanks @Tanner33. `df1[15,"Sample"]==df2[11,"Sample"]` returns false. What would be 15 and 11 in this case? I also tried srt() as you suggested and it returns AO104. – genferreri Feb 07 '20 at 23:31
  • 1
    There was indeed a space there!!! Thank you very much I was able to merge them!!! Thank you very much for taking the time! – genferreri Feb 07 '20 at 23:35
  • 2
    @Tanner33, for removing surrounding whitespace, I recommend to use `trimws()` or `stringr::str_trim()`. This is much safer than to remove the last character unconditionally. – Uwe Feb 08 '20 at 22:18
  • Good idea @Uwe in this case it looked like the space had been uniformly added to all rows in the dataset, but better safe than sorry. – Tanner33 Feb 09 '20 at 21:43

1 Answers1

1
x <- merge(x=df1, y=df2, by = "Sample", all.x = TRUE)

You only want all of the columns from df1, so you only need all.x.

Shout out to Tanner33 if you want to use dplyr or tidyverse packages.

dyrland
  • 608
  • 1
  • 7
  • 17