How to move data from one dataframe to a second dataframe

Question

I apologize if this is a duplicated question. I tried to find my question but I may not be using the right terminology. Feel free to change the title of this post if there is a better way to ask this question.

I have two dataframes

df <- data.frame("Location" = c("chr1:123", "chr6:2452", "chr8:4352", "chr11:8754", "chr3:76345", "chr7:23454","chr18:23452"),
"Score" = c("tolered(1)", "tolerated(2)", "", "", "deleterious(0.1)", "", "deleterious(0.2)"))

df2 <- data.frame("Location" = c( "chr7:23454", "chr9:243256", "chr8:4352", "chr2:6795452", "chr11:8754","chr18:23452", "chr3:76345"),
                 "Score" = c("", "", "", "", "", "", ""))

df has locations and values in the "score" column that I want to keep.
df2 has the data from df plus some new data.
I want the scores from df for any values that are in df2 and make a new dataframe called df3.

Desired result:

df3 <- data.frame("Location" = c( "chr7:23454", "chr9:243256", "chr8:4352", "chr2:6795452", "chr11:8754","chr18:23452", "chr3:76345"),
                  "Score" = c("", "", "", "", "", "deleterious(0.2)", "deleterious(0.1)"))

I am just not sure what the best/fastest method to do this. I am not quite sure where to begin. I feel like you can do this with dplyr but I have never done this before

Good question! What you are looking for is a conditional update of a dataframe on matching values with another dataframe. Check out [this question](https://stackoverflow.com/questions/6112260/conditional-merge-replacement-in-r). — Roman, Jul 16 '19 at 19:15
I tried a couple of their answers and I am still having some trouble. I think I am struggling because my data has empty cells. The question asked there has two full columns with no empty cells. — neuron, Jul 16 '19 at 19:22
It doesn't seem like the answers there work for this question :/ — neuron, Jul 16 '19 at 19:40
It is probably a bad idea to code "emtpy cells" using `""` instead of `NA`. — Axeman, Jul 16 '19 at 19:42
@Brian because `""` may be a valid, not missing string, because it may look identical on output to other, different strings (such as `" "` or `" "`) and because there are dedicated functions to handle NA's (like `is.na`, `na.rm` parameter for many functions, `drop_na` etc.). — January, Jul 17 '19 at 06:50

eastclintw00d · Accepted Answer · 2019-07-17T06:38:12.450

1

Using a left_join() from dplyr:

library(dplyr)
df3 <- df2 %>% 
  dplyr::select(-Score) %>% 
  left_join(df, by = "Location")

edited Jul 17 '19 at 06:38

answered Jul 16 '19 at 19:07

eastclintw00d

2,250
1
9
18

1

This will not work if `df2` has score for the new data that is not present in `df`. – Roman Jul 16 '19 at 19:14
I am not quite sure this is working for. I get this error when I try that 'Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘select’ for signature ‘"data.frame" ' – neuron Jul 16 '19 at 19:14
1

Did you load `dplyr`? – eastclintw00d Jul 16 '19 at 19:18
@eastclintw00d yes – neuron Jul 16 '19 at 19:22
Use `dplyr::select()` to specifically load dply'r select function. You probably have a package loaded that masks it with its own select function. – Roman Jul 17 '19 at 00:39

score 0 · Answer 2 · answered Jul 16 '19 at 20:01

I was able to sort of force this.

I started with this

df3 <- anti_join(df2, df, by = "Location")
df3 <- rbind(df3, df)

but that gave me some extra data that I didn't want/need so I filtered back with df2

df3 <- df3 %>%
  filter(Location %in% df2$Location)

This isn't the prettiest method so if anyone else has a cleaner method, please feel free to answer!

Roman · Answer 3 · 2019-07-17T09:54:10.767

df

  Location Score
1        A     1
2        B     2
3        C    NA
4        D    NA
5        E     5
6        F    NA
7        G     7

df2

  Location Score
1        E    NA
2        F    NA
3        G    NA
4        H    NA
5        I    NA
6        J    NA
7        K    11

df3

  Location Score
1        H    NA
2        I    NA
3        J    NA
4        K    11
5        E     5
6        F    NA
7        G     7

Code

library(dplyr)
df3 <- df2 %>%
    anti_join(df, by = "Location") %>%
    bind_rows(inner_join(df, df2 %>% select(1), by = "Location"))

Data

df <- data.frame("Location" = LETTERS[1:7],
                 "Score" = c(1, 2, NA, NA, 5, NA, 7),
                 stringsAsFactors = FALSE)

df2 <- data.frame("Location" = LETTERS[5:11],
                  "Score" = c(rep(NA, 6), 11),
                  stringsAsFactors = FALSE)

How to move data from one dataframe to a second dataframe

3 Answers3

Code

Data