0

I apologize if this is a duplicated question. I tried to find my question but I may not be using the right terminology. Feel free to change the title of this post if there is a better way to ask this question.

I have two dataframes

df <- data.frame("Location" = c("chr1:123", "chr6:2452", "chr8:4352", "chr11:8754", "chr3:76345", "chr7:23454","chr18:23452"),
"Score" = c("tolered(1)", "tolerated(2)", "", "", "deleterious(0.1)", "", "deleterious(0.2)"))

df2 <- data.frame("Location" = c( "chr7:23454", "chr9:243256", "chr8:4352", "chr2:6795452", "chr11:8754","chr18:23452", "chr3:76345"),
                 "Score" = c("", "", "", "", "", "", ""))
  • df has locations and values in the "score" column that I want to keep.
  • df2 has the data from df plus some new data.
  • I want the scores from df for any values that are in df2 and make a new dataframe called df3.

Desired result:

df3 <- data.frame("Location" = c( "chr7:23454", "chr9:243256", "chr8:4352", "chr2:6795452", "chr11:8754","chr18:23452", "chr3:76345"),
                  "Score" = c("", "", "", "", "", "deleterious(0.2)", "deleterious(0.1)"))

I am just not sure what the best/fastest method to do this. I am not quite sure where to begin. I feel like you can do this with dplyr but I have never done this before

Roman
  • 4,744
  • 2
  • 16
  • 58
neuron
  • 1,949
  • 1
  • 15
  • 30
  • 1
    Good question! What you are looking for is a conditional update of a dataframe on matching values with another dataframe. Check out [this question](https://stackoverflow.com/questions/6112260/conditional-merge-replacement-in-r). – Roman Jul 16 '19 at 19:15
  • I tried a couple of their answers and I am still having some trouble. I think I am struggling because my data has empty cells. The question asked there has two full columns with no empty cells. – neuron Jul 16 '19 at 19:22
  • It doesn't seem like the answers there work for this question :/ – neuron Jul 16 '19 at 19:40
  • 2
    It is probably a bad idea to code "emtpy cells" using `""` instead of `NA`. – Axeman Jul 16 '19 at 19:42
  • Why does it matter? I am just curious – neuron Jul 16 '19 at 19:45
  • @Brian because `""` may be a valid, not missing string, because it may look identical on output to other, different strings (such as `" "` or `" "`) and because there are dedicated functions to handle NA's (like `is.na`, `na.rm` parameter for many functions, `drop_na` etc.). – January Jul 17 '19 at 06:50

3 Answers3

1

Using a left_join() from dplyr:

library(dplyr)
df3 <- df2 %>% 
  dplyr::select(-Score) %>% 
  left_join(df, by = "Location") 
eastclintw00d
  • 2,250
  • 1
  • 9
  • 18
  • 1
    This will not work if `df2` has score for the new data that is not present in `df`. – Roman Jul 16 '19 at 19:14
  • I am not quite sure this is working for. I get this error when I try that 'Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘select’ for signature ‘"data.frame" ' – neuron Jul 16 '19 at 19:14
  • 1
    Did you load `dplyr`? – eastclintw00d Jul 16 '19 at 19:18
  • @eastclintw00d yes – neuron Jul 16 '19 at 19:22
  • Use `dplyr::select()` to specifically load dply'r select function. You probably have a package loaded that masks it with its own select function. – Roman Jul 17 '19 at 00:39
0

I was able to sort of force this.

I started with this

df3 <- anti_join(df2, df, by = "Location")
df3 <- rbind(df3, df)

but that gave me some extra data that I didn't want/need so I filtered back with df2

df3 <- df3 %>%
  filter(Location %in% df2$Location)

This isn't the prettiest method so if anyone else has a cleaner method, please feel free to answer!

neuron
  • 1,949
  • 1
  • 15
  • 30
0

df

  Location Score
1        A     1
2        B     2
3        C    NA
4        D    NA
5        E     5
6        F    NA
7        G     7
df2
  Location Score
1        E    NA
2        F    NA
3        G    NA
4        H    NA
5        I    NA
6        J    NA
7        K    11
df3
  Location Score
1        H    NA
2        I    NA
3        J    NA
4        K    11
5        E     5
6        F    NA
7        G     7

Code

library(dplyr)
df3 <- df2 %>%
    anti_join(df, by = "Location") %>%
    bind_rows(inner_join(df, df2 %>% select(1), by = "Location"))

Data

df <- data.frame("Location" = LETTERS[1:7],
                 "Score" = c(1, 2, NA, NA, 5, NA, 7),
                 stringsAsFactors = FALSE)

df2 <- data.frame("Location" = LETTERS[5:11],
                  "Score" = c(rep(NA, 6), 11),
                  stringsAsFactors = FALSE)
Roman
  • 4,744
  • 2
  • 16
  • 58