2

I have a relatively simple problem in R that I can't seem to figure out. I have two dataframes that have the exact same dimensions:

#dataframe 1
df1 = data.frame("contrast1" = c('2.3', '6.5', '0.6', '-0.8', '2.3', '2.4', '-7.1'), 
                 "contrast2" = c('1.0','0.9','0.8','2.3','4.3','8.7','0.4'),
                 "contrast3" = c('-0.2','-0.1','-1.2','-2.3','-0.3','-0.4','-0.1'))
row.names(df1) = c('gene1','gene2','gene3','gene4','gene5','gene6','gene7')

#dataframe 2
df2 = data.frame("contrast1" = c('1', '1', '0', '0', '1', '1', '1'), 
                 "contrast2" = c('1','0','0','1','1','1','0'),
                 "contrast3" = c('0','0','1','1','0','0','0'))
row.names(df2) = c('gene1','gene2','gene3','gene4','gene5','gene6','gene7')

The dataframes look like this:

>df1
      contrast1 contrast2 contrast3
gene1       2.3       1.0      -0.2
gene2       6.5       0.9      -0.1
gene3       0.6       0.8      -1.2
gene4      -0.8       2.3      -2.3
gene5       2.3       4.3      -0.3
gene6       2.4       8.7      -0.4
gene7      -7.1       0.4      -0.1

>df2
      contrast1 contrast2 contrast3
gene1         1         1         0
gene2         1         0         0
gene3         0         0         1
gene4         0         1         1
gene5         1         1         0
gene6         1         1         0
gene7         1         0         0

Now, I want to replace certain values in df1, based on the value in df2. Specifically, whenever a value is zero in df2, I want to corresponding field in every column of df1 to be replaced by 'NA'.

The resulting dataframe should thus look like this:

>df_output
      contrast1 contrast2 contrast3
gene1       2.3       1.0        NA
gene2       6.5        NA        NA
gene3        NA        NA      -1.2
gene4        NA       2.3      -2.3
gene5       2.3       4.3        NA
gene6       2.4       8.7        NA
gene7      -7.1        NA        NA

There really are a lot of similar questions around - here, here and here for example - but none of them seem to do what I am looking for. I tried several things that did not do the trick. Below code, using dplyr, is the closest I got - but it is throwing error messages to me, and does not yet take into account that I actually want to replace values by 'NA' rather than zero.

library(dplyr)    
df2 %>%
      left_join(df1, by = rownames(df1)) %>%
      mutate(Count = ifelse(is.zero(Count.x), Count.y, Count.x)) %>%
      select(-c(Count.x, Count.y))
enileve
  • 180
  • 1
  • 11

2 Answers2

1

In base R, convert the type of columns of 'df1' from character to numeric with type.convert, then do just arithmetic operation for changing the values to NA

df1 <- type.convert(df1, as.is = TRUE)
(NA^(df2 == 0)) * df1

-output

#        contrast1 contrast2 contrast3
#gene1       2.3       1.0        NA
#gene2       6.5        NA        NA
#gene3        NA        NA      -1.2
#gene4        NA       2.3      -2.3
#gene5       2.3       4.3        NA
#gene6       2.4       8.7        NA
#gene7      -7.1        NA        NA

With tidyverse, we can use map2

library(purrr)
map2_dfc(df1, df2, ~ replace(.x, .y == 0, NA))
akrun
  • 874,273
  • 37
  • 540
  • 662
1

You can change each value of 0 to NA in a given column, and then iterate this to each column in a loop. Not the cleanest method, but it works:

for(i in 1:length(names(df1))){
df1[which(df2[,i]==0),i]<-NA
}

df1
      contrast1 contrast2 contrast3
gene1       2.3       1.0      <NA>
gene2       6.5      <NA>      <NA>
gene3      <NA>      <NA>      -1.2
gene4      <NA>       2.3      -2.3
gene5       2.3       4.3      <NA>
gene6       2.4       8.7      <NA>
gene7      -7.1      <NA>      <NA>
Dylan_Gomes
  • 2,066
  • 14
  • 29