I have a relatively simple problem in R that I can't seem to figure out. I have two dataframes that have the exact same dimensions:
#dataframe 1
df1 = data.frame("contrast1" = c('2.3', '6.5', '0.6', '-0.8', '2.3', '2.4', '-7.1'),
"contrast2" = c('1.0','0.9','0.8','2.3','4.3','8.7','0.4'),
"contrast3" = c('-0.2','-0.1','-1.2','-2.3','-0.3','-0.4','-0.1'))
row.names(df1) = c('gene1','gene2','gene3','gene4','gene5','gene6','gene7')
#dataframe 2
df2 = data.frame("contrast1" = c('1', '1', '0', '0', '1', '1', '1'),
"contrast2" = c('1','0','0','1','1','1','0'),
"contrast3" = c('0','0','1','1','0','0','0'))
row.names(df2) = c('gene1','gene2','gene3','gene4','gene5','gene6','gene7')
The dataframes look like this:
>df1
contrast1 contrast2 contrast3
gene1 2.3 1.0 -0.2
gene2 6.5 0.9 -0.1
gene3 0.6 0.8 -1.2
gene4 -0.8 2.3 -2.3
gene5 2.3 4.3 -0.3
gene6 2.4 8.7 -0.4
gene7 -7.1 0.4 -0.1
>df2
contrast1 contrast2 contrast3
gene1 1 1 0
gene2 1 0 0
gene3 0 0 1
gene4 0 1 1
gene5 1 1 0
gene6 1 1 0
gene7 1 0 0
Now, I want to replace certain values in df1, based on the value in df2. Specifically, whenever a value is zero in df2, I want to corresponding field in every column of df1 to be replaced by 'NA'.
The resulting dataframe should thus look like this:
>df_output
contrast1 contrast2 contrast3
gene1 2.3 1.0 NA
gene2 6.5 NA NA
gene3 NA NA -1.2
gene4 NA 2.3 -2.3
gene5 2.3 4.3 NA
gene6 2.4 8.7 NA
gene7 -7.1 NA NA
There really are a lot of similar questions around - here, here and here for example - but none of them seem to do what I am looking for. I tried several things that did not do the trick. Below code, using dplyr, is the closest I got - but it is throwing error messages to me, and does not yet take into account that I actually want to replace values by 'NA' rather than zero.
library(dplyr)
df2 %>%
left_join(df1, by = rownames(df1)) %>%
mutate(Count = ifelse(is.zero(Count.x), Count.y, Count.x)) %>%
select(-c(Count.x, Count.y))