dplyr: Subtracting between two data tbls

Question

I have a background data file and an experimental data file. I need to calculate the column means of the background file and subtract the corresponding mean background reading from each column of the experimental data.

This is easy in base R:

dataField1 <- "someField"
dataField2 <- "someField2"
ctrlMeans <- colMeans (read.csv ("ctrl.csv"))
exprData <- read.csv ("expr.csv")
exprData [, c(dataField1, dataField2)] <- exprData [, c(dataField1, dataField2)] - ctrlMeans [c(dataField1, dataField2)]
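
One caveat about the last line is worth checking. When a data frame is combined with a plain vector, base R recycles the vector down each column, not across columns, so each mean does not necessarily end up matched to its own column. The sketch below uses made-up numbers (`expr` and `ctrl_means` are illustrative stand-ins, not the real files) and shows `sweep()` with `MARGIN = 2` as one way to apply exactly one value per column.

```r
# Illustrative data only; the real values come from expr.csv and ctrl.csv.
expr <- data.frame(a = c(10, 20, 30), b = c(100, 200, 300))
ctrl_means <- c(a = 1, b = 2)

# Plain subtraction: the vector is recycled down the rows of each column,
# so column `a` is not reduced by a constant 1.
naive <- expr - ctrl_means

# sweep() over columns (MARGIN = 2) subtracts one mean per column.
swept <- sweep(expr, 2, ctrl_means[names(expr)])

print(naive)
print(swept)
```

Comparing the two printed results on a small example like this is a quick way to confirm which behaviour a given script relies on.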

But I found the last step difficult to implement in dplyr. The best I can get is the following:

ctrlMeansTbl <- read_csv ('ctrl.csv') %>% summarize_all (mean)
exprDataTbl<- read_csv('expr.csv') %>% mutate (
  dataField1 := !! quo (dataField1) - select (ctrlMeansTbl, !!quo (dataField1)),
  dataField2 := !! quo (dataField2) - select (ctrlMeansTbl, !!quo (dataField2))
)

But this throws an error:

Error in rep_len(as.vector(e1), prod(dim(e2))) : 
  attempt to replicate non-vector

Just to be clear, the formats of ctrlMeansTbl and exprDataTbl (before the mutate) are as follows:

> head (ctrlMeansTbl)
# A tibble: 1 x 4
  `someField1` `someField2` `someField3` `someField4`
         <dbl>        <dbl>        <dbl>        <dbl>
1     489.7096     74.24759     547.9139      16.0828
> head (donorSingle)
# A tibble: 6 x 4
  `someField1` `someField2` `someField3` `someField4`
         <dbl>        <dbl>        <dbl>        <dbl>
1    132123.44      1560.74    166069.17     0.619378
2     11125.93       156.95     14045.20     0.620412
3     14590.51       243.82     18132.47     0.621446
4     76014.17       839.50     95961.42     0.623514
5     91344.17      1054.85    115226.85     0.627650
6      7651.86       146.73      9528.69     0.631786

Does anyone have any idea about this? Thanks!

score 1 · Answer 1 · answered Jul 11 '17 at 16:48

I think your problem is that you are using select to obtain the values to subtract. However, select returns a data frame, not a vector. I would try adapting your code like this:

ctrlMeansTbl <- read_csv ('ctrl.csv') %>% summarize_all (mean)
exprDataTbl<- read_csv('expr.csv') %>% mutate (
  dataField1 := !! quo (dataField1) - ctrlMeansTbl$dataField1,
  dataField2 := !! quo (dataField2) - ctrlMeansTbl$dataField2
)

answered Jul 11 '17 at 16:48

Luís Telles

Doesn't work. `dataField1` and `dataField2` are variables, so I get an `Unknown or uninitialised column: 'dataField1'.` error. After I fixed that by using square brackets (`as.numeric(ctrlMeansTbl[1, dataField1])`), I still get a `non-numeric argument to binary operator` error. Is it now a quosure problem? – John M. Jul 11 '17 at 17:13
Could you please show the output from `as.numeric(ctrlMeansTbl[1,dataField1])` and also the output of `class` on it? – Luís Telles Jul 11 '17 at 17:16
What you'd expect from a true numeric... > as.numeric(ctrlMeansTbl[1, dataField1]) [1] 489.7096 > class(as.numeric(ctrlMeansTbl[1, dataField1])) [1] "numeric" – John M. Jul 11 '17 at 17:25
This is rather strange... You're using the `data.table` package, right? Have you tried using normal `dplyr` syntax? I mean, like `dataField1 = dataField1 - ctrlMeansTbl[1,"dataField1"]`. I think the main problem here is syntax. – Luís Telles Jul 11 '17 at 17:34
No, I don't use `data.table`. I only use base R and dplyr. – John M. Jul 11 '17 at 17:41
OK, eventually got the main cause of the error. In a [different thread](https://stackoverflow.com/questions/44656993/how-to-pass-a-named-vector-to-dplyrselect-using-quosures) it was mentioned strings should be converted to quosures by `sym` rather than `quo`. – John M. Jul 11 '17 at 21:07
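
For reference, here is a minimal sketch of what the fix described in the last comment might look like. It assumes the column names are held as strings, converts them to symbols with `sym`, and pulls each mean out as a plain number with `[[`. The small tibbles are made-up stand-ins for the two CSV files, and the `!!dataField1 :=` form on the left-hand side is an assumption about the intended behaviour (overwrite the column named by the string), not something stated in the thread.

```r
library(dplyr)
library(rlang)

dataField1 <- "someField1"
dataField2 <- "someField2"

# Made-up stand-ins for read_csv('ctrl.csv') and read_csv('expr.csv').
ctrlTbl     <- tibble(someField1 = c(1, 3), someField2 = c(10, 20))
exprDataTbl <- tibble(someField1 = c(5, 7), someField2 = c(100, 200))

ctrlMeansTbl <- ctrlTbl %>% summarize_all(mean)

result <- exprDataTbl %>%
  mutate(
    # sym() turns the string into a column reference; [[ ]] returns a number,
    # not a one-column tibble.
    !!dataField1 := !!sym(dataField1) - ctrlMeansTbl[[dataField1]],
    !!dataField2 := !!sym(dataField2) - ctrlMeansTbl[[dataField2]]
  )

print(result)
```

The key difference from the code in the question is that `quo(dataField1)` captures the variable holding the string, while `sym(dataField1)` builds a reference to the column the string names.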

score 1 · Accepted Answer · answered Jul 11 '17 at 16:48

No reproducible example, but you can directly subtract means:

mtcars %>% mutate_all(funs(. - mean(.)))

A more general purrr solution would be:

map2_df(mtcars, colMeans(mtcars), `-`)

That being said, the base way seems perfectly fine to me.

answered Jul 11 '17 at 16:48

Axeman

The second option works in conjunction with `select`, but the first one doesn't: I got an `Error in mutate_impl(.data, dots) : Evaluation error: no applicable method for 'tbl_vars' applied to an object of class "fun_list".` error. – John M. Jul 11 '17 at 17:28
Instead of `select` for the first you can switch out `mutate_all` with `mutate_at`. Easy enough. – Axeman Jul 11 '17 at 21:12
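
A minimal sketch of the `mutate_at` variant suggested in the comment above, restricted to two columns of `mtcars`. The column choice (`cols`) is illustrative, and a formula lambda is used in place of `funs()`, which was deprecated in later dplyr releases. Like the answer's first snippet, this centres each column on its own mean; subtracting means taken from a separate background table would need those values supplied explicitly.

```r
library(dplyr)

# Illustrative choice of columns to centre.
cols <- c("mpg", "cyl")

centered <- mtcars %>%
  mutate_at(cols, ~ . - mean(.))

# Only the selected columns change; the rest are left as they were.
print(head(centered))
```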

score 0 · Answer 3 · answered Jul 11 '17 at 16:58

Define the columns you want to mutate as a character vector (thesecols), then compute the control means and keep only those columns:

library(dplyr)
library(readr)  # provides read_csv()
thesecols <- c("mpg","cyl")
ctrlMeansTbl <- read_csv('ctrl.csv') %>%
                   summarize_all(mean) %>% 
                   select(thesecols)

Make an iterator over the columns of ctrlMeansTbl:

library(iterators)
bycol <- iter(ctrlMeansTbl,by="col")

Use mutate_at and nextElem:

exprDataTbl<- read_csv('expr.csv') %>% 
                  mutate_at(vars(thesecols), funs(. - nextElem(bycol)))
