1

I have a large data.table with hundreds of columns and thousands of rows. Most of the columns hold numeric values that are ratios like X/Y or Y/Z etc.

I need to flip some of these ratios so that they are transformed from Y/Z -> Z/Y. The only indicator I have of these columns is the column name which includes the substring "x/y"or "y/z".

I can get the columns that match "y/z" using grepl but I am not sure how I can use that array of logical values for apply/lapply etc. I realize that I can extract the columns (by logical indexing or .SDcols) and transform them, but I don't want to discard/ignore the remaining columns.

Lastly, I have tried to something like this

flipcols <- grepl("Y/Z", names(sites))
sites.new <- sites[, , lapply(.SD, function(x) 1/x), .SDcols = flipcols]

but there is no difference between the sites and sites.new, the columns that should have been transformed are not transformed and the summed difference between corresponding columns is 0.

Suggestions?

EDIT: Following @akrun's I tried the := operator, but it leads to other issues as follow:

# I think this fails because flipcols is a logical vector and not a list of names or indices
> sites.new <- sites[, (flipcols) := lapply(.SD, function(x) 1/x), .SDcols = flipcols]
Error in `[.data.table`(sites, , `:=`((flipcols), lapply(.SD, function(x) 1/x)),  : 
  LHS of := isn't column names ('character') or positions ('integer' or 'numeric')


# and this seems to fail because .SDcols seems to lock the data in read-only mode
> sites.new <- sites[, which(flipcols) := lapply(.SD, function(x) 1/x), .SDcols = flipcols]
Error in assign(ii, SDenv$.SDall[[ii]], SDenv) : 
  cannot change value of locked binding for '.SD'

EDIT2: Here's a minimal example, the goal is to transform the columns which match "Y/Z" pattern (second and fourth in our minimal example here), while keeping the other columns unchanged and part of the result.

> dt <- data.table(matrix(rnorm(25), 5,5))
> names(dt) <- c("X/Y_1", "Y/Z_1", "X/Y_2", "Y/Z_2", "X/Y_3")
> dt
        X/Y_1       Y/Z_1       X/Y_2      Y/Z_2      X/Y_3
1:  1.5972490 -0.01763484  1.10745607 -0.1416583 -0.4632829
2:  0.6629621 -0.82719204 -1.68214956  0.6145526 -0.8169235
3: -0.7491393 -0.05290791  0.63935066  1.0665537 -1.9107424
4: -0.6804972 -0.40107880 -0.01030063  1.4566075 -0.6866042
5:  0.2505391 -0.29091850 -1.95926987  0.8733446  1.3909565
posdef
  • 6,498
  • 11
  • 46
  • 94
  • 1
    You had two `,,` before `lapply` it should be `sites[, lapply(.SD, ..`. The general format of data.table i.e `dt[i, j, by]` – akrun Mar 10 '17 at 14:25
  • @akrun that's correct, however by removing a comma I effectively select only the columns that I operate on. I need the untouched columns to be returned as well – posdef Mar 10 '17 at 14:39
  • 1
    In that case, use the assignment `:=` i.e. `sites[, (flipcols) := lapply(.SD, function(x) 1/x), .SDcols = flipcols]` – akrun Mar 10 '17 at 14:40
  • @akrun how do you mean? how should I be using the operator? – posdef Mar 10 '17 at 14:42
  • 1
    You are mistaken re commas there. Please put a concrete example into your post before asking too many questions, so akrun can illustrate his answer in context. – Frank Mar 10 '17 at 14:43
  • The first error tells you flipcols should be int or char. For int, use grep; for char use grep(..., value=TRUE). By the way, a concrete example means `sites` or some similar data set to illustrate the issue. – Frank Mar 10 '17 at 15:03
  • @Frank thanks for trying to clarify. I am with you on that one, technically `which(flipcols)` should work precisely the same way, right? As for the concrete example, as I said the datatable is quite large and very sparse (lots of NAs) which makes extracting a small and useful example difficult – posdef Mar 10 '17 at 15:06
  • Yeah, `which` should do it, too. Re the data size, we don't need your real use case, but rather a [mcve] (as I guess you've heard about before). Sure, constructing such an example may be hard, but you might solve your own issue along the way to creating it. – Frank Mar 10 '17 at 15:06
  • @Frank added an example case made from artificial data, i hope it's adequate in giving an idea about what I am trying to achieve – posdef Mar 10 '17 at 15:14
  • Looks like `cols = c(2,4); dt[, (cols) := lapply(.SD, "^", -1), .SDcols = cols]`, which makes it the same question as http://stackoverflow.com/q/16846380/ essentially? – Frank Mar 10 '17 at 15:17
  • Your example works, but when I run the same thing on my dataset `flipcols = which(grepl("M/L", names(sites))); sites[, (flipcols) := lapply(.SD, "^", -1), .SDcols = flipcols]` I get the same error: `Error in assign(ii, SDenv$.SDall[[ii]], SDenv) : cannot change value of locked binding for '.SD'` – posdef Mar 10 '17 at 15:24
  • Hm, yeah, that's odd. You could try updating R or data.table or running it with only data.table loaded (and not other packages) to isolate the issue. If you're interested in filing a bug report (since this *should* work), the instructions are here: https://github.com/Rdatatable/data.table/wiki/Support – Frank Mar 10 '17 at 15:58

1 Answers1

2

Following your example,

library(data.table)
dt <- data.table(matrix(rnorm(25), 5,5))
names(dt) <- c("X/Y_1", "Y/Z_1", "X/Y_2", "Y/Z_2", "X/Y_3")
dt
         X/Y_1      Y/Z_1      X/Y_2       Y/Z_2       X/Y_3
1: -0.09845804 -0.6455857  0.2259012  1.26772833  1.14451170
2: -1.22147654  1.7643609  0.5310762 -0.46869816 -0.58761886
3: -0.61469060  1.2323381 -0.4028002  0.99903384  0.01650606
4: -0.80805337  0.2733621 -0.2855663 -0.02166544  0.59398122
5: -0.68398344  0.2891335 -0.5004021  2.12063769  0.40474155

I will first match the target columns

sd.cols <- grep("Y/Z", names(dt), value = T)

Then, just changing the columns by reference, using standart data.table notation.

dt[ , (sd.cols) := lapply(.SD, function(x){x^-1}), .SDcols = sd.cols ]
         X/Y_1      Y/Z_1      X/Y_2       Y/Z_2       X/Y_3
1: -0.09845804 -1.5489811  0.2259012   0.7888125  1.14451170
2: -1.22147654  0.5667775  0.5310762  -2.1335693 -0.58761886
3: -0.61469060  0.8114656 -0.4028002   1.0009671  0.01650606
4: -0.80805337  3.6581513 -0.2855663 -46.1564513  0.59398122
5: -0.68398344  3.4586094 -0.5004021   0.4715563  0.40474155
Mario GS
  • 859
  • 8
  • 22