I have been following a tutorial on data.tables here. Let's say that I have the following table (I have changed the original table to fit my question):

##    gear cyl gearsL
## 1:    4   6  4,3,5
## 2:    4   6  4,3,5
## 3:    4   4  4,3,5
## 4:    3   6  5,6,7
## 5:    3   8  5,6,7
## 6:    3   6  5,6,7
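
The printed output alone does not show whether gearsL holds strings or lists, so here is a minimal sketch for rebuilding the table. It assumes each gearsL cell is a numeric vector stored in a list column; the values are copied from the printout above, not taken from the tutorial's data.

```r
library(data.table)

# Assumption: gearsL is a list column whose cells are numeric vectors.
dt <- data.table(
  gear   = c(4, 4, 4, 3, 3, 3),
  cyl    = c(6, 6, 4, 6, 8, 6),
  gearsL = list(c(4, 3, 5), c(4, 3, 5), c(4, 3, 5),
                c(5, 6, 7), c(5, 6, 7), c(5, 6, 7))
)

# Check how the column is actually stored before indexing into it.
class(dt$gearsL)
```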

I now want to create a new column which will "ungroup" the gearsL column, as follows:

##    gear cyl gearsL  gearA
## 1:    4   6  4,3,5  4
## 2:    4   6  4,3,5  3
## 3:    4   4  4,3,5  5
## 4:    3   6  5,6,7  5
## 5:    3   8  5,6,7  6
## 6:    3   6  5,6,7  7

I can use the following snippet of code to extract a static element, e.g. element at index 2.

dt[,gearL1:=lapply(gearsL, function(x) x[2])]
dt[,gearS1:=sapply(gearsL, function(x) x[2])]

This will result in the following table:

##    gear cyl gearsL  gearL1 gearS1
## 1:    4   6  4,3,5  3      3
## 2:    4   6  4,3,5  3      3
## 3:    4   4  4,3,5  3      3
## 4:    3   6  5,6,7  6      6
## 5:    3   8  5,6,7  6      6
## 6:    3   6  5,6,7  6      6

However, I want a "dynamic" index. First, I created a new field, called IDX, which acts as a row number within each group.

dt[,IDX:=1:.N,by='gear']

which will result in the following table:

##    gear cyl gearsL  gearL1 gearS1  IDX
## 1:    4   6  4,3,5  3      3        1
## 2:    4   6  4,3,5  3      3        2
## 3:    4   4  4,3,5  3      3        3
## 4:    3   6  5,6,7  6      6        1
## 5:    3   8  5,6,7  6      6        2
## 6:    3   6  5,6,7  6      6        3

Using the newly created IDX column, I would like to access the elements of each list as follows:

 dt[,gearA:=sapply(gearsL, function(x) x[IDX])]
 dt[,gearA:=lapply(gearsL, function(x) x[IDX])]

However, the above snippet doesn't work as expected. How can I access the elements of a list based on the value of another column?

Ahmadov
  • Try `dt[,gearA := Map("[", gearsL, IDX)]` – talat Jun 14 '17 at 15:35
  • Fyi, "list columns", as they're usually called, are very slow and there are alternatives. Also, row number within groups is available as `dt[, idx := rowid(gear)]` – Frank Jun 14 '17 at 16:19
  • I just don't get why the question is being downvoted... I have provided all necessary steps to reproduce, took care of proper formatting and also mentioned my attempts. SO community disappoints me sometimes. – Ahmadov Jun 14 '17 at 19:25
  • First, no, you did not provide what is necessary. There is no way to know if your column is "4,3,5" or list(4,3,5). You may think it's obvious which it is, but posters routinely are mistaken about how their data is actually structured. And you did not provide it in a reproducible way. See https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/28481250#28481250 Second, I can downvote because I find it "not useful", in the sense that I think data should not be structured or accessed this way. – Frank Jun 14 '17 at 22:44
  • I have provided a link to the problem background and from the first code snippet I provided it is clearly visible that we are talking about the list of lists. If I write every single piece from the link that would make the question very long and cluttered. However, the main point of the question is to simply ask how to access data.table list column elements dynamically. Besides, I am not saying the question is perfect. Many, many questions are not perfect in SO. It doesn't mean they all need to get downvoted and deleted, right? – Ahmadov Jun 15 '17 at 06:43
  • And finally, it took less than an hour to get a perfect solution. Which, again, means that the question was clear and understandable. – Ahmadov Jun 15 '17 at 06:46
  • Different people have different reasons for downvoting and thresholds in terms of usefulness and quality. I explained mine and would retract it if I found your counterarguments persuasive, but it's not a big deal. I don't think you should turn my vote (or any others you get) into some big generalization about SO or the world here. If you don't get an explanation of your next downvote, this may help and/or entertain instead: https://meta.stackexchange.com/a/215397 – Frank Jun 15 '17 at 18:58
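
The two suggestions in the comments above can be tried on a small table. This is only a sketch: the table is rebuilt from the printed output under the assumption that gearsL is a list column of numeric vectors, and the column names idx and gearA_list are made up for illustration.

```r
library(data.table)

# Assumption: gearsL is a list column whose cells are numeric vectors.
dt <- data.table(
  gear   = c(4, 4, 4, 3, 3, 3),
  gearsL = list(c(4, 3, 5), c(4, 3, 5), c(4, 3, 5),
                c(5, 6, 7), c(5, 6, 7), c(5, 6, 7))
)

# Row number within each gear group, computed two ways.
dt[, IDX := 1:.N, by = gear]
dt[, idx := rowid(gear)]

# Map pairs each list cell with its index but returns a list,
# so the new column is itself a list column.
dt[, gearA_list := Map("[", gearsL, IDX)]

# Flatten the list column when a plain vector is wanted.
unlist(dt$gearA_list)
```

Since `Map` never simplifies its result, wrapping it in `unlist`, or using `mapply`, is one way to end up with an ordinary numeric column.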

1 Answer

dt[, gearA := mapply('[[', gearsL, IDX, SIMPLIFY = TRUE)]

This runs along both gearsL and IDX in parallel, passing each pair of elements as arguments to the [[ function. That is, for row i it computes gearsL[[i]][[IDX[[i]]]].
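
A self-contained sketch of the same call, assuming gearsL is a list column of numeric vectors rebuilt from the printed output in the question:

```r
library(data.table)

# Assumption: gearsL is a list column whose cells are numeric vectors.
dt <- data.table(
  gear   = c(4, 4, 4, 3, 3, 3),
  gearsL = list(c(4, 3, 5), c(4, 3, 5), c(4, 3, 5),
                c(5, 6, 7), c(5, 6, 7), c(5, 6, 7))
)
dt[, IDX := 1:.N, by = gear]

# mapply pairs the i-th list cell with the i-th index and simplifies
# the length-one results into an ordinary vector.
dt[, gearA := mapply(`[[`, gearsL, IDX)]

dt$gearA
# 4 3 5 5 6 7
```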

Nathan Werth
  • great, thanks, it worked. The question is though, why can't I use IDX inside sapply? – Ahmadov Jun 14 '17 at 15:44
  • `sapply(gearsL, function(x) x[IDX])` is identical to `c(gearsL[[1]][IDX], gearsL[[2]][IDX], ...)`. If you need to vectorize more than one argument in a function, use `mapply`. – Nathan Werth Jun 14 '17 at 16:03
  • or without mapply `dt[, gearsA := gearsL %>% \`[[\`(1) %>%\`[\`(IDX), by=1:nrow(dt)]` – Ufos Mar 27 '19 at 17:20
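
To see why the `sapply` attempt misbehaves, it may help to look at what the anonymous function receives. The sketch below assumes gearsL is a list column of numeric vectors rebuilt from the question's printout, and uses `with(dt, ...)` as a stand-in for how columns are visible inside `dt[, ...]`.

```r
library(data.table)

# Assumption: gearsL is a list column whose cells are numeric vectors.
dt <- data.table(
  gear   = c(4, 4, 4, 3, 3, 3),
  gearsL = list(c(4, 3, 5), c(4, 3, 5), c(4, 3, 5),
                c(5, 6, 7), c(5, 6, 7), c(5, 6, 7))
)
dt[, IDX := rowid(gear)]

# sapply iterates over gearsL only. Inside the function, IDX is still the
# whole column c(1, 2, 3, 1, 2, 3), so every cell is subset six times.
res <- with(dt, sapply(gearsL, function(x) x[IDX]))

# One column per row of dt, six values each, rather than one value per row.
dim(res)
```

Each row therefore produces six values instead of one, which is why a function that walks both columns together, such as `mapply`, is needed.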