Change values for a random selection of a data.table subset

Question

This is basically an extension of this question: I noticed that after subsetting a second time, it is no longer possible to change the value of a column.

library(data.table)

random.length <- sample(x = 15:30, size = 1)
dt <- data.table(
  city = sample(c("Cape Town", "New York", "Pittsburgh", "Tel Aviv", "Amsterdam"),
                size = random.length, replace = TRUE),
  score = sample(x = 1:10, size = random.length, replace = TRUE)
)
set.seed(1)
dt[sample(.N, 3), score := 9999L]
set.seed(1)
dt[sample(.N, 3), ]

This works as expected and changes the score to 9999 for the three randomly selected cities. However, if you subset first and then sample and assign a new score value, the change does not show up in dt.

set.seed(1)
dt[city == "New York",][sample(.N,1), score := 55555]
set.seed(1)
dt[city == "New York",][sample(.N,1)]

What I would like to achieve is to change the value of a column for rows that are randomly selected from a particular subset.

"for the three randomly selected cities" -- you selected rows, not cities. By the way, your `set.seed` comes too late (after `sample` is used). — Frank, Aug 31 '16 at 13:56
The set.seed() is just there to make it easier to see that in the first case the score changes, so one can directly check the different score, and that in the second case nothing changes. — hannes101, Aug 31 '16 at 14:01
Could do `dt[sample(dt[, .I[city == "New York"]], 3), score := 55555]` maybe. Or if you want to override 3 random obs in *each* city, you could do `dt[dt[, .I[sample(.N, 3)], by = city]$V1, score := 55555]` — David Arenburg, Aug 31 '16 at 14:03

score 5 · Answer 1 · answered Aug 31 '16 at 14:01

dt[city == "New York"] returns an entirely new object, and that new object is what you're updating by reference. This does not affect dt, i.e.,

dt[expr, col := val] != dt[expr][, col := val]

The first expression updates dt where expr evaluates to TRUE. The second one updates the subset returned from dt[expr]. Unless you assign that subset to a variable, there's no way to get the updated result back.
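To make the difference concrete, here is a minimal sketch on a small made-up table (the data and the names `ny`, `ny_rows` and the `changed_*` variables are illustrative assumptions, not part of the question). It shows the chained form leaving dt untouched, the same update on a named copy, and one way to update dt itself by sampling among the matching row numbers, along the lines of the `.I` suggestion in the comments.

```r
library(data.table)

set.seed(1)
dt <- data.table(
  city  = rep(c("New York", "Tel Aviv"), each = 3),
  score = 1:6
)

# Chained form: dt[city == "New York"] is a new object, so := modifies only that copy.
dt[city == "New York"][sample(.N, 1), score := 55555L]
changed_by_chain <- any(dt$score == 55555L)   # dt itself is untouched

# Keeping the copy: bind the subset to a name, then update that object.
ny <- dt[city == "New York"]
ny[sample(.N, 1), score := 55555L]
changed_in_copy <- sum(ny$score == 55555L)    # counts rows changed in ny

# Single call on dt: pick one row number among the rows that match.
ny_rows <- dt[, .I[city == "New York"]]       # row numbers within dt
dt[ny_rows[sample.int(length(ny_rows), 1)], score := 55555L]
changed_in_dt <- sum(dt$score == 55555L)      # counts rows changed in dt
```

Only the last call changes dt; the second variant is useful when the subset itself is the object you want to keep working with.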

Arun


Ok, that's what I suspected, thanks for the quick answer. – hannes101 Aug 31 '16 at 14:03
One thing I find a bit odd about assignment with := is that it does not give you a meaningful error, or any message at all. Especially in this case, where there is no point in assigning anything, it is odd that the assignment at first appears to have succeeded. Would it be possible to show an error in those cases? – hannes101 Sep 02 '16 at 14:12

score 5 · Accepted Answer · answered Aug 31 '16 at 14:10

Besides the suggestions above, you can also sample the row indices, which can be computed with the which function:

dt[sample(which(city == "New York"), 1), score:=555L]
dt
#           city score
#  1:   Tel Aviv     8
#  2:  Amsterdam     3
#  3:  Cape Town    10
#  4:   New York     1
#  5:  Cape Town    10
#  6: Pittsburgh     2
#  7: Pittsburgh     8
#  8:  Amsterdam    10
#  9:  Amsterdam     8
# 10:  Amsterdam     4
# 11:   Tel Aviv     7
# 12:  Amsterdam     2
# 13: Pittsburgh     1
# 14:  Amsterdam     3
# 15: Pittsburgh     2
# 16:   New York     7
# 17:   Tel Aviv    10
# 18:   New York    10
# 19:  Cape Town     1
# 20:  Amsterdam     7
# 21:  Amsterdam     3
# 22:   New York   555
# 23:  Cape Town     6
# 24:   New York     1
# 25:   Tel Aviv    10
#           city score
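One caveat with this pattern: base R's `sample(x, 1)` treats a single number x >= 1 as the range 1:x, so if only one row matches, `sample(which(...), 1)` may return any row up to that position. A minimal sketch of a guard, assuming a hypothetical helper `pick_rows` (not part of data.table) that samples positions of the index vector instead of the vector itself:

```r
library(data.table)

# Hypothetical helper: draw up to n elements of idx by sampling positions,
# which avoids sample()'s 1:x behaviour for a length-one numeric input.
pick_rows <- function(idx, n = 1L) {
  idx[sample.int(length(idx), min(n, length(idx)))]
}

dt <- data.table(
  city  = c("Tel Aviv", "Amsterdam", "Cape Town", "New York", "Cape Town"),
  score = c(8L, 3L, 10L, 1L, 10L)
)

# Only row 4 is "New York"; sample(4, 1) could have picked any of rows 1 to 4.
dt[pick_rows(which(city == "New York")), score := 555L]
```

With several matching rows the helper behaves like the `sample(which(...), 1)` call above; the difference only shows up when exactly one row matches.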
