I need to sort the elements of a list column of an R data.table in alphabetical order and store the result in another, intermediate column, which I will later coerce to a character vector. My current code gives a wrong result for the 1st row, and I cannot spot the error.

The following code generates the original data.table:

library(data.table)
my_dt <- data.table(A = rep(1:5, 3), B = rnorm(15, mean = 10, sd = 2), C = list(c("mango", "pear", "apple")))

Here column C is a list column: each of the 15 rows of my_dt holds the same character vector "mango", "pear", "apple".

Example: my_dt$C[1] yields:

[[1]]
[1] "mango" "pear" "apple"

Next, I want to sort the elements within each row and store them in column D of my_dt. I am using the following loop to sort the elements and populate D:

for (lmn in 1:nrow(my_dt)){
  word1 <- sapply(my_dt$C[lmn], '[[', 1)
  word2 <- sapply(my_dt$C[lmn], '[[', 2)
  word3 <- sapply(my_dt$C[lmn], '[[', 3)
  my_dt$D[lmn] <- list(sort(c(word1, word2, word3)))
}

However, printing my_dt shows the following:

    A         B                C                D
 1: 1  7.781597 mango,pear,apple            apple
 2: 2 10.267061 mango,pear,apple apple,mango,pear
 3: 3 10.670469 mango,pear,apple apple,mango,pear
 4: 4 10.252527 mango,pear,apple apple,mango,pear
 5: 5 10.605396 mango,pear,apple apple,mango,pear
 6: 1 13.054545 mango,pear,apple apple,mango,pear
 7: 2 12.401846 mango,pear,apple apple,mango,pear
 8: 3 11.094550 mango,pear,apple apple,mango,pear
 9: 4 10.220841 mango,pear,apple apple,mango,pear
10: 5 11.452469 mango,pear,apple apple,mango,pear
11: 1 11.827297 mango,pear,apple apple,mango,pear
12: 2  6.918918 mango,pear,apple apple,mango,pear
13: 3  9.757636 mango,pear,apple apple,mango,pear
14: 4 13.432524 mango,pear,apple apple,mango,pear
15: 5 10.648629 mango,pear,apple apple,mango,pear

I am not sure why the 1st entry in column D shows only "apple", while every other row in that column has all 3 sorted elements, i.e. apple, mango and pear. I would like the entries in column D to be consistent, not partially populated as in row 1.

Thank you in advance.

ds_newbie

1 Answer

You can simplify your code and use unlist before you sort the list elements:

my_dt[, D := toString(sort(unlist(C))), by = 1:nrow(my_dt)][]
#    A         B                C                  D
# 1: 1  9.245525 mango,pear,apple apple, mango, pear
# 2: 2 10.195239 mango,pear,apple apple, mango, pear
# 3: 3 13.277489 mango,pear,apple apple, mango, pear
# 4: 4  8.248815 mango,pear,apple apple, mango, pear
# 5: 5 10.243520 mango,pear,apple apple, mango, pear
# 6: 1 12.724261 mango,pear,apple apple, mango, pear
# 7: 2  9.530758 mango,pear,apple apple, mango, pear
# 8: 3  7.893234 mango,pear,apple apple, mango, pear
# 9: 4  8.260433 mango,pear,apple apple, mango, pear
#10: 5  9.219746 mango,pear,apple apple, mango, pear
#11: 1  8.305300 mango,pear,apple apple, mango, pear
#12: 2  9.478721 mango,pear,apple apple, mango, pear
#13: 3  9.171161 mango,pear,apple apple, mango, pear
#14: 4  9.633898 mango,pear,apple apple, mango, pear
#15: 5 10.814112 mango,pear,apple apple, mango, pear
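As a minimal base R sketch of what `toString` does to one sorted row: it collapses the vector into a single comma-separated string, which is why D prints as "apple, mango, pear" above. The names `fruits`, `sorted_string` and `pasted_string` are hypothetical, introduced only for this illustration.

```r
# `fruits` stands in for the contents of one element of column C.
fruits <- c("mango", "pear", "apple")

# toString() sorts nothing itself; it only joins the values with ", ".
sorted_string <- toString(sort(fruits))

# The same result written out with paste().
pasted_string <- paste(sort(fruits), collapse = ", ")

sorted_string                              # "apple, mango, pear"
identical(sorted_string, pasted_string)    # TRUE
length(sorted_string)                      # 1
```

The result has length 1, so D becomes an ordinary character column rather than a list column.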

If column D should be a list column, do

my_dt[, D := list(list(sort(unlist(C)))), by = 1:nrow(my_dt)]
my_dt

See Arun's answer from the post: Using lists inside data.table columns
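To make the difference between the two results concrete, here is a small base R sketch that builds both forms from a plain list, without data.table. The names `C_values`, `D_list` and `D_chr` are hypothetical; `C_values` only mimics the contents of column C.

```r
# 15 copies of the same vector, mimicking column C.
C_values <- rep(list(c("mango", "pear", "apple")), 15)

# List form: each entry stays a character vector of length 3.
D_list <- lapply(C_values, sort)

# String form: each entry is collapsed to one string.
D_chr <- vapply(C_values, function(x) toString(sort(x)), character(1))

lengths(D_list)[1]   # 3
nchar(D_chr[1])      # 18, the length of "apple, mango, pear"
```

Both results have one entry per row, so which form to prefer depends on whether later steps need the individual words or just a label.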

markus
  • Thanks Markus, I think this works. Could you please elaborate on the toString function? Also, am I correct that using 1:nrow(my_dt) in the "by" argument is an elegant way to avoid loops in general? I would also appreciate it if someone could tell me where the error happens in my approach. Thanks once again. – ds_newbie Apr 30 '19 at 11:25
  • @ds_newbie I was about to delete the answer because `toString` returns - surprise - a string, not a list as in your code. The `by = 1:nrow(my_dt)` code just indicates that we are working rowwise here. Does this help? – markus Apr 30 '19 at 11:28
  • @ds_newbie type `toString.default` to see the source code... basically it is a wrapper for `paste(x, collapse = ", ")` – s_baldur Apr 30 '19 at 11:28
  • Yes, Markus, toString() results in a character vector, which I am fine to deal with. The next step was in fact to convert the "sorted" entries column to a character vector. However, I am still not sure what error I made in the code shared in the question. – ds_newbie Apr 30 '19 at 11:45
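The thread does not settle where the loop goes wrong, so the following base R sketch is only one plausible reading, not a confirmed diagnosis. The first two assignments are plain base R behaviour. The recycling step in the middle is an assumption about how data.table handled a length-1 list assigned as a new column at the time, and newer versions may behave differently. The names `sorted`, `first_assign` and `d` are hypothetical.

```r
sorted <- sort(c("mango", "pear", "apple"))

# Iteration 1: my_dt$D does not exist yet, so the loop assigns into NULL.
first_assign <- NULL
first_assign[1] <- list(sorted)
# first_assign is now a list of length 1 whose only element has length 3.

# ASSUMPTION: data.table took that single element as the column values
# and recycled the 3 strings over the 15 rows.
d <- rep(sorted, length.out = 15)

# Iterations 2 to 15: assigning a list into one position turns the whole
# character vector into a list; untouched positions keep their one string.
for (i in 2:15) {
  d[i] <- list(sorted)
}

lengths(d)  # position 1 has length 1, positions 2 to 15 have length 3
d[[1]]      # "apple"
```

Under that assumption, row 1 is the only position never overwritten by a full vector, which matches the printed output in the question. Creating D in a single rowwise assignment, as in the answer, avoids the problem.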