R data.table: Sum column with multiple values per row

Asked Mar 21 '16 at 10:21

Active Sep 26 '16 at 13:53

I have a dataset with a column of semicolon-separated values that represent countries, like this:

row  countries  weights
1:   22;3       1.254
2:   5          0.54
3:   6;8;123    2.65
4:   16         0.35
5:   77;21;1    0.98
6:   89         1.74
etc.

With data.table, I can sum per unique value like this:

dt[!is.na(countries), .(sum(weights)), by = countries]

This gives me this:

              countries V1
   1:                 2 791.243
   2:               230  10.644
   3:                50   4.517
   4:                 1 544.056
   5:        1;75;77;91   0.370

The problem is that the semicolon-separated values are not split into their individual values. What I want is a sum per unique value in the column, so that the result no longer contains any semicolon-separated values.

How can I split the column up and then build the sum per unique value?

edited Mar 21 '16 at 10:55

David Arenburg

asked Mar 21 '16 at 10:21

Mario

You might find [this](http://stackoverflow.com/questions/15347282/split-string-column-and-insert-as-multiple-new-rows) helpful – alexis_laz Mar 21 '16 at 10:37

Thanks to both of you! This `out <- d.dt[, list(V2 = unlist(strsplit(V2, ","))), by=V1]` did the trick – Mario Mar 21 '16 at 10:50
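
For the data shown in the question, the snippet from the comment above could be adapted roughly as follows. This is only a minimal sketch under a few assumptions: the separator is `;` rather than `,`, the columns are named `countries` and `weights` as in the question, and every country listed in a row is credited with that row's full weight (the question does not say whether the weight should be divided instead). The names `row_id`, `long`, `country`, `total_weight`, and `res` are made up for illustration.

```r
library(data.table)

# Sample data copied from the question (first six rows only)
dt <- data.table(
  countries = c("22;3", "5", "6;8;123", "16", "77;21;1", "89"),
  weights   = c(1.254, 0.54, 2.65, 0.35, 0.98, 1.74)
)

# Hypothetical helper column: a row id keeps rows apart even if two rows
# happen to share the same `countries` string and weight
dt[, row_id := .I]

# Reshape to one row per (original row, country code).
# fixed = TRUE makes strsplit treat ";" as a literal string, not a regex.
long <- dt[!is.na(countries),
           .(country = unlist(strsplit(countries, ";", fixed = TRUE))),
           by = .(row_id, weights)]

# Sum the weights per individual country code.
# Assumption: each country receives the full weight of its row.
res <- long[, .(total_weight = sum(weights)), by = country]

print(res)
```

The `country` column stays a character vector here; if numeric codes are needed, `as.integer(country)` can be applied after the split.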
