
Data:

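library(data.table)

# "..." stands for the 96 rows between abc003 and abc100 (not shown),
# so this snippet is not runnable as written.
dt <- data.table(uid    = c("abc001", "abc002", "abc003", ..., "abc100"),
                 coords = c("36.8 x 108", "55.5 x -4.6", "37.2 x -84.0", ..., "55.5 x -4.6"))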
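Because the `...` rows cannot be pasted into R, here is a minimal reproducible subset. It assumes that the four rows actually shown (abc001, abc002, abc003 and abc100) are enough to illustrate the problem. Two of those uids share the same coordinates:

```r
library(data.table)

# Only the four rows that appear in the question; the elided rows are not reproduced.
dt <- data.table(
  uid    = c("abc001", "abc002", "abc003", "abc100"),
  coords = c("36.8 x 108", "55.5 x -4.6", "37.2 x -84.0", "55.5 x -4.6")
)

# Fewer distinct coordinate strings than uids: abc002 and abc100 share "55.5 x -4.6".
dt[, uniqueN(coords)]  # 3
```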

Note: Any set of coordinates may be associated with more than one uid. In this example, dt has 100 unique uids but fewer than 100 unique sets of coordinates.

Goal: aggregate the data by coords so that each unique set of coordinates is associated with its set of uids. That is:

coords                  uid
36.8 x 108              abc001
55.5 x -4.6             abc002, abc100
37.2 x -84.0            abc003

How would I accomplish this task? The aggregation techniques I have found all apply mathematical operations to the grouped data. For example, if uid were instead recorded temperatures at those coordinates, I could readily use lapply in data.table's j argument and then group by coords. In fact, I did use lapply along with list() in j, which produced a data.table that seemed to meet my needs. Unfortunately, the multiple uids for each set of coordinates end up stored in a list column, a type that throws an error when written with write.csv or the ff package.
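A small sketch of the list-column problem and one way around it, using only the four rows shown in the question. The `tempfile()` path is just for illustration. Collapsing each list element with `toString()` turns the list column back into an ordinary character column, which `write.csv()` can handle:

```r
library(data.table)

dt <- data.table(
  uid    = c("abc001", "abc002", "abc003", "abc100"),
  coords = c("36.8 x 108", "55.5 x -4.6", "37.2 x -84.0", "55.5 x -4.6")
)

# list() in j keeps each group's uids as a vector, so uid becomes a list column.
agg_list <- dt[, .(uid = list(uid)), by = coords]
class(agg_list$uid)  # "list"

# Collapse every list element to one string ("abc002, abc100"), giving a character column.
agg_chr <- agg_list[, .(coords, uid = vapply(uid, toString, character(1)))]

# A plain character column can be written with write.csv().
out <- tempfile(fileext = ".csv")
write.csv(agg_chr, out, row.names = FALSE)
```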

I also explored some variations of `unlist`. However, keeping each uid associated with its coords becomes a problem at that point.
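For reference, here is a minimal sketch showing that `unlist()` keeps the pairing when it is called inside `j` while still grouping by `coords`. Each coords value is then repeated once per uid. It uses the same four-row subset as an assumption:

```r
library(data.table)

dt <- data.table(
  uid    = c("abc001", "abc002", "abc003", "abc100"),
  coords = c("36.8 x 108", "55.5 x -4.6", "37.2 x -84.0", "55.5 x -4.6")
)

# Grouped table with a list column, as produced by list() in j.
agg_list <- dt[, .(uid = list(uid)), by = coords]

# Within each group, uid is a length-1 list; unlist() returns that group's
# uid vector, and data.table recycles the group's coords value alongside it.
long <- agg_list[, .(uid = unlist(uid)), by = coords]
print(long)
```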

Finally, I produced the desired result in Calc: I wrote the ungrouped data out to a CSV file and then used a simple sort plus a couple of IF statements. That only works because the dataset is small, though.
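The spreadsheet's sort-then-concatenate idea can also be expressed in base R with `aggregate()`. This alternative was not discussed in the thread. The sketch below runs on the four-row subset. `aggregate()` splits uid by coords, keeps the original row order within each group, and applies `toString()` to each piece:

```r
# Base R only, no data.table needed.
df <- data.frame(
  uid    = c("abc001", "abc002", "abc003", "abc100"),
  coords = c("36.8 x 108", "55.5 x -4.6", "37.2 x -84.0", "55.5 x -4.6"),
  stringsAsFactors = FALSE
)

# One row per distinct coords value (sorted by coords); uid holds the joined uids.
agg <- aggregate(uid ~ coords, data = df, FUN = toString)
print(agg)
```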

Thoughts?

Asked by EunosNB; edited by asachet.
  • `dt[, .(uid = toString(uid)), by = coords]`?? – David Arenburg Oct 11 '15 at 21:09
  • Provide reproducible data which means one can copy it from your post and paste it into R. – G. Grothendieck Oct 11 '15 at 23:40
  • @DavidArenburg Thank you, your suggestion works quite well for me. A fast, friendly answer - awesome. – EunosNB Oct 12 '15 at 02:08
  • @G.Grothendieck Mea culpa. I should've added a `c` between `coords=` and the `(`. And I shouldn't have tried to convey the scale by using ellipses between the third and 100th values in the two vectors...uh, lists...oh, whatever - the columns in the data.table. Despite my ham-handedness, Arenburg provided an elegant solution, so it worked out in the end. I'll try to post to a higher standard the next time I have a question. Cheers! – EunosNB Oct 12 '15 at 02:21
  • @DavidArenburg Indeed, the question you cited addresses the issue(s) I encountered. If I had found it (I did search), I would have used one of its answers. Of note, that answer uses `DT[, list(id=paste(id, collapse=",")), by=brand]` – EunosNB Oct 12 '15 at 11:48
  • @DavidArenburg ...continuing my previous comment... versus your answer of `dt[, .(uid = toString(uid)), by = coords]`. If the two methods differ significantly in run time, it may be worthwhile to keep this post along with a note about the efficiency of one solution versus the other. – EunosNB Oct 12 '15 at 11:57
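To look into the efficiency question, here is a rough timing sketch that compares the two `j` expressions quoted in the comments. The data set is synthetic: the row count, group count, uid format and `coordNNNN` labels are all hypothetical. Timings will vary by machine and data size, so no result is claimed here. `toString()` joins with `", "` while `paste(..., collapse = ",")` joins with `","`, so the two outputs should differ only in the separator:

```r
library(data.table)

# Hypothetical synthetic data: 2e5 uids spread over up to 1000 coordinate labels.
set.seed(1)
n <- 2e5
big <- data.table(
  uid    = sprintf("abc%07d", seq_len(n)),
  coords = sprintf("coord%04d", sample.int(1000, n, replace = TRUE))
)

# Expression from the comment: toString() joins each group's uids with ", ".
t1 <- system.time(a1 <- big[, .(uid = toString(uid)), by = coords])

# Expression from the linked question: paste(collapse = ",") joins with ",".
t2 <- system.time(a2 <- big[, list(uid = paste(uid, collapse = ",")), by = coords])

# Elapsed seconds for each approach.
print(rbind(toString = t1, paste = t2)[, "elapsed"])
```

Both results keep the groups in order of first appearance. Removing the space after each comma in `a1$uid` should therefore reproduce `a2$uid` exactly.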
