Perform String Function(s) Using lapply, sapply, or apply and Export to CSV with R

Question

Data:

library(data.table)
# "..." marks the elided middle rows; fill them in before running
dt <- data.table(uid    = c("abc001", "abc002", "abc003", ..., "abc100"),
                 coords = c("36.8 x 108", "55.5 x -4.6", "37.2 x -84.0", ..., "55.5 x -4.6"))

Note: Any set of coordinates may be associated with more than one user id. In this example, dt would have 100 unique uids but fewer than 100 unique sets of coordinates.
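Since the ellipses above keep the data from being pasted into R, here is a minimal reproducible subset. It is a hypothetical reduction that keeps only the four rows appearing in the desired output below (abc001, abc002, abc003, abc100) and assumes the coordinates are stored as plain character strings:

```r
library(data.table)

# Reduced, copy-pasteable version of the data: only the rows shown in the
# desired output are kept, so "55.5 x -4.6" is shared by two uids
dt <- data.table(
  uid    = c("abc001", "abc002", "abc003", "abc100"),
  coords = c("36.8 x 108", "55.5 x -4.6", "37.2 x -84.0", "55.5 x -4.6")
)

# Every uid is unique, but there are fewer unique coordinate strings
uniqueN(dt$uid)     # 4
uniqueN(dt$coords)  # 3
```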

Goal: aggregate the data by coords so that each unique set of coordinates is associated with its set of uids. That is:

coords                  uid
36.8 x 108              abc001
55.5 x -4.6             abc002, abc100
37.2 x -84.0            abc003

How would I accomplish this? The aggregation techniques I have found all apply mathematical functions to the grouped values. For example, if the uid column actually held recorded temperatures for each set of coordinates, I could readily use lapply in data.table's j argument and group by coords. In fact, I did use lapply along with list() in j, which produced a data.table that seemed to meet my needs. Unfortunately, the multiple uids for each set of coordinates are stored in a list column, and that type throws an error when written with write.csv or the ff package.
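A minimal sketch of the list-column problem and one way around it, assuming the hypothetical four-row subset of the data. Grouping with list() in j yields a list column; collapsing each group's vector into a single string with toString() turns it back into an ordinary character column that write.csv() can handle:

```r
library(data.table)

# Hypothetical four-row subset of the question's data
dt <- data.table(
  uid    = c("abc001", "abc002", "abc003", "abc100"),
  coords = c("36.8 x 108", "55.5 x -4.6", "37.2 x -84.0", "55.5 x -4.6")
)

# list() in j stores each group's uids as a character vector inside a
# list column; this is the structure write.csv() cannot encode
grouped <- dt[, .(uid = list(uid)), by = coords]
is.list(grouped$uid)  # TRUE

# Collapse each vector to one string so the column becomes plain character
grouped[, uid := vapply(uid, toString, character(1))]

# The flattened table now round-trips through an ordinary CSV file
out <- tempfile(fileext = ".csv")
write.csv(grouped, out, row.names = FALSE)
back <- read.csv(out, stringsAsFactors = FALSE)
back$uid  # "abc001" "abc002, abc100" "abc003"
```

Alternatively, data.table's fwrite() should be able to write the list column directly, joining the items in each cell with its sep2 separator ("|" by default).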

I also tried some variations of unlist(). However, at that point it becomes hard to keep each uid associated with its coords.
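For what it's worth, unlist() keeps the pairing intact if it is applied inside a grouped j rather than to the whole column. A sketch, again assuming the hypothetical four-row subset: within each coords group the single coords value is recycled across that group's unlisted uids, which recovers the original long format:

```r
library(data.table)

# Hypothetical four-row subset of the question's data
dt <- data.table(
  uid    = c("abc001", "abc002", "abc003", "abc100"),
  coords = c("36.8 x 108", "55.5 x -4.6", "37.2 x -84.0", "55.5 x -4.6")
)

# Grouped form with a list column, as produced by list() in j
grouped <- dt[, .(uid = list(uid)), by = coords]

# Unlisting inside each coords group recycles that group's coords value
# across its uids, so every uid stays paired with its own coordinates
long <- grouped[, .(uid = unlist(uid)), by = coords]
long[uid == "abc100"]  # coords is "55.5 x -4.6"
```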

Finally, I wrote the ungrouped data to a CSV file and produced the desired result in Calc with a simple sort plus a couple of IF statements. That only works because the dataset is small, though.

Thoughts?

Please provide reproducible data, i.e. data that one can copy from your post and paste into R. — G. Grothendieck, Oct 11 '15 at 23:40
@DavidArenburg Thank you, your suggestion works quite well for me. A fast, friendly answer - awesome. — EunosNB, Oct 12 '15 at 02:08
@G.Grothendieck Mea culpa. I should have added a 'c' between 'coords=' and the '('. And I shouldn't have tried to convey the scale with ellipses between the third and 100th values in the two vectors...uh, lists...oh, whatever: the columns in the data.table. Despite my ham-handedness, Arenburg provided an elegant solution, so it worked out in the end. I'll try to post to a higher standard the next time I have a question. Cheers! — EunosNB, Oct 12 '15 at 02:21
@DavidArenburg Indeed, the question you cited addresses the issue(s) I encountered. If I had found it (I did search), I would have used one of the answers. Of note, the answer uses DT[, list(id=paste(id, collapse=",")), by=brand] — EunosNB, Oct 12 '15 at 11:48
@DavidArenburg ...continuing my previous comment... versus your answer, dt[, .(uid = toString(uid)), by = coords]. If the two methods differ significantly in speed, it may be worth keeping this post along with a note on the efficiency of one solution versus the other. — EunosNB, Oct 12 '15 at 11:57
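The two solutions mentioned in the comments differ mainly in the separator: toString() joins with ", " by default, while the cited answer passes collapse = "," to paste(). The sketch below, using the hypothetical four-row subset, shows both results and that either can go straight to write.csv(). The timing lines run on a synthetic table whose sizes are arbitrary assumptions, and no particular result is implied; both approaches do essentially the same string concatenation per group.

```r
library(data.table)

# Hypothetical four-row subset of the question's data
dt <- data.table(
  uid    = c("abc001", "abc002", "abc003", "abc100"),
  coords = c("36.8 x 108", "55.5 x -4.6", "37.2 x -84.0", "55.5 x -4.6")
)

# Answer from the linked question: paste() with an explicit separator
a <- dt[, list(uid = paste(uid, collapse = ",")), by = coords]

# David Arenburg's answer: toString() uses ", " as its separator
b <- dt[, .(uid = toString(uid)), by = coords]

a$uid  # "abc001" "abc002,abc100" "abc003"
b$uid  # "abc001" "abc002, abc100" "abc003"

# Both results are ordinary character columns, so write.csv() works
out <- tempfile(fileext = ".csv")
write.csv(b, out, row.names = FALSE)

# Rough timing on synthetic data (hypothetical sizes; results vary by machine)
set.seed(1)
big <- data.table(uid    = sprintf("u%07d", 1:1e5),
                  coords = sample(sprintf("c%04d", 1:1000), 1e5, replace = TRUE))
system.time(big[, paste(uid, collapse = ","), by = coords])
system.time(big[, toString(uid), by = coords])
```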

