Data:
dt <- data.table(uid = c("abc001", "abc002", "abc003", ..., "abc100"),
                 coords = c("36.8 x 108", "55.5 x -4.6", "37.2 x -84.0", ..., "55.5 x -4.6"))
Note: Any set of coordinates may be associated with more than one uid. In this example, dt would have 100 unique uids but fewer than 100 unique sets of coordinates.
Goal: aggregate the data by coords so that each unique set of coordinates is associated with its set of uids. That is:
coords          uid
36.8 x 108      abc001
55.5 x -4.6     abc002, abc100
37.2 x -84.0    abc003
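One way the target layout above could be sketched in data.table. This assumes a four-row toy subset of the data (the full 100 rows are not shown) and assumes that a single comma-separated string per coords value is acceptable for the uid column:

```r
library(data.table)

# Toy subset standing in for the real table (values taken from the example above)
dt <- data.table(
  uid    = c("abc001", "abc002", "abc003", "abc100"),
  coords = c("36.8 x 108", "55.5 x -4.6", "37.2 x -84.0", "55.5 x -4.6")
)

# Collapse the uids within each coords group into one comma-separated string
agg <- dt[, .(uid = paste(uid, collapse = ", ")), by = coords]

# uid is a plain character column here, so write.csv() can handle it
write.csv(agg, tempfile(fileext = ".csv"), row.names = FALSE)
```

Because paste() returns an ordinary character vector, the grouped result has no list column.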
How would I accomplish this task? The aggregation techniques I have found all perform mathematical operations on the data. For example, if uid held recorded temperatures for the coordinates, I could readily use lapply in data.table's j argument and group by coords. In fact, I did use lapply along with list() in j, which produced a data.table that seemed to meet my needs. Unfortunately, the multiple uids are stored as a list column, a type that throws an error when used with write.csv or the ff package.
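A minimal sketch of the list-column behavior described above, again on a toy four-row table. The exact j expression used in the original attempt is not shown, so list(uid) here is an assumption, and flattening with vapply() and paste() is just one possible workaround:

```r
library(data.table)

# Toy subset standing in for the real table
dt <- data.table(
  uid    = c("abc001", "abc002", "abc003", "abc100"),
  coords = c("36.8 x 108", "55.5 x -4.6", "37.2 x -84.0", "55.5 x -4.6")
)

# list() in j gives a list column: one character vector of uids per coords value
agg_list <- dt[, .(uid = list(uid)), by = coords]
is.list(agg_list$uid)

# Flatten each list element to a single string so the column becomes character
flat <- agg_list[, .(coords, uid = vapply(uid, paste, character(1), collapse = ", "))]
is.character(flat$uid)
```

The flattened table keeps one row per coords value, so the link between coordinates and uids is preserved.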
I also explored some variation of unlist. However, maintaining the association between the uids and the coords becomes an issue at that point.
Finally, I produced the desired result in Calc by writing the ungrouped data out to a csv file and then using a simple sort plus a couple of if statements. That only works because the dataset is small, though.
Thoughts?