This question has been asked but I'm looking for a more complete answer/slightly modified output.
I have a dataset with Lat and Long values in separate columns and want to create a unique ID for each unique combination of Lat and Long.
I'll borrow an example dataset from an older post asking the same question but for which I need a slightly different solution (Add ID column by group).
d <- read.table(text='LAT LONG
13.5330 -15.4180
13.5330 -15.4180
13.5330 -15.4180
13.5330 -15.4180
13.5330 -15.4170
13.5330 -15.4170
13.5330 -15.4170
13.5340 -14.9350
13.5340 -14.9350
13.5340 -15.9170
13.3670 -14.6190', header=TRUE)
The solution given was:
d <- transform(d, Cluster_ID = as.numeric(interaction(LAT, LONG, drop=TRUE)))
# LAT LONG Cluster_ID
# 1 13.533 -15.418 2
# 2 13.533 -15.418 2
# 3 13.533 -15.418 2
# 4 13.533 -15.418 2
# 5 13.533 -15.417 3
# 6 13.533 -15.417 3
# 7 13.533 -15.417 3
# 8 13.534 -14.935 4
# 9 13.534 -14.935 4
# 10 13.534 -15.917 1
# 11 13.367 -14.619 5
But how do you get the interaction
command to preserve order so that the first Cluster_ID above would be 1 (full vector for last column would be 1,1,1,1,2,2,2,3,3,4,5 instead of 2,2,2,2,3,3,4,4,1,5)? It's unclear how the new factor order (converted to numeric) is determined.
I have also been trying to find equivalent way of doing this using group_by
in dplyr
but can't figure out how to output the tibble table as a dataframe (older solutions on SO seem to use depreciated dplyr commands).
Thanks!