I'm cleaning a data set and after cleaning duplicates, I would like to merge the rows that share the same observation in a specific column (e.g. ID column).
I am looking to merge/aggregate so that I only have one row per chosen observation (i.e. here: one row per ID) left. If possible, the aggregate row would sum-up all observations but the chosen one to merge (ID).
This would be hypothetical settings:
set.seed(18)
dat <- data.frame(ID=c(1,2,1,2,2,3),value=c(5,5,7,8,3,2),location=c("NY","LA","NY","LA","LA","LA"))
dat
And I would like to know how to obtain
set.seed(9)
dat1 <- data.frame(id=c(1,2,3),value=c(5+7,5+8+3,2),location=c("NY","LA","LA"))
dat1
Which aggregate with respect to ID, sum the observations "value" and pick the corresponding location.
Also, I would like to know if it's possible to group the dataframe with respect to location, such as to obtain:
set.seed(6)
dat2 <- data.frame(location=c("NY","LA"),value=c(5+7,5+8+3+2),meanvalue=c(mean(5+7),mean(5+8+3+2)))
dat2
I did not put ID in this table because in this case, it does not matter: it can be summed or deleted, it's not going to be taken into account for any further computation. I know that my output for meanvalue is wrong: I am looking to get the mean of all rows sharing the same location (i.e. mean for LA and NY). I would appreciate if you also can correct me on this one.
Thank you for your help!