Want to transform the data from long format to wide format. Overall for ColA need to have only one row. There would be duplicates in ColB to ColE in those cases I am trying to aggregate it by count. ColF is aggregated by sum().
s <- read_csv("sample.csv")
s_1 <- subset(s, select=c("ColA", "ColF"))
grp_by <- aggregate(. ~ ColA , data = s_1, FUN = sum)
head(grp_by)
Not sure how to transform the rest of the columns
Update: Based on the suggestion made use of reshape2 package
library(reshape2)
s <- read_csv("sample.csv")
s_1 <- subset(s, select=c("ColA", "ColF"))
grp_by <- aggregate(. ~ ColA , data = s_1, FUN = sum)
s2 <- dcast(s, ColA ~ ColB)
s3 <- dcast(s, ColA ~ ColC)
s4 <- dcast(s, ColA ~ ColD)
s5 <- dcast(s, ColA ~ ColE)
print(s2)
print(s3)
print(s4)
print(s5)
print(grp_by)
This is the output of those print statements.
How could I merge all those into one data frame? My actual dataset is 1 million records - Is this code optimal enough to run on top of it or is there better way of writing it. Thanks for all your help.