0

enter image description here

Want to transform the data from long format to wide format. Overall for ColA need to have only one row. There would be duplicates in ColB to ColE in those cases I am trying to aggregate it by count. ColF is aggregated by sum().

s <- read_csv("sample.csv")
s_1 <- subset(s, select=c("ColA", "ColF"))
grp_by <- aggregate(. ~ ColA , data = s_1, FUN = sum)
head(grp_by)

Not sure how to transform the rest of the columns

Update: Based on the suggestion made use of reshape2 package

library(reshape2)

s <- read_csv("sample.csv")
s_1 <- subset(s, select=c("ColA", "ColF"))
grp_by <- aggregate(. ~ ColA , data = s_1, FUN = sum)

s2 <- dcast(s, ColA ~ ColB)
s3 <- dcast(s, ColA ~ ColC)
s4 <- dcast(s, ColA ~ ColD)
s5 <- dcast(s, ColA ~ ColE)

print(s2)
print(s3)
print(s4)
print(s5)
print(grp_by)

This is the output of those print statements.

enter image description here

How could I merge all those into one data frame? My actual dataset is 1 million records - Is this code optimal enough to run on top of it or is there better way of writing it. Thanks for all your help.

prasanth
  • 483
  • 1
  • 4
  • 11
  • 3
    Take a look here http://stackoverflow.com/questions/5890584/reshape-data-from-long-to-wide-format-r – David Arenburg Dec 30 '15 at 11:10
  • @DavidArenburg Thanks for your suggestion. Updated the question after making use of reshape2. Can you please check the question once again and guide me appropriately. Thanks. – prasanth Dec 30 '15 at 14:33
  • 1
    See here on how to provide a reproducible example and desired output http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – David Arenburg Dec 30 '15 at 18:27

1 Answers1

0

This is the sample code I used to transform and merge the data. There could be some better way but this is the best I could come up with.

# Include needed libraries
library(reshape2)

# Load the sample data
s <- read_csv("sample.csv")

# Aggregate ColF by SUM for each ColA
s_1 <- subset(s, select=c("ColA", "ColF"))
grp_by <- aggregate(. ~ ColA , data = s_1, FUN = sum)

# Long to Wide format
s2 <- dcast(s, ColA ~ ColB)
s3 <- dcast(s, ColA ~ ColC)
s4 <- dcast(s, ColA ~ ColD)
s5 <- dcast(s, ColA ~ ColE)

# But this is the crude way of removing NA columns which I used!
# Rename the NA column into something so that it can be removed by assigning NULL!!
colnames(s2)[7] <- "RemoveMe"
colnames(s3)[5] <- "RemoveMe"
colnames(s4)[5] <- "RemoveMe"
colnames(s5)[4] <- "RemoveMe"

s2$RemoveMe <- NULL
s3$RemoveMe <- NULL
s4$RemoveMe <- NULL
s5$RemoveMe <- NULL

# Merge all pieces to form the final transformed data
s2 <- merge(x = s2, y = s3, by="ColA", all = TRUE)
s2 <- merge(x = s2, y = s4, by="ColA", all = TRUE)
s2 <- merge(x = s2, y = s5, by="ColA", all = TRUE)
s2 <- merge(x = s2, y = grp_by, by="ColA", all = TRUE)

# Removing the row with user_id = NA!!
s2 <- s2[-c(4), ]

# Final transformed data
print(s2)

Used these as reference:

  1. dcast - How to reshape data from long to wide format?
  2. merge - How to join (merge) data frames (inner, outer, left, right)?
Community
  • 1
  • 1
prasanth
  • 483
  • 1
  • 4
  • 11