I'm looking for help on how to take a long dataset and make it wide, but with some summarizing as well.
My dataset generically looks like this (I'm new to SO, so sorry if my formatting isn't up to par!):
structure(list(StateSenatorialDistrict = c(41L, 14L, 30L, 38L,
43L, 37L, 20L, 45L, 37L, 44L), CandidateOfficeCode = structure(c(2L,
5L, 2L, 5L, 4L, 3L, 1L, 4L, 1L, 1L), .Label = c("ATT", "AUD",
"STH", "TRE", "USP"), class = "factor"), CandidateLastName = structure(c(4L,
2L, 1L, 3L, 9L, 5L, 7L, 8L, 7L, 6L), .Label = c("BROWN", "CASTLE",
"CLINTON", "DEPASQUALE", "MILLER", "RAFFERTY", "SHAPIRO", "TORSELLA ",
"VOIT"), class = "factor"), CandidateParty = structure(c(2L,
1L, 3L, 2L, 3L, 2L, 2L, 2L, 2L, 3L), .Label = c("CON", "DEM",
"REP"), class = "factor"), VoteTotal = c(256L, 3L, 202L, 188L,
18L, 402L, 251L, 383L, 156L, 761L)), .Names = c("StateSenatorialDistrict",
"CandidateOfficeCode", "CandidateLastName", "CandidateParty",
"VoteTotal"), row.names = c(30901L, 115192L, 41264L, 1389L, 21982L,
29827L, 192288L, 20019L, 12803L, 60823L), class = "data.frame")
It is a dataset of precinct level vote data in Pennsylvania.
StateSenatorialDistrict CandidateOfficeCode CandidateLastName CandidateParty VoteTotal
41 AUD DEPASQUALE DEM 256
14 USP CASTLE CON 3
30 AUD BROWN REP 202
38 USP CLINTON DEM 188
43 TRE VOIT REP 18
37 STH MILLER DEM 402
20 ATT SHAPIRO DEM 251
45 TRE TORSELLA DEM 383
37 ATT SHAPIRO DEM 156
44 ATT RAFFERTY REP 761
There are many more columns, but for the purposes here, these are fine.
I'd like to take this data and summarize it such that I get one line per Senatorial District with select other data on each line. The idea outcome would look like this (the data here is made up--it isn't based on the above):
StateSenatorialDistrict SenateRepLastName SenateDemLastName SenateRepVoteTotal SenateDemVoteTotal ClintonVotes TrumpVotes
41 BOZO SMITH 250 300 1000 2000
42 JOHNSON CARTER 2012 237 1350 1000
53 ARCHIBALD BISHOP 350 500 5000 3000
On any given row, you know that a candidate is a senator b/c their CandidateOfficeCode is STS; you know that they are a Dem or Rep by their Party which is REP or DEM.
I know that I can aggregate data and then try to dcast it to a wide format, but that leaves me with a very wide table, with each candidate name being a column (and no information about office or party):
senateDistricts2016 <- aggregate(VoteTotal ~ StateSenatorialDistrict + CandidateOfficeCode + CandidateFirstName + CandidateLastName + CandidateParty, data=votes2016[votes2016$CandidateOfficeCode %in% c("USP", "STS"),], FUN="sum")
wideSenate <- dcast(senateDistricts2016, StateSenatorialDistrict ~ CandidateLastName)
Is there an easy way to do this? If not, thoughts about a hard way to do this?
Thanks in advance. And please let me know if my question doesn't make sense--I'm happy to edit.
EDIT:
I don't believe this is a dupe. I'm not just trying to make my data wide, but summarize it when it becomes wide, so it is not unweildy. I ended up doing a series of aggregates and then merges:
senateDs <- aggregate(VoteTotal ~ StateSenatorialDistrict + CandidateName, data=votes2016[votes2016$CandidateOfficeCode=="STS" & votes2016$CandidateParty=="DEM",], FUN="sum")
senateRs <- aggregate(VoteTotal ~ StateSenatorialDistrict + CandidateName, data=votes2016[votes2016$CandidateOfficeCode=="STS" & votes2016$CandidateParty=="REP",], FUN="sum")
senateTotalVotes <- aggregate(VoteTotal ~ StateSenatorialDistrict, data=votes2016[votes2016$CandidateOfficeCode=="STS",], FUN="sum")
senateVotes <- merge(senateDs,senateRs, by="StateSenatorialDistrict", all=TRUE)
senateVotes <- merge(senateVotes, senateTotalVotes, by="StateSenatorialDistrict", all=TRUE)
# now aggregate the presidential votes for Rs, Ds, and Total votes and combine
senatePresD <- aggregate(VoteTotal ~ StateSenatorialDistrict, data=votes2016[votes2016$CandidateOfficeCode=="USP" & votes2016$CandidateParty=="DEM",], FUN="sum")
senatePresR <- aggregate(VoteTotal ~ StateSenatorialDistrict, data=votes2016[votes2016$CandidateOfficeCode=="USP" & votes2016$CandidateParty=="REP",], FUN="sum")
senatePresTotalVotes <- aggregate(VoteTotal ~ StateSenatorialDistrict, data=votes2016[votes2016$CandidateOfficeCode=="USP",], FUN="sum")
senateVotes <- merge(senateVotes,senatePresD, by="StateSenatorialDistrict", all=TRUE)
senateVotes <- merge(senateVotes,senatePresR, by="StateSenatorialDistrict", all=TRUE)
senateVotes <- merge(senateVotes, senatePresTotalVotes, by="StateSenatorialDistrict", all=TRUE)
setnames(senateVotes, c("StateSenatorialDistrict", "DCandidate","DVotes","RCandidate","RVotes", "TotalSenatorVotes", "PresDVotes", "PresRVotes","TotalPresVotes"))