
I want to reshape a large data set from long to wide format. Currently my data set looks like this:

df <- structure(list(Politician = c("1", "2", "3", "k", "1", "2", "3", 
"k"), country = c("uk", "nl", "ro", "z", "uk", "nl", "ro", "z"
), variables = c(NA, NA, NA, NA, NA, NA, NA, NA), voteid = c(12, 
12, 12, 12, 13, 13, 13, 13), votedecision = c(1, 9, 9, 1, 3, 
2, 0, 9)), row.names = c(NA, -8L), class = c("tbl_df", "tbl", 
"data.frame"))

I want to reshape this vote matrix as follows:

# A tibble: 3 x 8
  Politician country variables vote12 vote13 vote14 vote15 ...  
       <int> <chr>   <lgl>      <dbl>  <dbl>  <dbl>  <dbl> <chr>
1          1 uk      NA             1      3      1      9 ...  
2          2 nl      NA             9      2      2      0 ...  
3          3 ro      NA             9      0      1      2 ...  

The full data set has 8 variables and over 9 million observations. I'm fairly new to RStudio, and so far I've only tried code snippets I found online. For example:

ep.new = cast(ep, mepid~voteid, value = "votedecision")

When I run that command it takes a long time, and then I get this warning: Aggregation requires fun.aggregate: length used as default
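That warning generally means that at least one combination of the row identifier and voteid occurs on more than one row, so the cast has to squeeze several votes into a single cell and falls back to counting them with length(). Below is a minimal base-R sketch for checking this on the example df. For the real data you would presumably use mepid instead of Politician; that substitution is an assumption based on the column names mentioned in the comments.

```r
# Example data from the question, rebuilt as a plain data.frame
df <- data.frame(
  Politician   = c("1", "2", "3", "k", "1", "2", "3", "k"),
  country      = c("uk", "nl", "ro", "z", "uk", "nl", "ro", "z"),
  variables    = NA,
  voteid       = c(12, 12, 12, 12, 13, 13, 13, 13),
  votedecision = c(1, 9, 9, 1, 3, 2, 0, 9)
)

# Key that should identify exactly one cell of the wide table
key <- df[c("Politician", "voteid")]

# Number of rows whose key already appeared earlier in the data
n_dup <- sum(duplicated(key))
n_dup

# All rows involved in a clash (both the first and the repeated copies)
clashes <- df[duplicated(key) | duplicated(key, fromLast = TRUE), ]
clashes
```

The example has no clashes. If n_dup is non-zero on the full data, you would need to decide how the repeated votes should be combined or removed before casting; otherwise the cells contain counts rather than vote decisions.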

Does anyone have tips or suggestions on how to solve this?

*There are several more variables containing information about the individual politicians.

zx8754
Douwe
  • Please provide a reproducible example for us to test or reproduce the error or expected results. See: https://stackoverflow.com/help/mcve – cropgen Apr 14 '19 at 17:00
  • Hey nsinghs, as I said, I'm new to RStudio and to this forum as well, so I don't really know how to do that. However, I have made some progress using reshape2 with the following command: ep.new = dcast(ep, mepid ~ voteid, value.var = "votedecision"). This reshaped the data set into the correct wide form, but now I'm missing the other variables. – Douwe Apr 14 '19 at 17:20
  • That is where you click on the link I provided in my comment and learn how to post a good question and example. – cropgen Apr 14 '19 at 17:30
  • @nsinghs, thanks again, but I am not allowed to share the original data and I don't know how to create similar data for an example. The data I use has 9456984 obs. of 8 variables. I used the following command: ep.newnew = dcast(ep, mepid + mep_name + mep_nationalparty ~ voteid, value.var = "votedecision") and it kind of worked: the data is now reshaped to 843 obs. of 7167 variables, but it still gave the following message: Aggregation function missing: defaulting to length. What does the message mean? – Douwe Apr 14 '19 at 18:55
  • [See here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) on making an R question that folks can help with. You don't have to include your *actual* data, but it's difficult to help on these things without a workable sample and without folks needing to type in numbers from a picture – camille Apr 15 '19 at 02:56
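In dcast(), every variable on the left-hand side of the formula is kept as an identifier column, which is how extra politician-level variables such as mep_name and mep_nationalparty can be carried along. A sketch of the same idea on the example df, assuming the reshape2 package is installed; the renaming step that produces vote12, vote13, ... is an illustrative addition to match the desired layout.

```r
library(reshape2)

# Example data from the question, rebuilt as a plain data.frame
df <- data.frame(
  Politician   = c("1", "2", "3", "k", "1", "2", "3", "k"),
  country      = c("uk", "nl", "ro", "z", "uk", "nl", "ro", "z"),
  variables    = NA,
  voteid       = c(12, 12, 12, 12, 13, 13, 13, 13),
  votedecision = c(1, 9, 9, 1, 3, 2, 0, 9)
)

# Columns left of ~ stay as identifiers; each distinct voteid becomes a column
wide <- dcast(df, Politician + country + variables ~ voteid,
              value.var = "votedecision")

# Prefix the vote columns (everything after the 3 id columns) with "vote"
vote_cols <- -(1:3)
names(wide)[vote_cols] <- paste0("vote", names(wide)[vote_cols])
wide
```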

1 Answer


You can use the tidyr package, specifically spread(), to reshape tidy data from long to wide:

library(tidyr)

spread(df, key = voteid, value = votedecision, sep = "")

# A tibble: 4 x 5
  Politician country variables voteid12 voteid13
  <chr>      <chr>   <lgl>        <dbl>    <dbl>
1 1          uk      NA               1        3
2 2          nl      NA               9        2
3 3          ro      NA               9        0
4 k          z       NA               1        9
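Since tidyr 1.0.0, spread() is superseded by pivot_wider(). The following is a sketch of an equivalent call on the example df, assuming a tidyr version that provides pivot_wider(). Here names_prefix = "vote" is chosen to reproduce the vote12, vote13 naming from the question.

```r
library(tidyr)

# Example data from the question, rebuilt as a plain data.frame
df <- data.frame(
  Politician   = c("1", "2", "3", "k", "1", "2", "3", "k"),
  country      = c("uk", "nl", "ro", "z", "uk", "nl", "ro", "z"),
  variables    = NA,
  voteid       = c(12, 12, 12, 12, 13, 13, 13, 13),
  votedecision = c(1, 9, 9, 1, 3, 2, 0, 9)
)

# Columns not named in names_from/values_from (Politician, country, variables)
# are kept automatically as identifier columns
wide <- pivot_wider(df,
                    names_from   = voteid,
                    values_from  = votedecision,
                    names_prefix = "vote")
wide
```

If a politician has more than one row for the same vote, pivot_wider() warns and builds list-columns instead of plain values; its values_fn argument lets you state explicitly how such duplicates should be combined.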
Paul