0

I am trying to make data wide. I know there are a couple of pages dedicated to making data wide, but I have tried all of their suggestions and it just did not work.

This is the data I have

Officer              Company

Robert Abernathy     Goldman Sachs

Robert Abernathy     Walmart

Robert Abernathy     CVS

Rex Adams            Goldman Sachs

Rex Adams            Dell

Marc Abramowitz      Samsung

and I want this data to look like

Officer             Company1       Company2    Company3

Robert Abernathy    Goldman Sachs  Walmart     CVS

Rex Adams           Goldman Sachs  Dell        NA

Marc Abramowitz     Samsung        NA          NA

I thought I would be able to use the tidyr package and I did

> library(tidyr)

> ppn_wide<-spread(data=ppn1, key=Officer, value=Company)
Error: Duplicate identifiers for rows (12, 13), (20, 21), (36, 37), (40, 41), (75, 76), (116, 117), (141, 142), (149, 150), (158, 159), (189, 190), (207, 208), (244, 245), (249, 250), (264, 265), (267, 268), (273, 274), (328, 329), (339, 340), (346, 347, 348), (366, 367), (378, 379), (397, 398), (407, 408), (417, 418), (422, 423), (425, 426), (430, 431), (436, 437, 438), (450, 451), (461, 462), (481, 482), (486, 487), (491, 492), (496, 497, 498), (504, 505), (546, 547), (553, 554), (566, 567), (577, 578), (594, 595), (632, 633)'

So, I've tried this too

> reshape(ppn1, idvar="Officer", timevar="Company", direction="wide")

But then only the column officer remains and company disappears completely.

I have also tried using reshape and reshape2 packages, they don't work.

> ppn_wide<-cast(ppn1, officer~PPN.org)
Using officer as value column.  Use the value argument to cast to override this choice
Error in `[.data.frame`(data, , variables, drop = FALSE) : 

undefined columns selected

> ppn_wide<-dcast(ppn1, officer~PPN.org)
Using officer as value column: use value.var to override.

reshape2 package creates a data frame called ppn_wide but it looks nothing like the kind of data set that I want. It uses officers' names to indicate whether they hold a position in a company. Something like this,

officer            Goldman Sachs      Walmart            Dell
Robert Abernathy   Robert Abernathy   Robert Abernathy   NA

What is going on here?

Lee12345
  • 95
  • 4
  • 1
    If you click the line above my logo above, something like "edited 1 min ago", you can see an edit description where I explain how to format code blocks. Also see https://stackoverflow.com/help/formatting – Frank Dec 12 '17 at 19:11
  • 3
    Possible duplicate of [How to reshape data from long to wide format?](https://stackoverflow.com/questions/5890584/how-to-reshape-data-from-long-to-wide-format) – Jealie Dec 12 '17 at 19:11
  • Also see this question for additional pointers using `data.table`: https://stackoverflow.com/questions/18047754/reshape-long-structured-data-table-into-a-wide-structure-using-data-table-functi – Jealie Dec 12 '17 at 19:15
  • 1
    I have actually read both of them. Let me make some edits to the main post so that you can see what I've done. Sorry, should've included them in the first place but I was panicking about formatting and neglected to add them as well. And thank you so much Frank for your help!!! – Lee12345 Dec 12 '17 at 19:17
  • 1
    thanks @Frank, I put back the answer with your solution – Jealie Dec 12 '17 at 19:34

2 Answers2

2

The dcast method of data.table works fine on this example:

ppn1 = read.table(text='Officer,Company
Robert Abernathy,Goldman Sachs
Robert Abernathy,Walmart
Robert Abernathy,CVS
Rex Adams,Goldman Sachs
Rex Adams,Dell
Marc Abramowitz,Samsung', header=T, sep=',')

Thanks to @Frank, we have the following that works:

dcast(ppn1, Officer~rowid(Officer, prefix="Company"))

Giving:

           Officer      Company1 Company2 Company3
1  Marc Abramowitz       Samsung     <NA>     <NA>
2        Rex Adams Goldman Sachs     Dell     <NA>
3 Robert Abernathy Goldman Sachs  Walmart      CVS
Jealie
  • 6,157
  • 2
  • 33
  • 36
  • I am afraid this is not quite what I wanted. The output should have Company 1, Company 2, and so on - there are so many companies that if I do this, it just wouldn't make sense. But thank you so so much for trying!! – Lee12345 Dec 12 '17 at 19:40
  • 1
    THIS WORKS!!! Thank you ever so much, both of you! I am so so appreciative! Wish you both a great great day!! – Lee12345 Dec 12 '17 at 19:47
  • 1
    Well @Frank did all the job :) The nifty trick here was to use `rowid` to assign consecutive numbers to companies belonging to each officer. See `rowid(ppn1$Officer)`: it gives `1 2 3 1 2 1`. `dcast` then tabulates the company names as a function of the officer name + the newly generated company number, which gives exactly the result expected :) – Jealie Dec 12 '17 at 19:59
0

You could make your column headers first as a new column then reshape:

df <- readr::read_csv('Officer,Company
Robert Abernathy,Goldman Sachs
Robert Abernathy,Walmart
Robert Abernathy,CVS
Rex Adams,Goldman Sachs
Rex Adams,Dell
Marc Abramowitz,Samsung')
df
#> # A tibble: 6 x 2
#>            Officer       Company
#>              <chr>         <chr>
#> 1 Robert Abernathy Goldman Sachs
#> 2 Robert Abernathy       Walmart
#> 3 Robert Abernathy           CVS
#> 4        Rex Adams Goldman Sachs
#> 5        Rex Adams          Dell
#> 6  Marc Abramowitz       Samsung

# Add column headers as new column (using grouped row number)
library(dplyr)
df %>% 
  group_by(Officer) %>% 
  mutate(ColName = paste0('Company', row_number())) %>% 
  tidyr::spread(ColName, Company)
#> # A tibble: 3 x 4
#> # Groups:   Officer [3]
#>            Officer      Company1 Company2 Company3
#> *            <chr>         <chr>    <chr>    <chr>
#> 1  Marc Abramowitz       Samsung     <NA>     <NA>
#> 2        Rex Adams Goldman Sachs     Dell     <NA>
#> 3 Robert Abernathy Goldman Sachs  Walmart      CVS
rdh
  • 1,035
  • 7
  • 11
  • I don't know why but it does not seem to work for some rows. They repeat themselves and other rows do not aggregate. But thank you so much for your help!!! Super appreciate it – Lee12345 Dec 12 '17 at 19:44