Spread categorical variables into new non-Boolean columns

Question

Example data follows:

 id   sy    OC
13693 2017  1
13752 2017  5
13693 2017  4
44555 2018  3

What am I doing wrong in the following code?

SORs.pivot(index='id',columns="sy",values='OC').add_prefix('sy').reset_index()

I have never seen "pivot"ing used within R before, but I am eager to learn, once I get past this hurdle.

I wish for the final output to be something like the following:

 id   sy2017  sy2018
13693 1       na
13752 5       na
13693 4       na
44555 na      3

I adapted it from this Stack Overflow page.

I am also looking to get the summation of the values within the cells for the repeating ids (13693).

Update

First, please let me apologize for mixing R and Python. That was just silly on my part.

I am still having problems with the data even though I used some of the solutions:

Now this yields a df with over 200,000 records - but the logic works, and I am ready to spread the columns out.

I tried two different ways but neither worked.

First I tried:

reshape(dat2, idvar="id", timevar="sy", direction="wide")

All this yielded was a df with two columns: the first was the subjectkey, and the second was named DistinctOrderCound.2017:2018 and contained nothing but NAs.

Then I tried:

spread(dat2, key = sy, value = value)

This yielded an error saying there were duplicate values for rows, along with a sample listing of the duplicates.

I think the reshape should work and work nicely. I do not think there are any issues with the summation any more as I took care of that with a pre-query.

You tried to use python code in R (the question you linked is python, not R). — Jan Boyer, Aug 16 '18 at 22:45
*"I have never seen "pivot"ing used within R before"* Pivoting is a *very* common task in R (you will see at least a handful of questions about this every day here on SO); in the R domain it's more commonly known as "spreading" data, or "casting/reshaping data from long to wide". Some more popular methods to do this are `tidyr::spread`, `stats::reshape`, `data.table::dcast`. I have never heard of `SORs.pivot` nor do I know which R package this function comes from. I recommend sticking to the more popular packages/methods. — Maurits Evers, Aug 16 '18 at 22:45
[continued] Take a look at [How to reshape data from long to wide format](https://stackoverflow.com/questions/5890584/how-to-reshape-data-from-long-to-wide-format). — Maurits Evers, Aug 16 '18 at 22:48
`library(reshape2); dcast(DF, OC + id ~ paste0("sy", sy))[-1]` — markus, Aug 16 '18 at 23:01

score 0 · Answer 1 · answered Aug 16 '18 at 22:32

The R package tidyr provides the spread function for this task. In your case, you could try tidyr::spread(data, sy, OC), which should accomplish your goal. For more on tidyr::spread and tidyr::gather, see this blog post.
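As the update to the question notes, spread reports an error when the same id and sy combination occurs more than once (id 13693 has two rows for 2017 in the example). One possible workaround, sketched here under the assumption that duplicates should be summed and that dplyr is available, is to aggregate before spreading. The data frame name SORs is taken from the question's first snippet; the name wide is only illustrative.

```r
library(dplyr)
library(tidyr)

# Example data copied from the question
SORs <- data.frame(
  id = c(13693, 13752, 13693, 44555),
  sy = c(2017, 2017, 2017, 2018),
  OC = c(1, 5, 4, 3)
)

wide <- SORs %>%
  group_by(id, sy) %>%
  summarise(OC = sum(OC)) %>%     # collapse duplicate id/sy pairs by summing
  ungroup() %>%
  mutate(sy = paste0("sy", sy)) %>%  # so the new columns are named sy2017, sy2018
  spread(key = sy, value = OC)       # combinations with no data become NA

print(wide)
```

With the example data this should give one row per id, with the two 2017 values for 13693 combined into a single cell.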

answered Aug 16 '18 at 22:32

Patton


Thanks a bunch. But - and this is the geek in me - may it be accomplished with the pivoting? – Zach Aug 16 '18 at 22:37
@Zach `spread`ing and pivoting are the same thing. See my comment to your OP above. – Maurits Evers Aug 16 '18 at 22:49

score 0 · Answer 2 · answered Aug 17 '18 at 15:42

dcast() solves everything. Kind of weird how simple it is.
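The exact call used is not shown here. As a minimal sketch of how dcast could produce the layout requested in the question, assuming the reshape2 package, the example data from the question (in a data frame named SORs, as in the question's first snippet), and that repeated id/sy pairs should be summed:

```r
library(reshape2)

# Example data copied from the question
SORs <- data.frame(
  id = c(13693, 13752, 13693, 44555),
  sy = c(2017, 2017, 2017, 2018),
  OC = c(1, 5, 4, 3)
)

# Prefix the year so the new columns are named sy2017, sy2018
SORs$sy <- paste0("sy", SORs$sy)

wide <- dcast(
  SORs,
  id ~ sy,               # one row per id, one column per school year
  value.var = "OC",
  fun.aggregate = sum,   # sum the values for repeated id/sy pairs
  fill = NA_real_        # without this, empty cells would be filled with sum(numeric(0)), i.e. 0
)

print(wide)
```

Setting fill is a design choice: it keeps "no data for this year" distinguishable from a genuine total of zero.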

Thank you to everyone!

answered Aug 17 '18 at 15:42

Zach


Would you edit this to show how you used `dcast`, so this answer is as useful as possible for future readers? Thank you. – halfer Aug 23 '18 at 09:35
