Pivot_wider: Combine Duplicate Observations AND Create New Variable Columns for Those Values

Question

I'm new to R and have scoured the site to find a solution - I've found lots of similar, but slightly different questions. I'm stumped.

I have a dataset in this structure:

  SURVEY_ID    CHILD_NAME    CHILD_AGE
  Survey1      Billy             4
  Survey2      Claude            12
  Survey2      Maude             6
  Survey2      Constance         3
  Survey3      George            22
  Survey4      Marjoram          14
  Survey4      LeBron            37

I'm trying to pivot the data wider so that there's a) only one unique SURVEY_ID per row, and, critically, b) a new column for second, third, etc. children for surveys with more than one child.

So the result would look like:

    SURVEY_ID    CHILD_NAME1    CHILD_NAME2    CHILD_NAME3    CHILD_AGE1  CHILD_AGE2  CHILD_AGE3
    Survey1      Billy                                        4
    Survey2      Claude         Maude          Constance      12          6           3
    Survey3      George                                       22
    Survey4      Marjoram       Lebron                        14          37

The actual data has thousands of surveys and the number of "child names" and "child ages" could be as high as 10. It's the issue of creating the new columns not from existing value names and only where there are multiple children that has me perplexed.

Tidy specific answer at the duplicate - https://stackoverflow.com/a/62893920/496803 — thelatemail, May 26 '22 at 03:22

Onyambu · Accepted Answer · 2022-05-26T03:22:25.730

Using base R:

reshape(transform(df, time = ave(SURVEY_ID, SURVEY_ID, FUN=seq)), 
       v.names = c('CHILD_NAME', 'CHILD_AGE'), 
       direction = 'wide', idvar = 'SURVEY_ID', sep = '_')

  SURVEY_ID CHILD_NAME_1 CHILD_AGE_1 CHILD_NAME_2 CHILD_AGE_2 CHILD_NAME_3 CHILD_AGE_3
1   Survey1        Billy           4         <NA>          NA         <NA>          NA
2   Survey2       Claude          12        Maude           6    Constance           3
5   Survey3       George          22         <NA>          NA         <NA>          NA
6   Survey4     Marjoram          14       LeBron          37         <NA>          NA

using tidyverse:

library(tidyverse)
df %>%
  group_by(SURVEY_ID) %>%
  mutate(name = row_number()) %>%
  pivot_wider(SURVEY_ID, values_from = c(CHILD_NAME, CHILD_AGE))

# A tibble: 4 x 7
# Groups:   SURVEY_ID [4]
  SURVEY_ID CHILD_NAME_1 CHILD_NAME_2 CHILD_NAME_3 CHILD_AGE_1 CHILD_AGE_2 CHILD_AGE_3
  <chr>     <chr>        <chr>        <chr>              <int>       <int>       <int>
1 Survey1   Billy        NA           NA                     4          NA          NA
2 Survey2   Claude       Maude        Constance             12           6           3
3 Survey3   George       NA           NA                    22          NA          NA
4 Survey4   Marjoram     LeBron       NA                    14          37          NA

using data.table

library(data.table)
dcast(setDT(df), SURVEY_ID~rowid(SURVEY_ID), value.var = c('CHILD_AGE', 'CHILD_NAME'))
   SURVEY_ID CHILD_AGE_1 CHILD_AGE_2 CHILD_AGE_3 CHILD_NAME_1 CHILD_NAME_2 CHILD_NAME_3
1:   Survey1           4          NA          NA        Billy         <NA>         <NA>
2:   Survey2          12           6           3       Claude        Maude    Constance
3:   Survey3          22          NA          NA       George         <NA>         <NA>
4:   Survey4          14          37          NA     Marjoram       LeBron         <NA>

Holy cow that's it! Thank you so much, you have no idea how many hours I've spent on this. Will send you a coffee — User10110101, May 26 '22 at 03:24
@User10110101 consider upvoting/accepting. This question has already been flagged as duplicate, so this is the only answer you will get — Onyambu, May 26 '22 at 03:26
Ok, I tried to upvote but I dont have 15 reputation yet, though I did mark it as solved. I actually tried out the duplicate question's script before posting, but it was different enough (to a beginner) that I couldnt get it to work with my data. Thanks again for the help. — User10110101, May 26 '22 at 03:33
@onaymbu I have another quick question and would love your help - do you have an active email address I could send it to and I can "buy you a couple cups of coffee"? — User10110101, Jun 16 '22 at 18:06
@User10110101 my email should be on my profile (onyambu@g.ucla.edu) Sorry I just realized the one on the profile had a double o at the begining — Onyambu, Jun 16 '22 at 18:18

Pivot_wider: Combine Duplicate Observations AND Create New Variable Columns for Those Values

1 Answers1