Complex data reshaping into wide format where input is a mix of long and wide data

Question

I am working on fairly complex data. Here is a simplified snapshot as a data frame df.

ID      Measures              ME1   ME2 X1  X2
53-21   comm - 01 narrate      2    1   NA  NA
53-21   comm - overall         1    NA  NA  NA
53-21   comm - 10 participate  NA   NA  NA  NA
43-65   comm - 02 project      2    3   NA  NA
43-65   comm - 01 narrate      1    1   NA  NA
67-21   comm - 06 action       2    1   NA  NA
67-21   comm - 08 plan         1    1   NA  1
43-65   comm - overall         2    NA  NA  NA
53-21   comm - exhibit         1    1   NA  NA

Here:

ID = Unique user ID

Measures = Name of the item being measured or assessed for a given user

For each Measure, users can be graded on up to four different items such as ME1, ME2,X1, and X2.

I want to convert this data in a format where items are being placed in rows i.e., one ID per row and their corresponding measures in additional columns. My required reshaped data frame is something like this:

ID      comm-01-narrate-ME1 comm-01-narrate-ME2 comm-01-narrate-X1 comm-01-narrate-X2 comm-overall-ME1 comm-overall-ME2 comm-overall-X1 comm-overall-X2 comm-10-participate-ME1 comm-10-participate-ME2 comm-10-participate-X1 comm-10-participate-X2 comm-exhibit-ME1 comm-exhibit-ME2 comm-exhibit-X1 comm-exhibit-X2 comm-02-project-ME1 comm-02-project-ME2 comm-02-project-X1 comm-02-project-X2 comm-06-action-ME1 comm-06-action-ME2 comm-06-action-X1 comm-06-action-X2 comm-08-plan-ME1 comm-08-plan-ME2 comm-08-plan-X1 comm-08-plan-X2
53-21   2                   1                   NA                 NA                 1                NA               NA              NA              NA                      NA                      NA                     NA                     1                1                NA              NA              NA                  NA                  NA                 NA                 NA                 NA                 NA                NA                NA               NA               NA              NA
43-65   1                   1                   NA                 NA                 2                NA               NA              NA              NA                      NA                      NA                     NA                     NA               NA               NA              NA              2                   3                   NA                 NA                 NA                 NA                 NA                NA                NA               NA               NA              NA
67-21   NA                  NA                  NA                 NA                 NA               NA               NA              NA              NA                      NA                      NA                     NA                     NA               NA               NA              NA              NA                  NA                  NA                 NA                 2                  1                  NA                NA                1                1                NA              1

The dput() for the input file df is:

dput(df)

structure(list(ID = structure(c(2L, 2L, 2L, 1L, 1L, 3L, 3L, 1L, 2L), 
.Label = c("43-65", "53-21", "67-21"), class = "factor"), 
    Measures = structure(c(1L, 7L, 5L, 2L, 1L, 3L, 4L, 7L, 6L), 
    .Label = c("comm - 01 narrate", "comm - 02 project", "comm - 06 action", "comm - 08 plan", "comm - 10 participate", "comm - exhibit", "comm - overall"), class = "factor"), 
    ME1 = c(2L, 1L, NA, 2L, 1L, 2L, 1L, 2L, 1L), 
    ME2 = c(1L, NA, NA, 3L, 1L, 1L, 1L, NA, 1L), 
    X1 = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_), 
    X2 = c(NA, NA, NA, NA, NA, NA, 1L, NA, NA)), class = "data.frame", row.names = c(NA, -9L))

I am struggling in defining the problem and even starting the processing. The data can be considered long but because of multiple columns, it is wide as well.

Any advice on how to achieve this output file would be greatly appreciated.

Thank you for your time in reading this post.

EDIT 1

From a related post Convert data from long format to wide format with multiple measure columns I tried the following solution:

library(data.table)
df2 = dcast(setDT(df), ID~Measures, 
            value.var=c("ME1", "ME2", "X1", "X2"))

However, I get a warning:

Aggregate function missing, defaulting to 'length'

This means that the entries in my data are all changed to 1 or NA. I do not want this to happen.

EDIT 2

When I test the suggested solution on my original data, it fails. To better explain, I am providing a small sample that closely replicates my original data. None of the existing solutions work.

dput(df)

structure(list(
Id = c("39fca07f-d62e-494a-4a86-8dec54836c08", "39fca8ee-fe3f-4c85-ab0a-acb3c2db1b9c", "39fca8ed-f34c-b7e3-4229-111155aabe35", "39fca8e9-1e08-1809-c7a8-d2c8a4bc9b00", "39fc6ae5-0de8-4820-eede-343e738e7a4a", "39fca8e9-fbf9-a098-cf8c-322810997ce9"), DeliverId = c("39fb74ce-d5e6-69f6-f733-ee5fbc4689e6", "39fb74ce-d5e6-69f6-f733-ee5fbc4689e6", "39fb74ce-d5e6-69f6-f733-ee5fbc4689e6", "39fb74ce-d5e6-69f6-f733-ee5fbc4689e6", "39fb74ce-d5e6-69f6-f733-ee5fbc4689e6", "39fb74ce-d5e6-69f6-f733-ee5fbc4689e6"), 
DeliverN = c("1Assess", "1Assess", "1Assess", "1Assess", "1Assess", "1Assess"), 
AssessRId = c("39fb74cf-5fb6-4248-6d08-0e36647e190b", "39fb74cf-5fb6-4248-6d08-0e36647e190b", "39fb74cf-5fb6-4248-6d08-0e36647e190b", "39fb74cf-5fb6-4248-6d08-0e36647e190b", "39fb74cf-5fb6-4248-6d08-0e36647e190b", "39fb74cf-5fb6-4248-6d08-0e36647e190b"), 
AssessRN = c("P1", "P2", "P3", "P4", "P5", "P6"), 
AssessTId = c("1ee2684c99fa", "fd2dbea08b43", "0e0177a33282", "091b8f805553", "6e5b9301116d", "7a307a90de19"), 
AssessTN = c("Comm - 09 Narrate", "Comm - Prog Level Judge", "Comm - O Indi Level Judge", "Comm - 02 Int Prj", "Comm - 10 Learn Comm Participate", 
"Comm - 05 Exhibit"), 
S.Time = c("21/05/2020 19:47", "23/05/2020 11:06", "23/05/2020 11:05", "23/05/2020 10:59", "11/05/2020 9:58", "23/05/2020 11:00"), 
F.Time = c("24/05/2020 11:02", "23/05/2020 11:06", "23/05/2020 11:05", 
"23/05/2020 11:00", "23/05/2020 11:04", "23/05/2020 11:03"),     CompletedIndi = c(8L, 1L, 8L, 8L, 8L, 8L), 
TotalIndi = c(8L, 1L, 8L, 8L, 8L, 8L), 
Progress = c(100L, 100L, 100L, 100L, 100L, 100L), 
Build = c("Monice Island", "Pink Lasy", "", "", "", ""), 
Advice = c("Monica", "Chandler", "", "", "", ""), 
TechUserId = c(128L, 129L, 130L, 129L, 129L, 129L), 
TechName = c("Barba", "Raymond", "Raymond", "Raymond", "Raymond","Raymond"), TechEmail = c("barber@123.com", "raymond@123.com", "raymond@123.com", "raymond@123.com", "raymond@123.com", "raymond@123.com"), 
TechLife = c("0 - 2 years", "Over 10 years", "Over 10 years", "Over 10 years", "Over 10 years", "Over 10 years"), 
OtherLife = c("0 - 2 years", "5 - 10 years", "5 - 10 years", "5 - 10 years", "5 - 10 years", "5 - 10 years"), 
PersonUId = c(470L, 455L, 455L, 455L, 455L, 455L), 
PersonDName = c("Tall Tiffany", "Sharp Steff", "Sharp Steff", "Sharp Steff", "Sharp Steff", "Sharp Steff"), 
PersonFName = c("Tall", "Sharp", "Sharp", "Sharp", "Sharp", "Sharp"), PersonLName = c("Tiffany", "Steff", "Steff", "Steff", "Steff", "Steff"), PersonUID = c("2783-4409", "4307-4369", "4307-4369", "4307-4369", "4307-4369", "4307-4369"), 
Gender = c("Female", "Female", "Female", "Female", "Female", "Female"), PYear = c(2023L, 2024L, 2024L, 2024L, 2024L, 2024L), 
Course = c("Undergrad", "Grad", "Grad", "Grad", "Grad", "Grad"), 
Special = c("Yes", "No", "No", "No", "No", "No"), 
Q1 = c(2L, 1L, 3L, 3L, 2L, 2L), 
Q2 = c(1L, NA, 2L, 2L, 1L, 2L), 
Q3 = c(1L, NA, 3L, 3L, 2L, 2L), 
Q4 = c(1L, NA, 3L, 3L, 2L, 1L), 
Q5 = c(1L, NA, 2L, 2L, 1L, 2L), 
Q6 = c(1L, NA, 0L, 1L, 1L, 1L), 
Q7 = c(1L, NA, 2L, 1L, 2L, 2L), 
Q8 = c(2L, NA, 2L, 1L, 2L, 2L), 
Q9 = c(NA, NA, NA, NA, NA, NA), 
Q10 = c(NA, NA, NA, NA, NA, NA), 
X = c(NA, NA, NA, NA, NA, NA), 
X.1 = c(NA, NA, NA, NA, NA, NA), 
ListDetails = c("Missing", "Complete", "Complete", "Complete", "Complete", "Complete")), 
class = "data.frame", row.names = c(NA, -6L))

The desired output is as below:

Id                                   DeliverId                            DeliverN AssessRId                            AssessRN AssessTId      S-Time           F-Time             CompletedIndi   TotalIndi   Progress    Build           Advice      TechUserId  TechName    TechEmail       TechLife        OtherLife       PersonDName     PersonFName PersonLName PersonUID Gender PYear  Course      Special ListDetails PersonUId Q1_Comm - 02 Int Prj Q1_Comm - 05 Exhibit Q1_Comm - 09 Narrate Q1_Comm - 10 Learn Comm Participate Q1_Comm - O Indi Level Judge Q1_Comm - Prog Level Judge Q2_Comm - 02 Int Prj Q2_Comm - 05 Exhibit Q2_Comm - 09 Narrate Q2_Comm - 10 Learn Comm Participate Q2_Comm - O Indi Level Judge Q2_Comm - Prog Level Judge Q3_Comm - 02 Int Prj Q3_Comm - 05 Exhibit Q3_Comm - 09 Narrate Q3_Comm - 10 Learn Comm Participate Q3_Comm - O Indi Level Judge Q3_Comm - Prog Level Judge Q4_Comm - 02 Int Prj Q4_Comm - 05 Exhibit Q4_Comm - 09 Narrate Q4_Comm - 10 Learn Comm Participate Q4_Comm - O Indi Level Judge Q4_Comm - Prog Level Judge Q5_Comm - 02 Int Prj Q5_Comm - 05 Exhibit Q5_Comm - 09 Narrate Q5_Comm - 10 Learn Comm Participate Q5_Comm - O Indi Level Judge Q5_Comm - Prog Level Judge Q6_Comm - 02 Int Prj Q6_Comm - 05 Exhibit Q6_Comm - 09 Narrate Q6_Comm - 10 Learn Comm Participate Q6_Comm - O Indi Level Judge Q6_Comm - Prog Level Judge Q7_Comm - 02 Int Prj Q7_Comm - 05 Exhibit Q7_Comm - 09 Narrate Q7_Comm - 10 Learn Comm Participate Q7_Comm - O Indi Level Judge Q7_Comm - Prog Level Judge Q8_Comm - 02 Int Prj Q8_Comm - 05 Exhibit Q8_Comm - 09 Narrate Q8_Comm - 10 Learn Comm Participate Q8_Comm - O Indi Level Judge Q8_Comm - Prog Level Judge Q9_Comm - 02 Int Prj Q9_Comm - 05 Exhibit Q9_Comm - 09 Narrate Q9_Comm - 10 Learn Comm Participate Q9_Comm - O Indi Level Judge Q9_Comm - Prog Level Judge Q10_Comm - 02 Int Prj Q10_Comm - 05 Exhibit Q10_Comm - 09 Narrate Q10_Comm - 10 Learn Comm Participate Q10_Comm - O Indi Level Judge Q10_Comm - Prog Level Judge X_Comm - 02 Int Prj X_Comm - 05 Exhibit X_Comm - 09 Narrate X_Comm - 10 Learn Comm Participate X_Comm - O Indi Level Judge X_Comm - Prog Level Judge X.1_Comm - 02 Int Prj X.1_Comm - 05 Exhibit X.1_Comm - 09 Narrate X.1_Comm - 10 Learn Comm Participate X.1_Comm - O Indi Level Judge X.1_Comm - Prog Level Judge
39fca07f-d62e-494a-4a86-8dec54836c08 39fb74ce-d5e6-69f6-f733-ee5fbc4689e6 1Assess  39fb74cf-5fb6-4248-6d08-0e36647e190b P1       1ee2684c99fa   21/05/2020 19:47 24/05/2020 11:02   8               8           100         Monice Island   Monica      128         Barba       barber@123.com  0 - 2 years     0 - 2 years     Tall Tiffany    Tall        Tiffany     2783-4409 Female 2023   Undergrad   Yes     Missing     470       NA                   NA                   2                    NA                                  NA                           NA                         NA                   NA                   1                    NA                                  NA                           NA                         NA                   NA                   1                    NA                                   NA                          NA                          NA                  NA                   1                    NA                                  NA                           NA                           NA                  NA                  1                    NA                                  NA                           NA                         NA                   NA                   1                    NA                                  NA                           NA                         NA                   NA                     1                   NA                                  NA                          NA                          NA                  NA                      2                   NA                                  NA                          NA                          NA                  NA                  NA                      NA                                  NA                          NA                          NA                  NA                      NA                  NA                                  NA                              NA                          NA                  NA                  NA                  NA                                  NA                          NA                      NA                      NA                  NA                      NA                                  NA                          NA
39fca8ee-fe3f-4c85-ab0a-acb3c2db1b9c 39fb74ce-d5e6-69f6-f733-ee5fbc4689e6 1Assess  39fb74cf-5fb6-4248-6d08-0e36647e190b P2       fd2dbea08b43   23/05/2020 11:06 23/05/2020 11:06   1               1           100         Pink Lasy       Chandler    129         Raymond     raymond@123.com Over 10 years   5 - 10 years    Sharp Steff     Sharp       Steff       4307-4369 Female 2024   Grad        No      Complete    455       3                    2                    NA                   2                                   3                            1                          2                    2                    NA                   1                                   2                            NA                         3                    2                    NA                   2                                    3                           NA                          3                   1                    NA                   2                                   3                            NA                           2                   2                   NA                   1                                   2                            NA                         1                    1                    NA                   1                                   0                            NA                         1                    2                      NA                  2                                   2                           NA                          1                   2                       NA                  2                                   2                           NA                          NA                  NA                  NA                      NA                                  NA                          NA                          NA                  NA                      NA                  NA                                  NA                              NA                          NA                  NA                  NA                  NA                                  NA                          NA                      NA                      NA                  NA                      NA                                  NA                          NA

Note: I don't think this question is a duplicate as none of the previous solutions work on my data set. I request you to please open my question again and make it visible please.

I would appreciate any help or advice on why the solutions do not work on my data set.

In the new desired output, what about the `AssessRN` values P3 to P6. Are you dropping those — akrun, Jun 06 '21 at 19:55
If I use `df %>% pivot_wider(names_from = AssessTN, values_from = Q1:X.1)` it does work without any errors or warning. The data structure involves a lot of columns, so it is not clear and also as you didn't show the P3 to P6, I have some difficulty in understanding — akrun, Jun 06 '21 at 19:58

score 1 · Answer 1 · answered Jun 04 '21 at 01:07

1

We can use pivot_wider which takes multiple values_from columns

library(dplyr)
library(tidyr)
df %>%
     pivot_wider(names_from = Measures, values_from = ME1:X2)

-output

# A tibble: 3 x 29
  ID    `ME1_comm - 01 na… `ME1_comm - overa… `ME1_comm - 10 par… `ME1_comm - 02 p… `ME1_comm - 06 a… `ME1_comm - 08 p… `ME1_comm - exhi…
  <fct>              <int>              <int>               <int>             <int>             <int>             <int>             <int>
1 53-21                  2                  1                  NA                NA                NA                NA                 1
2 43-65                  1                  2                  NA                 2                NA                NA                NA
3 67-21                 NA                 NA                  NA                NA                 2                 1                NA
# … with 21 more variables: ME2_comm - 01 narrate <int>, ME2_comm - overall <int>, ME2_comm - 10 participate <int>,
#   ME2_comm - 02 project <int>, ME2_comm - 06 action <int>, ME2_comm - 08 plan <int>, ME2_comm - exhibit <int>,
#   X1_comm - 01 narrate <int>, X1_comm - overall <int>, X1_comm - 10 participate <int>, X1_comm - 02 project <int>,
#   X1_comm - 06 action <int>, X1_comm - 08 plan <int>, X1_comm - exhibit <int>, X2_comm - 01 narrate <int>, X2_comm - overall <int>,
#   X2_comm - 10 participate <int>, X2_comm - 02 project <int>, X2_comm - 06 action <int>, X2_comm - 08 plan <int>,
#   X2_comm - exhibit <int>

answered Jun 04 '21 at 01:07

akrun

874,273
37
540
662

1

Thank you so much for a quick response. Let me try it on my original data. – Sandy Jun 04 '21 at 01:10
1

@Sandy In the example there were no duplicate rows. If there are dupes in the original data, we may need to create a sequence column by Measures – akrun Jun 04 '21 at 01:11
Could you please elaborate a little more on duplicates? Currently, it seems to be working but, to be honest, I did not consider checking for duplicates. The solution is so simple and fast, extremely helpful for me. – Sandy Jun 04 '21 at 01:15
@Sandy If there are more than one unique row, then the pivot_wider would return columns with list. To avoid that you can do `df %>% group_by(Measures) %>% mutate(rn = row_number()) %>% ungroup %>% pivot_wider(names_from = Measures, values_from = ME1:X2)` – akrun Jun 04 '21 at 01:16
1

Great, I will try that. Your answer clearly solves the question that I asked in my original post. Thank you very much! – Sandy Jun 04 '21 at 01:18
Based on your knowledge and expertise in R programming, can you please advise some useful resources / books that I could begin with? I am planning on investigating in a few R beginner level books. – Sandy Jun 04 '21 at 01:21
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/233302/discussion-between-sandy-and-akrun). – Sandy Jun 04 '21 at 01:31
1

Also, don't get discouraged when you stumble on an error. It would make you think why it is causing error and thus be able to understand the behaviors of function .. – akrun Jun 04 '21 at 01:31
I tried your suggested solution on the original data, however, the results are placed on different rows of the ID. I am providing additional sample data which shows how the solution fails. – Sandy Jun 06 '21 at 04:07

Complex data reshaping into wide format where input is a mix of long and wide data

1 Answers1

Linked