
I have a dataset of student self-report measures in which each row holds a participant's score on one subscale (a factor with several levels). I want to add new factor levels for each participant.

# A tibble: 12 x 3
   first_name subscales          value
   <chr>      <fct>              <int>
 1 P1         Emotion Regulation     5
 2 P1         Empathy                7
 3 P1         Family Support        10
 4 P1         Gratitude             12
 5 P1         Optimism              12
 6 P1         Peer Support           9
 7 P1         Persistence            5
 8 P1         School Support         8
 9 P1         Self-Awareness         7
10 P1         Self-Control           6
11 P1         Self-Efficacy          8
12 P1         Zest                  12

#dput 

structure(list(first_name = c("P1", "P1", "P1", "P1", "P1", "P1", 
"P1", "P1", "P1", "P1", "P1", "P1"), subscales = structure(1:12, .Label = c("Emotion Regulation", 
"Empathy", "Family Support", "Gratitude", "Optimism", "Peer Support", 
"Persistence", "School Support", "Self-Awareness", "Self-Control", 
"Self-Efficacy", "Zest"), class = "factor"), value = c(5L, 7L, 
10L, 12L, 12L, 9L, 5L, 8L, 7L, 6L, 8L, 12L)), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -12L))

Let's say I want to add new factor levels for each participant such that:

Engaged Living = Optimism + Zest + Gratitude

Emotional Competence = Emotion Regulation + Self-Control + Empathy

My current workflow is to convert the df from long to wide and then back to long (pivot_wider and pivot_longer). This gets the job done, but I'm wondering if there is another workflow that avoids the round trip (i.e., keeps the df in long format). I'm looking for a tidyverse/dplyr workflow.
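
A wide-and-back round trip of this kind could look roughly like the following sketch. The data frame name `dd` and the result name `wide_then_long` are assumptions made for illustration, and the exact code used may differ.

```r
library(dplyr)
library(tidyr)

# Sample data from the dput above; the name `dd` is an assumption
dd <- tibble(
  first_name = "P1",
  subscales = factor(c(
    "Emotion Regulation", "Empathy", "Family Support", "Gratitude",
    "Optimism", "Peer Support", "Persistence", "School Support",
    "Self-Awareness", "Self-Control", "Self-Efficacy", "Zest"
  )),
  value = c(5L, 7L, 10L, 12L, 12L, 9L, 5L, 8L, 7L, 6L, 8L, 12L)
)

wide_then_long <- dd %>%
  # One column per subscale, one row per participant
  pivot_wider(names_from = subscales, values_from = value) %>%
  # Composite scores as sums of their component columns
  mutate(
    `Engaged Living` = Optimism + Zest + Gratitude,
    `Emotional Competence` = `Emotion Regulation` + `Self-Control` + Empathy
  ) %>%
  # Back to one row per participant and subscale
  pivot_longer(-first_name, names_to = "subscales", values_to = "value")
```

After the round trip, `subscales` is a character column rather than a factor, so it would need to be converted back if the factor type matters.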

user438383
Pss
  • Several options here: [Cleaning up factor levels (collapsing multiple levels/labels)](https://stackoverflow.com/questions/19410108/cleaning-up-factor-levels-collapsing-multiple-levels-labels), including tidyverse (forcats). – Henrik Jan 17 '23 at 20:39
  • I did take a look at forcats but I could not find any function that suits my use-case. E.g., `fct_expand` does allow me to add factors but not by specifying other factor levels. I looked at UWE's answer and it does not suit my use-case. – Pss Jan 17 '23 at 20:49

1 Answer


It seems you want to add rows, not just factor levels. One way would be to create the new summary rows and then bind them back to the original data. For example:

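library(dplyr)
dd %>% 
  # Relabel each component subscale with its composite; all others become NA
  mutate(subscales = case_when(
    subscales %in% c("Optimism", "Zest", "Gratitude") ~ "Engaged Living",
    subscales %in% c("Emotion Regulation", "Self-Control", "Empathy") ~ "Emotional Competence"
  )) %>% 
  group_by(first_name, subscales) %>% 
  # Drop the subscales that belong to no composite
  filter(!is.na(subscales)) %>% 
  # Sum the component scores per participant and composite
  summarize(value=sum(value)) %>% 
  # Append the original rows below the new summary rows
  bind_rows(dd)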

which gives

# A tibble: 14 × 3
# Groups:   first_name [1]
   first_name subscales            value
   <chr>      <chr>                <int>
 1 P1         Emotional Competence    18
 2 P1         Engaged Living          36
 3 P1         Emotion Regulation       5
 4 P1         Empathy                  7
 5 P1         Family Support          10
 6 P1         Gratitude               12
 7 P1         Optimism                12
 8 P1         Peer Support             9
 9 P1         Persistence              5
10 P1         School Support           8
11 P1         Self-Awareness           7
12 P1         Self-Control             6
13 P1         Self-Efficacy            8
14 P1         Zest                    12
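
Note that `subscales` comes back as `<chr>` in the output above, whereas it was a factor in the original data. If the factor type matters, a variation on the same idea can keep it. The following is only a sketch: the names `composites`, `lookup`, `composite_rows` and `result` are hypothetical, and the lookup-table join is a swapped-in alternative to `case_when()`, not part of the code above.

```r
library(dplyr)

# Sample data from the question, under the name `dd` used above
dd <- tibble(
  first_name = "P1",
  subscales = factor(c(
    "Emotion Regulation", "Empathy", "Family Support", "Gratitude",
    "Optimism", "Peer Support", "Persistence", "School Support",
    "Self-Awareness", "Self-Control", "Self-Efficacy", "Zest"
  )),
  value = c(5L, 7L, 10L, 12L, 12L, 9L, 5L, 8L, 7L, 6L, 8L, 12L)
)

# Hypothetical definition: composite name -> component subscales
composites <- list(
  "Engaged Living" = c("Optimism", "Zest", "Gratitude"),
  "Emotional Competence" = c("Emotion Regulation", "Self-Control", "Empathy")
)

# Long lookup table with one row per (composite, component) pair
lookup <- tibble(
  composite = rep(names(composites), lengths(composites)),
  subscales = unlist(composites, use.names = FALSE)
)

# Summary rows: one per participant and composite
composite_rows <- dd %>%
  mutate(subscales = as.character(subscales)) %>%
  inner_join(lookup, by = "subscales") %>%
  group_by(first_name, composite) %>%
  summarize(value = sum(value), .groups = "drop") %>%
  rename(subscales = composite)

# Bind onto the original rows, then restore the factor with the new levels appended
result <- dd %>%
  mutate(subscales = as.character(subscales)) %>%
  bind_rows(composite_rows) %>%
  mutate(subscales = factor(
    subscales,
    levels = c(levels(dd$subscales), names(composites))
  ))
```

Here the original rows come first and the two composite rows are appended at the end; adding another composite only requires another entry in `composites`.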
MrFlick
  • I don't see the new subscales (engaged living, emotional competence) in output :( – Pss Jan 17 '23 at 22:42
  • I included the output I get when I run the code. It seems to work for me with the sample data. The number of rows goes from 12 to 14 – MrFlick Jan 18 '23 at 14:25