How to convert labelled columns from wide to long format and keep the labels? [R]

Question

I am working with a dataset with haven labelled columns (i.e., variables with labels). The dataset is in wide format, as follows:

df <- data.frame("Trust" = c(1, 2, 3, 2, 3),          
         "Evaluation" = c(2, 3, 1, 1, 3))

Trust and Evaluation are survey questions. Each row corresponds to an individual and they give an answer in a scale from 1 (No trust/bad evaluation) to 3 (Absolute trust/good evaluation).Trust and Evaluation variables are labelled (i.e., the header in the RStudio viewer appears with a description ("how much do you trust X", "how do you evaluate Y"). I am not able to add the labels on the example and I cannot share the original dataset, but I think is clear what I mean.

What I want is to convert this into long format, as:

df <- data.frame("Question" = c("Trust", "Trust", "Trust", "Trust", "Trust", "Evaluation", "Evaluation", "Evaluation", "Evaluation"),          
             "Value" = c(1, 2, 3, 2, 3, 2, 3, 1, 1, 3)),
"Label" = c("LabelTrust", "LabelTrust", "LabelTrust", "LabelTrust", "LabelTrust", "LabelEvaluation", "LabelEvaluation", "LabelEvaluation", "LabelEvaluation")

Ideally, then, I want to create a column for the name of the variable, another for the label and another for the value this variable takes. I apologize because I think the code is not fully reproducible, but I think it is clear what I mean.

Is there any way to make this? I am trying to do this with pivot_longer from tidyr but running into problems because of the labelled nature of the columns. I guess I could make it if I first convert the variables into factors, but then I would lose the labels name and this is not what I want to achieve.

If the solution could come from the tidyverse, that'd be awesome. Thanks everybody so much!

It would be helpful if you provide an example with labelled data itself so that we can test our answers on that. — Ronak Shah, Jan 05 '21 at 10:39

score 2 · Answer 1 · answered Jan 05 '21 at 10:32

Your question might be related to Reshaping data.frame from wide to long format.

How about the following

df <- data.frame("Trust" = c(1, 2, 3, 2, 3),          
                 "Evaluation" = c(2, 3, 1, 1, 3))

df.long <- reshape(df,
                   direction = "long",
                   varying = list(names(df)),
                   v.names = "Value",
                   idvar = c("Trust", "Evaluation"),
                   timevar = "Question")

df.long$Question <- ifelse(df.long$Question == 1, "Trust", "Evaluation")
df.long$Label    <- ifelse(df.long$Question == 1, "LabelTrust", "LabelEvaluation")

df.long

gives you

        Question Value           Label
1.2.1      Trust     1      LabelTrust
2.3.1      Trust     2      LabelTrust
3.1.1      Trust     3      LabelTrust
2.1.1      Trust     2      LabelTrust
3.3.1      Trust     3      LabelTrust
1.2.2 Evaluation     2 LabelEvaluation
2.3.2 Evaluation     3 LabelEvaluation
3.1.2 Evaluation     1 LabelEvaluation
2.1.2 Evaluation     1 LabelEvaluation
3.3.2 Evaluation     3 LabelEvaluation

HTH

yh6 · Accepted Answer · 2021-01-05T14:29:44.660

If I understand correctly, your data frame is created using labelled package, such as

df <- data.frame("Trust" = c(1, 2, 3, 2, 3), "Evaluation" = c(2, 3, 1, 1, 3))
var_label(df$Trust) <- "how much do you trust X"
var_label(df$Evaluation) <- "how do you evaluate Y"

According to the package documentation, look_for function gives you the following output.

# pos   variable   label                   col_type values      
# <chr> <chr>      <chr>                   <chr>    <chr>       
# 1     Trust      how much do you trust X dbl      range: 1 - 3
# 2     Evaluation how do you evaluate Y   dbl      range: 1 - 3

Based on this, your desired output could be obtained as follows:

df %>% pivot_longer(cols=everything(), names_to="Question", values_to="Value") %>%
  left_join(df %>% look_for() %>% select(Question=variable, Label=label), by="Question")

I hope this would help!

Update

In order to include the labels of values, the following code will work (although a bit complicated)

left_join(
  df %>% pivot_longer(cols=everything(), names_to="Question", values_to="Value") %>% nest(df=-Question),
  df %>% look_for() %>% lookfor_to_long_format() %>% select(Question=variable, Label=label, ValueLabel=value_labels) %>%
    mutate(ValueLabel=str_split(ValueLabel, "] "),
           Value=map_int(ValueLabel, ~ .[[1]] %>% str_replace("\\[", "") %>% as.integer()),
           ValueLabel=map_chr(ValueLabel, ~ .[[2]])) %>% nest(label=-Question),
  by="Question"
) %>% mutate(df=map2(df, label, ~ left_join(.x, .y, by="Value"))) %>% select(-label) %>% unnest(cols=df)

(Output)

## A tibble: 10 x 4
#   Question   Value Label                   ValueLabel
#   <chr>      <dbl> <chr>                   <chr>
# 1 Trust          1 how much do you trust X No trust
# 2 Trust          2 how much do you trust X Somewhat trust
# 3 Trust          3 how much do you trust X Absolute trust
# 4 Trust          2 how much do you trust X Somewhat trust
# 5 Trust          3 how much do you trust X Absolute trust
# 6 Evaluation     2 how do you evaluate Y   Neither good nor bad
# 7 Evaluation     3 how do you evaluate Y   Good
# 8 Evaluation     1 how do you evaluate Y   Bad
# 9 Evaluation     1 how do you evaluate Y   Bad
#10 Evaluation     3 how do you evaluate Y   Good

This answer is really great! I have one moure trouble though. For simplicity, I didn't say that in the original df the answer values are also labelled (so 1 is numerical ("double") and labelled (e.g., "Bad"); 3 is numerical and labelled (3 - "Good"). I think this is why I get an error when using pivot_longer, concretely: Error: Can't convert from > to > due to loss of precision. * Locations: 7, 42, 43, 77, 83, 89, 100... Values are labelled in `` but not in ``. Any idea how to solve this? Thanks a lot! — Carlos González Poses, Jan 05 '21 at 11:14
I actually found a solution which worked for me (it eliminates duplicated rows but that is not a problem for my real issue). For anyone interested, it involves using the function look_for_to_long_format(), as look_for %>% look_for_to_long_format() — Carlos González Poses, Jan 05 '21 at 13:03
I updated my answer based on your comments. But I did not get an error by `pivot_longer` even when the values are labelled and I'm not sure what caused your error. — yh6, Jan 05 '21 at 14:25

How to convert labelled columns from wide to long format and keep the labels? [R]

2 Answers2