-1

I am working with a dataset with haven labelled columns (i.e., variables with labels). The dataset is in wide format, as follows:

df <- data.frame("Trust" = c(1, 2, 3, 2, 3),          
         "Evaluation" = c(2, 3, 1, 1, 3))

Trust and Evaluation are survey questions. Each row corresponds to an individual and they give an answer in a scale from 1 (No trust/bad evaluation) to 3 (Absolute trust/good evaluation).Trust and Evaluation variables are labelled (i.e., the header in the RStudio viewer appears with a description ("how much do you trust X", "how do you evaluate Y"). I am not able to add the labels on the example and I cannot share the original dataset, but I think is clear what I mean.

What I want is to convert this into long format, as:

df <- data.frame("Question" = c("Trust", "Trust", "Trust", "Trust", "Trust", "Evaluation", "Evaluation", "Evaluation", "Evaluation"),          
             "Value" = c(1, 2, 3, 2, 3, 2, 3, 1, 1, 3)),
"Label" = c("LabelTrust", "LabelTrust", "LabelTrust", "LabelTrust", "LabelTrust", "LabelEvaluation", "LabelEvaluation", "LabelEvaluation", "LabelEvaluation")

Ideally, then, I want to create a column for the name of the variable, another for the label and another for the value this variable takes. I apologize because I think the code is not fully reproducible, but I think it is clear what I mean.

Is there any way to make this? I am trying to do this with pivot_longer from tidyr but running into problems because of the labelled nature of the columns. I guess I could make it if I first convert the variables into factors, but then I would lose the labels name and this is not what I want to achieve.

If the solution could come from the tidyverse, that'd be awesome. Thanks everybody so much!

Dhrumil shah
  • 611
  • 4
  • 23
  • It would be helpful if you provide an example with labelled data itself so that we can test our answers on that. – Ronak Shah Jan 05 '21 at 10:39

2 Answers2

2

Your question might be related to Reshaping data.frame from wide to long format.

How about the following

df <- data.frame("Trust" = c(1, 2, 3, 2, 3),          
                 "Evaluation" = c(2, 3, 1, 1, 3))

df.long <- reshape(df,
                   direction = "long",
                   varying = list(names(df)),
                   v.names = "Value",
                   idvar = c("Trust", "Evaluation"),
                   timevar = "Question")

df.long$Question <- ifelse(df.long$Question == 1, "Trust", "Evaluation")
df.long$Label    <- ifelse(df.long$Question == 1, "LabelTrust", "LabelEvaluation")

df.long

gives you

        Question Value           Label
1.2.1      Trust     1      LabelTrust
2.3.1      Trust     2      LabelTrust
3.1.1      Trust     3      LabelTrust
2.1.1      Trust     2      LabelTrust
3.3.1      Trust     3      LabelTrust
1.2.2 Evaluation     2 LabelEvaluation
2.3.2 Evaluation     3 LabelEvaluation
3.1.2 Evaluation     1 LabelEvaluation
2.1.2 Evaluation     1 LabelEvaluation
3.3.2 Evaluation     3 LabelEvaluation

HTH

MacOS
  • 1,149
  • 1
  • 7
  • 14
1

If I understand correctly, your data frame is created using labelled package, such as

df <- data.frame("Trust" = c(1, 2, 3, 2, 3), "Evaluation" = c(2, 3, 1, 1, 3))
var_label(df$Trust) <- "how much do you trust X"
var_label(df$Evaluation) <- "how do you evaluate Y"

According to the package documentation, look_for function gives you the following output.

# pos   variable   label                   col_type values      
# <chr> <chr>      <chr>                   <chr>    <chr>       
# 1     Trust      how much do you trust X dbl      range: 1 - 3
# 2     Evaluation how do you evaluate Y   dbl      range: 1 - 3

Based on this, your desired output could be obtained as follows:

df %>% pivot_longer(cols=everything(), names_to="Question", values_to="Value") %>%
  left_join(df %>% look_for() %>% select(Question=variable, Label=label), by="Question")

I hope this would help!

Update

In order to include the labels of values, the following code will work (although a bit complicated)

left_join(
  df %>% pivot_longer(cols=everything(), names_to="Question", values_to="Value") %>% nest(df=-Question),
  df %>% look_for() %>% lookfor_to_long_format() %>% select(Question=variable, Label=label, ValueLabel=value_labels) %>%
    mutate(ValueLabel=str_split(ValueLabel, "] "),
           Value=map_int(ValueLabel, ~ .[[1]] %>% str_replace("\\[", "") %>% as.integer()),
           ValueLabel=map_chr(ValueLabel, ~ .[[2]])) %>% nest(label=-Question),
  by="Question"
) %>% mutate(df=map2(df, label, ~ left_join(.x, .y, by="Value"))) %>% select(-label) %>% unnest(cols=df)

(Output)

## A tibble: 10 x 4
#   Question   Value Label                   ValueLabel
#   <chr>      <dbl> <chr>                   <chr>
# 1 Trust          1 how much do you trust X No trust
# 2 Trust          2 how much do you trust X Somewhat trust
# 3 Trust          3 how much do you trust X Absolute trust
# 4 Trust          2 how much do you trust X Somewhat trust
# 5 Trust          3 how much do you trust X Absolute trust
# 6 Evaluation     2 how do you evaluate Y   Neither good nor bad
# 7 Evaluation     3 how do you evaluate Y   Good
# 8 Evaluation     1 how do you evaluate Y   Bad
# 9 Evaluation     1 how do you evaluate Y   Bad
#10 Evaluation     3 how do you evaluate Y   Good
yh6
  • 379
  • 2
  • 13
  • This answer is really great! I have one moure trouble though. For simplicity, I didn't say that in the original df the answer values are also labelled (so 1 is numerical ("double") and labelled (e.g., "Bad"); 3 is numerical and labelled (3 - "Good"). I think this is why I get an error when using pivot_longer, concretely: Error: Can't convert from > to > due to loss of precision. * Locations: 7, 42, 43, 77, 83, 89, 100... Values are labelled in `` but not in ``. Any idea how to solve this? Thanks a lot! – Carlos González Poses Jan 05 '21 at 11:14
  • 2
    I actually found a solution which worked for me (it eliminates duplicated rows but that is not a problem for my real issue). For anyone interested, it involves using the function look_for_to_long_format(), as look_for %>% look_for_to_long_format() – Carlos González Poses Jan 05 '21 at 13:03
  • 1
    I updated my answer based on your comments. But I did not get an error by `pivot_longer` even when the values are labelled and I'm not sure what caused your error. – yh6 Jan 05 '21 at 14:25