1

Imagine I have asked four tourists to rate their experiences in two countries, France and Spain, and to tell me the years they went there. I could put their data in a table like this:

Tourism_data_current = tibble("Tourist" = c(1:4), "France_Rating" = c(5,3,7,4), "France_Year" = c(2021, 2016, 2018, 2021), "Spain_Rating" = c(4,5,NA,NA), "Spain_Year" = c(2020,2017, NA, NA))

Returning:

A tibble: 4 x 5
  Tourist France_Rating France_Year Spain_Rating Spain_Year
    <int>         <dbl>       <dbl>        <dbl>      <dbl>
1       1             5        2021            4       2020
2       2             3        2016            5       2017
3       3             7        2018           NA         NA
4       4             4        2021           NA         NA

Now if I decide I want this data to be long rather than wide so I can evaluate the relationship between rating and year as a function of country, I do not know how to proceed.

The ideal table would look like this (I assume):

A tibble: 6 x 4
  Tourist Country Tour_Rating Tour_Year
    <dbl> <chr>         <dbl>     <dbl>
1       1 France            5      2021
2       1 Spain             4      2020
3       2 France            3      2016
4       2 Spain             5      2017
5       3 France            7      2018
6       4 France            4      2021

This target table was generated manually from the following code:

Tourism_data_ideal = tibble("Tourist" = c(1,1,2,2,3,4), "Country" = c("France","Spain","France","Spain","France","France"), "Tour_Rating" = c(5,4,3,5,7,4), "Tour_Year" = c(2021,2020,2016,2017,2018,2021))

I assume that I would use the reshape package's melt function to do this table conversion, but I am not sure how to do so properly. My attempts so far (melting the data for one variable, then melting again for another) have produced tables with two Country columns, and sometimes France and Spain occupy the same row - so I know I've done it wrong.

Is there a straightforward method for how to proceed?

sachem87
  • 59
  • 1
  • 8
  • Also see https://stackoverflow.com/q/63973128/5325862, https://stackoverflow.com/q/66499564/5325862, https://stackoverflow.com/q/59111671/5325862 – camille Jul 29 '21 at 19:56

1 Answers1

3

Consider using pivot_longer

library(dplyr)
library(tidyr)
library(stringr)
Tourism_data_current %>%
     pivot_longer(cols = -Tourist, names_to = c("Country", ".value"), 
       names_sep="_", values_drop_na = TRUE) %>%
     rename_with(~ str_c('Tour_', .), Rating:Year)

-output

# A tibble: 6 x 4
  Tourist Country Tour_Rating Tour_Year
    <int> <chr>         <dbl>     <dbl>
1       1 France            5      2021
2       1 Spain             4      2020
3       2 France            3      2016
4       2 Spain             5      2017
5       3 France            7      2018
6       4 France            4      2021
akrun
  • 874,273
  • 37
  • 540
  • 662
  • 1
    This can now also be done with melt() function from data.table using a single command by passing a list of variable names to the "measure" argument. See here:https://cran.r-project.org/web/packages/data.table/vignettes/datatable-reshape.html, "melt multiple columns simultaneously". – Zhaochen He Sep 27 '22 at 23:47