Imagine I have asked four tourists to rate their experiences in two countries, France and Spain, and to tell me the years they went there. I could put their data in a table like this:
Tourism_data_current = tibble("Tourist" = c(1:4), "France_Rating" = c(5,3,7,4), "France_Year" = c(2021, 2016, 2018, 2021), "Spain_Rating" = c(4,5,NA,NA), "Spain_Year" = c(2020,2017, NA, NA))
Returning:
A tibble: 4 x 5
Tourist France_Rating France_Year Spain_Rating Spain_Year
<int> <dbl> <dbl> <dbl> <dbl>
1 1 5 2021 4 2020
2 2 3 2016 5 2017
3 3 7 2018 NA NA
4 4 4 2021 NA NA
Now if I decide I want this data to be long rather than wide so I can evaluate the relationship between rating and year as a function of country, I do not know how to proceed.
The ideal table would look like this (I assume):
A tibble: 6 x 4
Tourist Country Tour_Rating Tour_Year
<dbl> <chr> <dbl> <dbl>
1 1 France 5 2021
2 1 Spain 4 2020
3 2 France 3 2016
4 2 Spain 5 2017
5 3 France 7 2018
6 4 France 4 2021
This target table was generated manually from the following code:
Tourism_data_ideal = tibble("Tourist" = c(1,1,2,2,3,4), "Country" = c("France","Spain","France","Spain","France","France"), "Tour_Rating" = c(5,4,3,5,7,4), "Tour_Year" = c(2021,2020,2016,2017,2018,2021))
I assume that I would use the reshape package's melt
function to do this table conversion, but I am not sure how to do so properly. My attempts so far (melting the data for one variable, then melting again for another) have produced tables with two Country columns, and sometimes France and Spain occupy the same row - so I know I've done it wrong.
Is there a straightforward method for how to proceed?