I am trying to convert my dataset from long to wide format only using base R packages and dplyr only due to university's project requirements. I have tried to use reshape function, but I have encountered problems with converting them using the reshape() function. My data looks like this please note that in original dataset in geo column there are 27 countries and dates that go from 2000-02-01 to 2022-09-01:
# A tibble: 8 × 3
geo time values
<chr> <date> <dbl>
1 Poland 2022-09-01 1.3
2 Germany 2022-09-01 2.1
3 Denmark 2022-09-01 3.2
4 Greece 2022-09-01 4.1
5 Poland 2022-08-01 1.1
6 Germany 2022-08-01 2.7
7 Denmark 2022-08-01 3.8
8 Greece 2022-08-01 4.12
I want to convert that data into this format :
# A tibble: 4 × 5
time Poland Germany Denmark Greece
<date> <dbl> <dbl> <dbl> <dbl>
1 2022-09-01 1 2 3 4
2 2022-08-01 2.1 3.3 4.5 5.7
3 2022-07-01 7 8 5 8
4 2022-06-01 21.2 2.3 1.2 2.13
You can reproduce these two datasets with this code
Long data:
library(dplyr)
datasample <- matrix(c("Poland","Germany","Denmark","Greece","Poland","Germany","Denmark","Greece",'2022-09-01','2022-09-01','2022-09-01','2022-09-01','2022-08-01','2022-08-01','2022-08-01','2022-08-01',1.3,2.1,3.2,4.1,1.1,2.7,3.8,4.12),
ncol = 3,
byrow = FALSE)
colnames(datasample) <- c("geo","time","values")
data_tibble <- as_tibble(datasample)
data_tibble <- data_tibble %>% mutate(geo = as.character(geo),
time = as.Date(time),
values = as.double(values))
Wide data:
data_wide <- matrix(c("2022-09-01",1,2,3,4,"2022-08-01",2.1,3.3,4.5,5.7,"2022-07-01",7,8,5,8,"2022-06-01",21.2,2.3,1.2,2.13),ncol=5, byrow=TRUE)
colnames(data_wide) <- c("time", "Poland","Germany","Denmark","Greece")
data_wide <- as_tibble(data_wide)
data_wide <- data_wide %>%
mutate(time = as.Date(time),
Poland = as.double(Poland),
Germany = as.double(Germany),
Denmark = as.double(Denmark),
Greece = as.double(Greece))
I tried to use the simple reshape function from base R library:
data_wide <- reshape(original_data, idvar = "time", timevar = "geo", direction = "wide")
and I get something like this:
# A tibble: 272 × 2
time values.c("Austria", "Belgium", "Bulgaria", "Cyprus", "Czech Republi…¹
<date> <dbl>
1 2022-09-01 NA
2 2022-08-01 NA
3 2022-07-01 NA
4 2022-06-01 NA
5 2022-05-01 NA
6 2022-04-01 NA
7 2022-03-01 NA
8 2022-02-01 NA
9 2022-01-01 NA
10 2021-12-01 NA
This is something close to the result that I want but as you can see it is now divided into 2 columns there should be time and seperate columns for each country but it is now a singular vector with country names.
How can i form that vector into seperate columns and insert the values to appropriate records?