I have a problem, that I can't seem to solve, despite having tried using tidyverse functions pivot_longer and pivot_widerm but I cannot seem to extract the year variable stored in a row below variable names.
The image above shows a screenshot from the data, the working example is identical, except that variable names have been translated into English.
Here is a minimal working example
library(tidyverse)
library(janitor)
file <- read_delim("C:/Users/Probst/Downloads/testdata.csv",
delim = ";",
escape_double = FALSE,
trim_ws = TRUE,
col_types = list(col_double()),
locale = locale(decimal_mark = ",",
grouping_mark = ".")) |>
clean_names() |>
select(-raumeinheit, # not interesting
-aggregat) # not interesting
The raw data (testdata.csv) is as follows
spatialid;Raumeinheit;Aggregat;pop_male;pop_male;pop_male;avg_pop_age; avg_pop_age;avg_pop_age
;;;2017;2018;2019;2017;2018;2019
"01001";"Flensburg, Stadt";"kreisfreie Stadt";"44.086";"44.599";"44.904";"42,17";"42,03";"42,00"
"01002";"Kiel, Stadt";"kreisfreie Stadt";"120.809";"120.566";"120.198";"41,53";"41,59";"41,72"
"02000";"Hamburg, Stadt";"kreisfreie Stadt";"897.207";"902.048";"903.974";"41,67";"41,67";"41,66"
(read_delim(file=I(...) unfortunately did not work for a cleaner working example.)
You will get the same file, with the following code:
file2 <- tribble(~spatialid, ~pop_male_4, ~pop_male_5, ~pop_mal_6, ~avg_pop_age_7, ~avg_pop_age_8, ~avg_pop_age_9,
NA, 2017, 2018, 2019, 2017, 2018, 2019,
1001, 44086, 44599, 44904, 42.2, 42.0, 42,
1002, 120809, 120566, 120198, 41.5, 41.6, 41.7,
2000, 897207, 902048, 903974, 41.7, 41.7, 41.7)
pop_male is the male population for each spatial unit and avg_pop_age is the average age for each unit. The problem with the dataset is, that the year variable is stored in the row below the variable names, which I cannot seem to extract.
What I would like to end up with a "tidy" data frame, which would look like this:
# rouding errors
desired_result <-
tribble(~spatial_id, ~year, ~pop_male, ~avg_pop_age,
1001, 2017, 44086, 42.2,
1001, 2018, 44599, 42.0,
1001, 2019, 44904, 42,
1002, 2017, 120809, 41.5,
1002, 2018, 120566, 41.6,
1002, 2019, 120198, 41.7,
2000, 2017, 897207, 41.7,
2000, 2018, 902048, 41.7,
2000, 2019, 903974, 41.7)
Any help or hint is highly appreciated.