After parsing xml files I have data looking like this:
example_df <-
tibble(id = "ABC",
wage_type = "salary",
name = c("Description","Code","Base",
"Description","Code","Base",
"Description","Code"),
value = c("wage_element_1","51B","600",
"wage_element_2","51C","740",
"wage_element_3","51D"))
example_df
# A tibble: 8 x 4
id wage_type name value
<chr> <chr> <chr> <chr>
1 ABC salary Description wage_element_1
2 ABC salary Code 51B
3 ABC salary Base 600
4 ABC salary Description wage_element_2
5 ABC salary Code 51C
6 ABC salary Base 740
7 ABC salary Description wage_element_3
8 ABC salary Code 51D
with roughly 1000 different id
, and each having three possible values for wage_type
.
I want to change the values in the name
column into columns.
I have tried to use pivot
but I am struggling to handle the resulting list-cols
: since not all salary
have a Base
, the resulting list-cols are of different size as below:
example_df <- example_df %>%
pivot_wider(id_cols = c(id, wage_type),
names_from = name,
values_from = value)
example_df
# A tibble: 1 x 5
id wage_type Description Code Base
<chr> <chr> <list> <list> <list>
1 ABC salary <chr [3]> <chr [3]> <chr [2]>
So when I try to unnest the cols it throws an error:
example_df%>%
unnest(cols = c(Description,Code,Base))
Error: Can't recycle `Description` (size 3) to match `Base` (size 2).
I understand that is because tidyr functions do not recycle, but I could not find a way around this or a base r
solution to my problem. I have tried to make a df with
unlist(strsplit(as.character(x))
solution as per how to split one row into multiple rows in R but also ran into a resulting column length issue.
Desired output is as follows:
desired_df <-
tibble(
id=c("ABC","ABC","ABC"),
wage_type=c("salary","salary","salary"),
Description = c("wage_element_1","wage_element_2","wage_element_3"),
Code = c("51B","51C","51D"),
Base = c("600","740",NA))
desired_df
id wage_type Description Code Base
<chr> <chr> <chr> <chr> <chr>
1 ABC salary wage_element_1 51B 600
2 ABC salary wage_element_2 51C 740
3 ABC salary wage_element_3 51D NA
I would love a tidyr solution but any help would be appreciated. Thanks.