I have a large dataset in the form shown below:
ID | Scores |
---|---|
1 | English 3, French 7, Geography 8 |
2 | Spanish 7, Classics 4 |
3 | Physics 5, English 5, PE 7, Art 4 |
I need to parse the text string from the Scores column into separate columns for each subject with the scores for each individual stored as the data values, as below:
ID | English | French | Geography | Spanish | Classics | Physics | PE | Art |
---|---|---|---|---|---|---|---|---|
1 | 3 | 7 | 8 | - | - | - | - | - |
2 | - | - | - | 7 | 4 | - | - | - |
3 | 5 | - | - | - | - | 5 | 7 | 4 |
I cannot manually predefine the columns as there are 100s in the full dataset. So far I have cleaned the data to remove inconsistent capitalisation and separated each subject-mark pairing into a distinct column as follows:
df$scores2 <- str_to_lower(df$Scores)
split <- separate(
df,
scores2,
into = paste0("Subject", 1:8),
sep = "\\,",
remove = FALSE,
convert = FALSE,
extra = "warn",
fill = "warn",
)
I have looked at multiple questions on the subject, such as Split irregular text column into multiple columns in r, but I cannot find another case where the column titles and data values are mixed in the text string. How can I generate the full set of columns required and then populate the data value?