I have a data frame containing character columns. The first column (V1) contains IDs and is followed by multiple columns containing strings with numbers, letters and symbols. What I would like is to extract every token made up of digits and symbols (for example "4-6-2021" or "23-7"), reading up to the next space in the string. Ideally, all such values found in column V2 would be written to a new column, separated by ";", and likewise for V3.
df2 <- structure(list(V1 = c("00094", "00001", "00002", "00003", "00004",
"00005", "00006", "00007", "00008", "00009"), V2 = c("4-6-2021 (aw), vaccinatie naam en data aangepast 19-8-2021 kv: ic nog niet ontvangen nav eerdere",
NA, "23-7 mf: t2-vragenlijst omgezet naar t3, verzoek bij alienke om t2 af te keuren 6-12 mf: corona",
NA, NA, "13-12 mf: 3 maanden na 2e vaccinatie corona", "20-7 mf: vaccinatiedatum blijkt enige vaccinatie 6-12 mf: corona",
NA, NA, "15-7-2021 kv: corona gehad in maand05 2021, dus één vaccinatie. 6-12 mf: corona"
), V3 = c("eerdere brief. mf/sp telefonisch contact laten opnemen. 19-8 mf: gg, herinneringsmail gestuurd, komt niet a",
NA, "corona gehad, 1 vaccinatie, per mail", NA, NA, "corona gekregen, per mail",
"corona gehad, 1 vaccinatie, per mail", NA, NA, NA)), row.names = c(NA,
10L), class = "data.frame")
This would be the desired output (column names are not important):
df2_new <- structure(list(V1 = c("00094", "00001", "00002", "00003", "00004",
"00005", "00006", "00007", "00008", "00009"), V2 = c("4-6-2021 (aw), vaccinatie naam en data aangepast 19-8-2021 kv: ic nog niet ontvangen nav eerdere",
NA, "23-7 mf: t2-vragenlijst omgezet naar t3, verzoek bij alienke om t2 af te keuren 6-12 mf: corona",
NA, NA, "13-12 mf: 3 maanden na 2e vaccinatie corona", "20-7 mf: vaccinatiedatum blijkt enige vaccinatie 6-12 mf: corona",
NA, NA, "15-7-2021 kv: corona gehad in maand05 2021, dus één vaccinatie. 6-12 mf: corona"
), V3 = c("eerdere brief. mf/sp telefonisch contact laten opnemen. 19-8 mf: gg, herinneringsmail gestuurd, komt niet a",
NA, "corona gehad, 1 vaccinatie, per mail", NA, NA, "corona gekregen, per mail",
"corona gehad, 1 vaccinatie, per mail", NA, NA, NA), `dates V2` = c("4-6-2021;19-8-2021",
NA, "23-7;6-12", NA, NA, "13-12", "20-7;6-12", NA, NA, "15-7-2021;6-12"
), `dates V3` = c("19-8", NA, NA, NA, NA, NA, NA, NA, NA, NA)), row.names = c(NA,
-10L), class = "data.frame")
Thanks so much!
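One possible approach, sketched in base R under an assumption read off the desired output: the tokens of interest are day-month values with an optional year (such as "23-7" or "4-6-2021"), so a lone "2021", "t2" or "maand05" is skipped. The helper name `extract_dates`, its regular expression and the small `demo` data frame are illustrative only and are not part of the original data.

```r
# Hypothetical helper: collect date-like tokens (d-m or d-m-yyyy) from each
# string and join them with ";". Returns NA where the input is NA or has no match.
extract_dates <- function(x, pattern = "\\b\\d{1,2}-\\d{1,2}(?:-\\d{2,4})?\\b") {
  out <- rep(NA_character_, length(x))
  ok <- !is.na(x)
  # gregexpr() finds all matches per string; regmatches() pulls them out
  hits <- regmatches(x[ok], gregexpr(pattern, x[ok], perl = TRUE))
  out[ok] <- vapply(
    hits,
    function(h) if (length(h) > 0) paste(h, collapse = ";") else NA_character_,
    character(1)
  )
  out
}

# Small demo using three of the rows from the question
demo <- data.frame(
  V1 = c("00094", "00001", "00002"),
  V2 = c("4-6-2021 (aw), vaccinatie naam en data aangepast 19-8-2021 kv: ic nog niet ontvangen nav eerdere",
         NA,
         "23-7 mf: t2-vragenlijst omgezet naar t3, verzoek bij alienke om t2 af te keuren 6-12 mf: corona"),
  V3 = c("eerdere brief. mf/sp telefonisch contact laten opnemen. 19-8 mf: gg, herinneringsmail gestuurd, komt niet a",
         NA,
         "corona gehad, 1 vaccinatie, per mail"),
  stringsAsFactors = FALSE
)

# Add one new column per text column
for (col in c("V2", "V3")) {
  demo[[paste("dates", col)]] <- extract_dates(demo[[col]])
}

print(demo[, c("V1", "dates V2", "dates V3")])
```

On the full `df2`, the same loop over `c("V2", "V3")` should apply unchanged. If other kinds of tokens need to be captured (for instance values with "/" instead of "-"), the `pattern` argument would have to be adjusted accordingly.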