The input data offer several challenges:
- Data are given as a straight character vector, not a as data.frame with predefined columns.
- The rows partly consist of key/value pairs which are separated by
": "
- The other rows act as section headers. All key/value pairs in the rows below belong to one section until the next header is reached.
The code below is relying only on two assumptions:
- key/value pairs contain one and only one
": "
- section headers none at all.
Multiple keys in a section, e.g., multiple rows with email adresses are handled by specifying toString()
as aggregation function to dcast()
.
library(data.table)
# coerce to data.table
data.table(text = txt)[
# split key/value pairs in columns
, tstrsplit(text, ": ")][
# pick section headers and create new column
is.na(V2), Name := V1][
# fill in Name into the rows below
, Name := zoo::na.locf(Name)][
# reshape key/value pairs from long to wide format using Name as row id
!is.na(V2), dcast(.SD, Name ~ V1, fun = toString, value.var = "V2")]
Name City/Town Email
1: Name1 Location1 email1@xyz.com
2: Name2 Location2 email2@abc.com
3: Name3 Location3 email3@pqr.com
4: Name4 Location4 NA
5: Name5 NA email5@abc.com
Data
txt <- c("Name1", "Email: email1@xyz.com", "City/Town: Location1", "Name2",
"Email: email2@abc.com", "City/Town: Location2", "Name3", "Email: email3@pqr.com",
"City/Town: Location3", "Name4", "City/Town: Location4", "Name5",
"Email: email5@abc.com")
Or, try somewhat more "realistic" names
txt1 <- c("John Doe", "Email: email1@xyz.com", "City/Town: Location1", "Save the World Fund",
"Email: email2@abc.com", "City/Town: Location2", "Best Shoes Ltd.", "Email: email3@pqr.com",
"City/Town: Location3", "Mother", "City/Town: Location4", "Jane",
"Email: email5@abc.com")
which will result in:
Name City/Town Email
1: Best Shoes Ltd. Location3 email3@pqr.com
2: Jane NA email5@abc.com
3: John Doe Location1 email1@xyz.com
4: Mother Location4 NA
5: Save the World Fund Location2 email2@abc.com
Or, with multiple keys per section
txt2 <- c("John Doe", "Email: email1@xyz.com", "Email: email1@abc.com", "City/Town: Location1", "Save the World Fund",
"Email: email2@abc.com", "City/Town: Location2", "Best Shoes Ltd.", "Email: email3@pqr.com",
"City/Town: Location3", "Mother", "City/Town: Location4", "City/Town: everywhere", "Jane",
"Email: email5@abc.com")
Name City/Town Email
1: Best Shoes Ltd. Location3 email3@pqr.com
2: Jane email5@abc.com
3: John Doe Location1 email1@xyz.com, email1@abc.com
4: Mother Location4, everywhere
5: Save the World Fund Location2 email2@abc.com