Parsing out data from one variable using complex rules in R

Question

I am importing data into R from another source (i.e., I cannot easily change the in-coming format/values).

Among the variables is one that includes one or more of these possible values:

Mother (biological mother, foster mother, step mother, etc.)
Father (biological father, foster father, step father, etc.)
Grandparent(s) (biological, foster, step, etc.)
Brother(s) older than 18
Sister(s) older than 18
Other adults (aunts, uncles, etc.)

all within the same "cell" so that possible data look like:

Sample Input Data Frame (df)

df <- read.table(text =
"row lives.with.whom
  1  'Mother (biological mother, foster mother, step mother, etc.), Father (biological father, foster father, step father, etc.), Grandparent(s) (biological, foster, step, etc.), Brother(s) older than 18, Sister(s) older than 18, Other adults (aunts, uncles, etc.)'
  2  ''
  3  'Mother (biological mother, foster mother, step mother, etc.), Sister(s) older than 18'
  4  'Mother (biological mother, foster mother, step mother, etc.), Father (biological father, foster father, step father, etc.)'", header = T)

Within R, how could I efficiently create rules to parse out these responses into separate columns, one column for each type of family member, so that the output would look like this:

Sample Output Data Frame

mother <- c(1,0,1,1)
father <- c(1,0,0,1)
adult.brother <- c(1,0,0,0)
adult.sister <- c(1,0,1,0)
grandparent <- c(1,0,0,0)
other.adult <- c(1,0,0,0)
output.df <- cbind(mother, father, adult.brother, adult.sister, grandparent, other.adult)
colnames(output.df) <- c("Mother", "Father", "Brother", "Sister", "Grandparent", "Other adult")
output.df

     Mother Father Brother Sister Grandparent Other adult
[1,]      1      1       1      1           1           1
[2,]      0      0       0      0           0           0
[3,]      1      0       0      1           0           0
[4,]      1      1       0      0           0           0

TIA

This is easy to do and best demonstrated using specific sample data. Please edit your post to include representative and copy&paste-able sample data. — Maurits Evers, Aug 13 '18 at 02:04
No follow-up after 2 hours and no update means a down-vote from me (will remove if you edit/revise your question). Generally we expect you to hang around after posting a question to respond to any questions/comments. I can see that you've been checking SO on and off over the last 2 hours, so if you'd like to get help, please spend some time adding critical information to your post. — Maurits Evers, Aug 13 '18 at 04:16
Thank you for the guidance. I've edited it to try to make it clearer. — wes, Aug 13 '18 at 04:23
Much better (down-vote removed); I've added a solution below that should get you started; please take a look. — Maurits Evers, Aug 13 '18 at 04:45
Wonderful! `tidyverse` works wonderfully (after installing dependencies like `libcurl4` in Ubuntu and a learning curve). I'll further abuse my noob status to heartily thank both you and @Suhas for the help and the warm welcome to Stack Overflow. — wes, Aug 13 '18 at 05:55
Great, glad it worked out. Welcome to SO and do stick around:-) — Maurits Evers, Aug 13 '18 at 07:25

score 1 · Answer 1 · answered Aug 13 '18 at 04:23

Hey, welcome to Stack Overflow! Here are some links on how to ask better questions on Stack Overflow, so that it is easier for people to help you going forward.

Coming to your question, I made some assumptions and tried to solve it. As Maurits mentioned, you need to provide a reproducible example so that someone can give a concrete answer; until then, this is the best I can come up with.

library(tidyr)
library(dplyr)
# create nested lists with the names of the mothers and fathers of two people
mother <- list(list("bio_1", "step_1", "foster_1"), list("bio_2", "stp_2", "foster_2"))
father <- list(list("bio_1", "foster_1", "other_1"), list("bio_2", "stp_2", "foster_2"))

# convert to a tibble with two list-columns
# (tibble() replaces the deprecated data_frame())
test_object <- tibble(person = c(1, 2), mother, father)

# print 
test_object

# A tibble: 2 x 3
  person mother     father    
   <dbl> <list>     <list>    
1      1 <list [3]> <list [3]>
2      2 <list [3]> <list [3]>

# first unnest the lists to get to the inner lists,
# then convert from wide to long form,
# then unnest again to get the actual values in long format
test_object %>%
  unnest(.) %>%
  gather(data = ., key = relationship, value = name, -person) %>%
  unnest() -> test_object

test_object
# A tibble: 12 x 3
   person relationship name    
    <dbl> <chr>        <chr>   
 1      1 mother       bio_1   
 2      1 mother       step_1  
 3      1 mother       foster_1
 4      2 mother       bio_2   
 5      2 mother       stp_2   
 6      2 mother       foster_2
 7      1 father       bio_1   
 8      1 father       foster_1
 9      1 father       other_1 
10      2 father       bio_2   
11      2 father       stp_2   
12      2 father       foster_2

Here are links to tidyverse and data.table, which provide many packages and functions for solving most data-carpentry/wrangling problems.

Excellent guidance and welcome. Thank you. The information on subsetting is very useful--a great addition even though my question wasn't clear. — wes, Aug 13 '18 at 05:49

score 1 · Accepted Answer · answered Aug 13 '18 at 04:43

Here is a tidyverse option that should get you started:

library(tidyverse)
rel <- list("Mother", "Father", "Brother", "Sister", "Grandparent", "Other adult")
names(rel) <- unlist(rel)
# for each label, flag the rows whose text contains it (ignoring case);
# the unary + turns TRUE/FALSE into 1/0
bind_cols(df[, 1, drop = FALSE], map(rel, ~ +str_detect(tolower(df[, 2]), tolower(.x))))
#  row Mother Father Brother Sister Grandparent Other adult
#1   1      1      1       1      1           1           1
#2   2      0      0       0      0           0           0
#3   3      1      0       0      1           0           0
#4   4      1      1       0      0           0           0
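For readers who want to see the steps of that one-liner spelled out, here is a minimal base R sketch of the same idea: lower-case both the label and the response, test for the label as a literal substring, and coerce the logical result to 1/0. The response strings are shortened versions of the question's sample data (an assumption made for brevity), and `flags` and `result` are names chosen here for illustration.

```r
# Shortened versions of the sample responses (assumption: only the text
# inside the parentheses is abbreviated; the leading labels are unchanged).
df <- data.frame(
  row = 1:4,
  lives.with.whom = c(
    paste("Mother (biological mother, etc.), Father (biological father, etc.),",
          "Grandparent(s) (biological, etc.), Brother(s) older than 18,",
          "Sister(s) older than 18, Other adults (aunts, uncles, etc.)"),
    "",
    "Mother (biological mother, etc.), Sister(s) older than 18",
    "Mother (biological mother, etc.), Father (biological father, etc.)"
  ),
  stringsAsFactors = FALSE
)

rel <- c("Mother", "Father", "Brother", "Sister", "Grandparent", "Other adult")

# One column per label: grepl() returns TRUE/FALSE for every response,
# and the unary plus coerces that to integer 1/0.
flags <- sapply(rel, function(label) {
  +grepl(tolower(label), tolower(df$lives.with.whom), fixed = TRUE)
})

# Attach the indicator columns to the row identifier.
result <- cbind(df["row"], flags)
result
```

Because `rel` is a character vector, `sapply()` names the columns of `flags` after the labels, so no separate `names()` step is needed.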

Sample data

df <- read.table(text =
    "row lives.with.whom
  1  'Mother (biological mother, foster mother, step mother, etc.), Father (biological father, foster father, step father, etc.), Grandparent(s) (biological, foster, step, etc.), Brother(s) older than 18, Sister(s) older than 18, Other adults (aunts, uncles, etc.)'
  2  ''
  3  'Mother (biological mother, foster mother, step mother, etc.), Sister(s) older than 18'
  4  'Mother (biological mother, foster mother, step mother, etc.), Father (biological father, foster father, step father, etc.)'", header = T)

score 1 · Answer 3 · answered Aug 13 '18 at 06:45

Try this:

library(dplyr)  # for if_else()

rel <- c("Mother", "Father", "Brother", "Sister", "Grandparent", "Other adult")

for (r in rel) {
  # 1 if the label occurs in the response, 0 otherwise
  df[[r]] <- if_else(grepl(r, df$lives.with.whom, fixed = TRUE), 1, 0)
}
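One caveat that the thread does not raise: the answers here treat each label as a plain substring, and a case-insensitive match on "mother" would also fire on a word like "grandmother". The sketch below shows a stricter variant that only accepts a label at the start of the response or directly after ", ". The first response string is invented for illustration, and the anchoring rule assumes that entries are separated by ", " and that only the leading labels are capitalized, as in the sample data.

```r
# Two responses; the first is hypothetical and mentions "grandmother"
# only inside the grandparent entry.
x <- c("Grandparent(s) (grandmother, grandfather)",
       "Mother (biological mother, etc.), Sister(s) older than 18")

rel <- c("Mother", "Father", "Brother", "Sister", "Grandparent", "Other adult")

# Loose rule: case-insensitive substring match, which also hits "grandmother".
loose <- grepl("mother", tolower(x), fixed = TRUE)

# Stricter rule: the label must start the string or follow ", ",
# and is matched case-sensitively.
strict <- sapply(rel, function(label) {
  +grepl(paste0("(^|, )", label), x)
})

loose               # both responses are flagged
strict[, "Mother"]  # only the second response is flagged
```

Whether the stricter rule is worth it depends on how fixed the incoming wording really is; with the exact option texts shown in the question, the simpler substring match gives the same result.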

Ankur
