Creating a new variable from multiple binary categories in R

Question

I have 9 variables describing who each ID lives with. There are over 3000 IDs, and the first four columns are 'mother', 'step-mother', 'father' and 'step-father':

  dummyid yafam01 yafam02 yafam03 yafam04
    <dbl>   <dbl>   <dbl>   <dbl>   <dbl> 
1  100001       1       2       1       2 
2  100004       1       2       2       2 
3  100007       1       2       2       1 
4  100010       1       2       1       2 
5  100016       1       2       1       2 
6  100019       1       2       1       2

1 means ticked and 2 means not ticked.

I need to create a new variable that establishes who they live with. Across the dataset the 1s and 2s mean 'ticked' and 'not ticked' (in effect yes/no). So I need one new variable that records whether they live with 'mother', 'father', 'both', 'step-mother', 'step-father', or 'both step-mother and step-father'.

Hi Anon and welcome to SO. Can you post a minimal sample of your data using `dput()`? Also show expected output. Relevant post: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example — markus, May 05 '20 at 21:23
Please post that as part of the question. Not as a comment. Also use `dput()` and show expected output. This helps others help you. — markus, May 05 '20 at 21:30
Take a look at the `?interaction` function - it makes a single variable that is a combination of the inputs. E.g. `interaction(dat[c("yafam01", "yafam02", "yafam03", "yafam04")])` — thelatemail, May 05 '20 at 22:09
I have had a look at the interaction code, with which I would get 1.2.1.2 for the first row. How do I turn this into a nominal factor variable for all the categories? Do I have to assign each interaction a new name and then apply it to the new column? — Anon, May 06 '20 at 12:26

score 0 · Answer 1 · answered May 05 '20 at 21:35

I have made some dummy data in the format you described:

data = read.delim("family.txt", row.names = 1, sep = " ")

       m sm f sf
100001 1  2 1  2
100002 1  1 1  2
100003 1  1 2  2
100004 2  2 1  2
100005 2  2 2  1

Then I converted these to logical values (easier to work with). Since 1 means ticked and 2 means not ticked, comparing against 1 makes ticked TRUE:

data = apply(data == 1, 2, as.logical) # TRUE = ticked (1), FALSE = not ticked (2)

I then create a look-up vector with the family members and run an apply loop that subsets it based on each row of the data:

lives_with = c('mother', 'step-mother', 'father', 'step-father')
family_encoding = apply(data, 1, function(row){
    paste(lives_with[row], collapse = ":")
})

This yields a vector of encodings which you can cast to a factor:

> factor(family_encoding)
[1] mother:father             mother:step-mother:father
[3] mother:step-mother        father                   
[5] step-father              
5 Levels: father mother:father ... step-father
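
To keep the encoding alongside the original columns, it can be attached as a new factor column. The following is only a minimal sketch: it rebuilds the dummy data inline rather than reading family.txt, it assumes 1 means ticked as stated in the question, and `household` is a hypothetical column name.

```r
# Rebuild the dummy data inline (instead of reading family.txt)
data <- data.frame(
  m  = c(1, 1, 1, 2, 2),
  sm = c(2, 1, 1, 2, 2),
  f  = c(1, 1, 2, 1, 2),
  sf = c(2, 2, 2, 2, 1),
  row.names = c("100001", "100002", "100003", "100004", "100005")
)

lives_with <- c("mother", "step-mother", "father", "step-father")

# TRUE where the box was ticked (coded 1), FALSE where it was not (coded 2)
ticked <- as.matrix(data) == 1

# One label per row, built from the ticked columns only
family_encoding <- apply(ticked, 1, function(row) {
  paste(lives_with[row], collapse = ":")
})

# Attach as a new factor column; the original 1/2 columns stay untouched.
# `household` is a hypothetical name chosen for this sketch.
data$household <- factor(unname(family_encoding))
```

A row with nothing ticked would get an empty string as its label, so it may be worth recoding that case explicitly.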

randr

Thank you! But I need to add this as a new variable to the dataset, where 'm, sm, f and sf' are under the new variable. Does your method do this? – Anon May 05 '20 at 22:00
Additionally, this just changes all the numeric vectors to either True or False – Anon May 05 '20 at 22:04
I am not sure I understand. The `family_encoding` variable can be added to the dataset with `cbind`. Is that what you mean? It might be easiest if you post an example of how the output should look. – randr May 05 '20 at 22:09
Yes, I used the interaction() code and then simply renamed the levels using the levels function. Using the interaction, the first ID was recoded as 1.2.1.2; then I created a code book of the relationships I wanted and introduced the new levels. – Anon May 11 '20 at 15:59
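
The approach described in the last comment could look roughly like the sketch below. It is an illustration, not the asker's actual code: the data are the first few rows from the question, the labels in `code_book` are hypothetical, and it uses a named look-up vector rather than assigning to `levels()` directly so that the mapping does not depend on level order.

```r
# Toy data in the question's layout (1 = ticked, 2 = not ticked)
dat <- data.frame(
  dummyid = c(100001, 100004, 100007, 100010),
  yafam01 = c(1, 1, 1, 1),  # mother
  yafam02 = c(2, 2, 2, 2),  # step-mother
  yafam03 = c(1, 2, 2, 1),  # father
  yafam04 = c(2, 2, 1, 2)   # step-father
)

# Combine the four columns into one factor; levels look like "1.2.1.2"
combo <- interaction(dat[c("yafam01", "yafam02", "yafam03", "yafam04")],
                     drop = TRUE)

# Hypothetical code book: names are interaction codes, values are labels
code_book <- c(
  "1.2.1.2" = "mother and father",
  "1.2.2.2" = "mother only",
  "1.2.2.1" = "mother and step-father"
)

# Look up each row's label by its code; codes that are missing from the
# code book come back as NA, which makes gaps easy to spot
dat$lives_with <- factor(unname(code_book[as.character(combo)]))
```

Running `table(combo)` first shows which combinations actually occur in the data, which helps when writing the code book.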
