
I have 9 variables describing who each ID lives with. There are over 3000 IDs (rows); the first four columns are 'mother, step-mother, father and step-father':

dummyid yafam01 yafam02 yafam03 yafam04  
    <dbl>   <dbl>   <dbl>   <dbl>   <dbl> 
1  100001       1       2       1       2 
2  100004       1       2       2       2 
3  100007       1       2       2       1 
4  100010       1       2       1       2 
5  100016       1       2       1       2 
6  100019       1       2       1       2

1 means ticked and 2 means not ticked.

I need to create a new variable that establishes who they live with. Across the dataset the 1s and 2s mean 'ticked' and 'not ticked' (in effect yes/no). So I need one new variable that records whether they live with 'mother, father, both, step-mother, step-father, or both step-mother and step-father'.

ThomasIsCoding
Anon
  • Hi Anon and welcome to SO. Can you post a minimal sample of your data using `dput()`? Also show expected output. Relevant post: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – markus May 05 '20 at 21:23
  • Please post that as part of the question. Not as a comment. Also use `dput()` and show expected output. This helps others help you. – markus May 05 '20 at 21:30
  • Take a look at the `?interaction` function - it makes a single variable that is a combination of the inputs. E.g. `interaction(dat[c("yafam01", "yafam02", "yafam03", "yafam04")])` – thelatemail May 05 '20 at 22:09
  • I have had a look at the interaction code, which for the first row gives 1.2.1.2. How do I turn this into a nominal factor variable for all the categories? Do I have to assign each interaction a new name and then apply it to the new column? – Anon May 06 '20 at 12:26

1 Answer


I have made some dummy data in the format you described:

data = read.delim("family.txt", row.names = 1, sep = " ")
       m sm f sf
100001 1  2 1  2
100002 1  1 1  2
100003 1  1 2  2
100004 2  2 1  2
100005 2  2 2  1

Then I converted these to logical values (easier to work with), so that TRUE means ticked:

data = apply(2 - data, 2, as.logical) # 1 (ticked) -> TRUE, 2 (not ticked) -> FALSE

I then just create a look-up vector with the family members and run an apply loop to subset this based on the data:

lives_with = c('mother', 'step-mother', 'father', 'step-father')
family_encoding = apply(data, 1, function(row){
    paste(lives_with[row], collapse = ":")
})

This yields a vector of encodings which you can cast to a factor:

> factor(family_encoding)
[1] mother:father             mother:step-mother:father
[3] mother:step-mother        father                   
[5] step-father              
5 Levels: father mother:father ... step-father
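
If the goal is to keep the encoding alongside the original columns, one option is to store it as a new column of the data frame. The following is only a sketch: it rebuilds the dummy data inline instead of reading `family.txt`, assumes 1 means ticked and 2 means not ticked, and the names `ticked`, `encoding` and the new column `lives_with` are chosen here for illustration.

```r
# Dummy data in the same layout as above (1 = ticked, 2 = not ticked)
data <- data.frame(
  m  = c(1, 1, 1, 2, 2),
  sm = c(2, 1, 1, 2, 2),
  f  = c(1, 1, 2, 1, 2),
  sf = c(2, 2, 2, 2, 1),
  row.names = c("100001", "100002", "100003", "100004", "100005")
)

labels <- c("mother", "step-mother", "father", "step-father")

# Logical matrix: TRUE where the box was ticked
ticked <- as.matrix(data[, c("m", "sm", "f", "sf")]) == 1

# For each row, join the labels of the ticked boxes
encoding <- apply(ticked, 1, function(row) paste(labels[row], collapse = ":"))

# Add the result as a new factor column; m, sm, f and sf are left unchanged
data$lives_with <- factor(unname(encoding))
```

A row with no boxes ticked would get an empty string as its encoding, so it may be worth checking for that case in the real data.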
randr
  • Thank you! But I need to add this as a new variable to the dataset, where 'm, sm, f and sf' are under the new variable. Does your method do this? – Anon May 05 '20 at 22:00
  • Additionally, this just changes all the numeric vectors to either True or False – Anon May 05 '20 at 22:04
  • I am not sure I understand. The `family_encoding` variable can be added to the dataset with `cbind`. Is that what you mean? It might be easiest if you post an example of how the output should look. – randr May 05 '20 at 22:09
  • Yes, I used the `interaction()` code and then simply renamed the levels using the `levels` function. Using the interaction, the first ID was recoded as 1.2.1.2; then I created a codebook of the relationships I wanted and introduced the new levels. – Anon May 11 '20 at 15:59
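
The approach described in the last comment could look roughly like the sketch below. It uses the six rows shown in the question; the `codebook` vector and its labels are hypothetical and would need one entry for every combination that occurs in the full data.

```r
# The six rows shown in the question (1 = ticked, 2 = not ticked)
dat <- data.frame(
  dummyid = c(100001, 100004, 100007, 100010, 100016, 100019),
  yafam01 = c(1, 1, 1, 1, 1, 1),  # mother
  yafam02 = c(2, 2, 2, 2, 2, 2),  # step-mother
  yafam03 = c(1, 2, 2, 1, 1, 1),  # father
  yafam04 = c(2, 2, 1, 2, 2, 2)   # step-father
)

# One combined code per row, e.g. "1.2.1.2"; drop = TRUE keeps only codes that occur
combo <- interaction(dat[c("yafam01", "yafam02", "yafam03", "yafam04")], drop = TRUE)

# Hypothetical codebook mapping each code to a readable label
codebook <- c(
  "1.2.1.2" = "mother and father",
  "1.2.2.2" = "mother only",
  "1.2.2.1" = "mother and step-father"
)

# Rename the levels via the codebook; codes missing from it would become NA
levels(combo) <- unname(codebook[levels(combo)])

# Store the relabelled factor as a new column
dat$lives_with <- combo
```

Running `table(dat$lives_with, useNA = "ifany")` afterwards is a quick way to spot combinations that the codebook does not yet cover.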