I have two data frames in R. The first contains information about each family, and the second contains information about the children of those families. The first one (the HHchar data frame) looks like this:
HHchar <- read.table(text="ID familysize
1 4
2 5
3 2
4 3",header=T)
The second one (the children data frame) looks like this:
children <- read.table(text="ID age gender birthorder
1 26 1 firstchild
1 20 2 secondchild
2 20 1 firstchild
2 18 1 firstchild
2 17 2 secondchild
2 10 1 thirdchild
3 19 1 firstchild
3 12 2 secondchild
4 10 1 firstchild",header=T)
I want a result in this form:
ID age gender birthorder familysize
1 26 1 firstchild 4
1 20 2 secondchild 4
2 20 1 firstchild 5
2 18 1 secondchild 5
2 17 2 thirdchild 5
3 19 1 firstchild 2
3 12 2 secondchild 2
4 10 1 firstchild 3
To get this result I use the following command:
b2 <- merge(children, HHchar, by = "ID", all.x = TRUE)
I expected this command to look up each child's family in HHchar and add the family information to that child's row, so that b2 would have one row per child.
But something went wrong: the number of rows in b2 is not what I expected.
For example:
- in HHchar we have 4 families, so 4 rows
- in the children data frame we have 9 children, so 9 rows
- in b2 I expect 9 rows, but it has more than 9
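Here is a minimal sketch of how the row counts can be compared on the small data. It assumes, without my having confirmed it on the real data, that the extra rows come from an ID that appears more than once in HHchar. HHchar_dup and dup_ids are hypothetical names made up only for this illustration.

```r
# Small data frames copied from the question.
HHchar <- read.table(text = "ID familysize
1 4
2 5
3 2
4 3", header = TRUE)

children <- read.table(text = "ID age gender birthorder
1 26 1 firstchild
1 20 2 secondchild
2 20 1 firstchild
2 18 1 firstchild
2 17 2 secondchild
2 10 1 thirdchild
3 19 1 firstchild
3 12 2 secondchild
4 10 1 firstchild", header = TRUE)

# When every ID is unique in HHchar, the left join keeps one row per child.
b2 <- merge(children, HHchar, by = "ID", all.x = TRUE)
nrow(b2) == nrow(children)

# Assumption for illustration: family 2 is listed twice in the household table.
HHchar_dup <- rbind(HHchar, data.frame(ID = 2L, familysize = 5L))

# Each child of family 2 now matches two household rows, so the row count grows.
b2_dup <- merge(children, HHchar_dup, by = "ID", all.x = TRUE)
nrow(b2_dup) > nrow(children)

# IDs that occur more than once in the household table.
dup_ids <- unique(HHchar_dup$ID[duplicated(HHchar_dup$ID)])
dup_ids
```

The same duplicated() check could be run on the full HHchar table. If it returns no IDs, this assumption does not explain the extra rows.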
I tried this on a small dataset, but my real dataset is very large (853467 rows),
so I can't see by inspection what is going wrong. Is merge the right command for what I need?