
I have two data frames in R. The first one contains information about each family, and the second one contains information about all the children in each family. The first one looks like this (HHchar data frame):

HHchar <- read.table(text="ID familysize
1      4
2      5
3      2
4      3",header=T)

The second one looks like this (children data frame):

children <- read.table(text="ID   age   gender birthorder
1     26     1    firstchild
1     20     2    secondchild
2     20     1    firstchild 
2     18     1    firstchild
2     17     2    secondchild
2     10     1    thirdchild
3     19     1    firstchild
3     12     2    secondchild
4     10     1    firstchild",header=T)

I want this as a result:

ID    age    gender     birthorder       familysize
1     26       1        firstchild           4
1     20       2        secondchild          4
2     20       1        firstchild           5
2     18       1        secondchild          5
2     17       2        thirdchild           5
3     19       1        firstchild           4
3     12       2        secondchild          4
4     10       1        firstchild           3

To get this result I used this command:

b2 <- merge(children, HHchar, by = "ID", all.x = TRUE)
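As a quick sanity check, here is a minimal sketch that rebuilds the two small data frames from the question and runs the same left join. When every `ID` occurs only once in `HHchar`, a left join cannot return more rows than `children` has:

```r
HHchar <- read.table(text = "ID familysize
1 4
2 5
3 2
4 3", header = TRUE)

children <- read.table(text = "ID age gender birthorder
1 26 1 firstchild
1 20 2 secondchild
2 20 1 firstchild
2 18 1 firstchild
2 17 2 secondchild
2 10 1 thirdchild
3 19 1 firstchild
3 12 2 secondchild
4 10 1 firstchild", header = TRUE)

# Left join: keep every child and look up familysize by ID
b2 <- merge(children, HHchar, by = "ID", all.x = TRUE)

# With a unique key in HHchar the row count stays the same
nrow(children)  # 9
nrow(b2)        # 9
```

So on this toy data `merge()` itself behaves as expected; extra rows only appear when the key is not unique in one of the tables.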

I expected this command to look up, for every child in the children data frame, the matching family information in HHchar and build the b2 data frame from that.
But something went wrong: the number of rows in b2 is not what I expected.

For example:

  • in HHchar we have 4 families, so 4 rows
  • in the children data frame we have 9 children, so 9 rows
  • in b2 I expect 9 rows, but it has more than 9
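The usual cause of this symptom is a repeated key: `merge()` returns one output row for every matching pair of rows, so if an `ID` appears k times in `HHchar`, every child with that `ID` appears k times in the result. A minimal sketch with a hypothetical duplicated household row (not taken from the question's data):

```r
children <- data.frame(ID = c(1, 1, 2), age = c(26, 20, 10))
# Hypothetical: household 1 was recorded twice in HHchar
HHchar <- data.frame(ID = c(1, 1, 2), familysize = c(4, 4, 3))

b2 <- merge(children, HHchar, by = "ID", all.x = TRUE)
nrow(b2)  # 5: each of the 2 children in household 1 is paired with 2 household rows
```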

I tested this on a small dataset, but my real dataset is very large (853,467 rows).

So I can't figure out what is wrong. Is merge suitable for what I need?
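For a table too large to inspect by eye, one possible approach is to test the key directly before merging. The sketch below uses small hypothetical stand-ins for the real tables; `dup_ids` is an illustrative name. It checks whether `HHchar$ID` is unique, lists the conflicting rows, and shows a `match()`-based lookup that can never add rows:

```r
# Hypothetical stand-ins for the real (large) tables
children <- data.frame(ID = c(1, 1, 2, 3), age = c(26, 20, 10, 19))
HHchar   <- data.frame(ID = c(1, 2, 2, 3), familysize = c(4, 5, 6, 2))

# 1. Position of the first repeated ID; 0 would mean every ID is unique
anyDuplicated(HHchar$ID)  # 3 here

# 2. Show every row whose ID is repeated, so the conflict can be inspected
dup_ids <- HHchar$ID[duplicated(HHchar$ID)]
HHchar[HHchar$ID %in% dup_ids, ]

# 3. A lookup that keeps one row per child: match() returns the position
#    of the FIRST matching ID, so later duplicates in HHchar are ignored
children$familysize <- HHchar$familysize[match(children$ID, HHchar$ID)]
nrow(children)  # still 4
```

Note that `match()` only hides the problem: if the same household has two different `familysize` values, the duplicates in `HHchar` still need to be resolved.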

Brandon Bertelsen
user3041372
  • You can access the function documentation with ?merge; there are examples there. [Here is a popular question regarding what you need with more methods similar to merge](http://stackoverflow.com/questions/1299871/how-to-join-data-frames-in-r-inner-outer-left-right). – marbel Jan 23 '14 at 17:17
  • I don't really understand what you want; for example, a row from the children data frame is missing in your desired output. But if you want to filter the results after merging, you can use `subset`. (I think merge works fine; you just didn't expect this result.) – llrs Jan 23 '14 at 17:39
  • It seems that you're missing one ID=2 row from your result, is this on purpose? `merge(children,HHchar)` is all that is necessary from my understanding of your question. – Brandon Bertelsen Jan 23 '14 at 17:59
  • When I run your code on your data, I get 9 rows in b2. – jlhoward Jan 23 '14 at 18:58
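The shorter form suggested in the comments works because, when `by` is omitted, `merge()` joins on every column name the two data frames share (`intersect(names(x), names(y))`). A minimal sketch on hypothetical small data:

```r
children <- data.frame(ID = c(1, 1, 2), age = c(26, 20, 10))
HHchar   <- data.frame(ID = c(1, 2), familysize = c(4, 5))

# With no `by`, merge() uses the shared column names, here just "ID"
intersect(names(children), names(HHchar))  # "ID"

a <- merge(children, HHchar)
b <- merge(children, HHchar, by = "ID")
identical(a, b)  # TRUE
```

Spelling out `by = "ID"` is safer on real data: if both tables happened to share another column name, the default would silently join on that column too.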

1 Answer


If I understood your question correctly, this should work:

result <- merge(children, HHchar, by = "ID") 
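This call differs from the one in the question only in dropping `all.x = TRUE`, which turns the left join into an inner join. On the question's data every child's `ID` exists in `HHchar`, so both give the same rows. A sketch with a hypothetical child whose household is missing shows where they diverge:

```r
# Hypothetical: the child with ID 5 has no household row in HHchar
children <- data.frame(ID = c(1, 2, 5), age = c(26, 20, 7))
HHchar   <- data.frame(ID = c(1, 2), familysize = c(4, 5))

inner <- merge(children, HHchar, by = "ID")               # inner join (default)
left  <- merge(children, HHchar, by = "ID", all.x = TRUE) # left join

nrow(inner)  # 2: the unmatched child is dropped
nrow(left)   # 3: the unmatched child is kept with familysize NA
```

Neither variant adds rows on its own; row growth comes from repeated keys, not from the choice of join.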

In general, the function works like this: merge(firstDF, secondDF, by.x = "ColumnToJoinOnInFirstDF", by.y = "ColumnToJoinOnInSecondDF")
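The `by.x`/`by.y` form is only needed when the key columns have different names. A minimal sketch, assuming a hypothetical household table whose key column is called `HHID`:

```r
children <- data.frame(ID = c(1, 1, 2), age = c(26, 20, 10))
# Hypothetical: the household table names its key column "HHID"
households <- data.frame(HHID = c(1, 2), familysize = c(4, 5))

result <- merge(children, households, by.x = "ID", by.y = "HHID")
result
```

The key column in the result keeps the name from the first data frame (`ID`), followed by the remaining columns of each input.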

so13eit
  • No, the merge command doesn't work as I expected; it increases the number of rows. I think this command adds some extra rows to the dataset. Have you ever seen this problem? – user3041372 Jan 23 '14 at 16:55
  • Can you give a more concrete example that is reproducible with code? Your current example also doesn't make sense, as the output you want only contains 8 rows. – so13eit Jan 23 '14 at 16:59
  • This works fine for me, except that the input in the original question might be wrong: you have ID 2 listed twice as firstchild, with two different ages. That is why this gives 9 rows instead of 8. – JeremyS Jan 24 '14 at 01:19
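The inconsistency pointed out in the last comment can also be found programmatically. This sketch flags every row of the question's `children` data whose (`ID`, `birthorder`) pair occurs more than once; `key` and `dup` are illustrative names:

```r
children <- read.table(text = "ID age gender birthorder
1 26 1 firstchild
1 20 2 secondchild
2 20 1 firstchild
2 18 1 firstchild
2 17 2 secondchild
2 10 1 thirdchild
3 19 1 firstchild
3 12 2 secondchild
4 10 1 firstchild", header = TRUE)

key <- children[c("ID", "birthorder")]
# TRUE for every copy of a repeated (ID, birthorder) pair, not just the later ones
dup <- duplicated(key) | duplicated(key, fromLast = TRUE)
children[dup, ]
```

On this data it returns the two `firstchild` rows for household 2 (ages 20 and 18).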