I am new to R. I have a data frame with three columns. The first column is the household ID; each household has a unique ID. The second column gives the relationship (1 for father, 2 for mother, 3 for child). The third column gives the person's age. I want to know how many twins there are in each family (twins are children in the same family who have the same age). My data frame:

Id     relationship       age
1001       1              60 
1001       2              50
1001       3              20
1002       1              70
1002       2              68
1002       3              23
1002       3              27
1002       3              27
1002       3              23
1003       1              60
1003       2              40
1003       3              20
1003       3              20

result:

id                   twins
1001                    0
1002                    2
1003                    1
ekad
user3041372

3 Answers

2

Here's a base R solution using aggregate:

> aggregate(age ~ Id, function(x) sum(duplicated(x)), data=df[df[,2]==3, ])
    Id age
1 1001   0
2 1002   2
3 1003   1
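
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The data frame is retyped by hand from the question's table, and the names `children` and `twins` are only illustrative, not part of the original answer. `duplicated()` flags every age that repeats an earlier age within a household, so the sum counts each additional child sharing an age.

```r
# Rebuild the question's data (typed in by hand from the table in the question)
df <- data.frame(
  Id = c(1001, 1001, 1001, 1002, 1002, 1002, 1002, 1002, 1002,
         1003, 1003, 1003, 1003),
  relationship = c(1, 2, 3, 1, 2, 3, 3, 3, 3, 1, 2, 3, 3),
  age = c(60, 50, 20, 70, 68, 23, 27, 27, 23, 60, 40, 20, 20)
)

# Keep only the children (relationship == 3)
children <- df[df$relationship == 3, ]

# Per household, count the ages that repeat an earlier age
twins <- aggregate(age ~ Id, data = children,
                   FUN = function(x) sum(duplicated(x)))
names(twins)[2] <- "twins"  # illustrative column name
twins
```

Note that a household with no children would not appear in the result at all, because its rows are filtered out before aggregating.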
Jilber Urbina
0

It's a little difficult to attempt these without a working example. You can use dput() to create one. ... but I think this should work.

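    library(plyr)
    # keep only the children (relationship == 3)
    df <- df[df$relationship == 3, ]
    # count rows per household and age; the column is named Id, not id
    ddply(df, .(Id, age), nrow)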

Or rather, this gives the number of children of each age in each household (not just twins).
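
As for the working example mentioned at the top of this answer, a rough sketch of how `dput()` could be used is below. The small data frame is a hand-typed subset of the question's table, and `txt` and `df2` are hypothetical names used only to show the round trip.

```r
# Hand-typed subset of the question's data (household 1003 only)
df <- data.frame(
  Id = c(1003, 1003, 1003, 1003),
  relationship = c(1, 2, 3, 3),
  age = c(60, 40, 20, 20)
)

# dput() prints R code that recreates the object; paste that into a question
dput(df)

# Round trip: capture the printed code as text and evaluate it again
txt <- paste(capture.output(dput(df)), collapse = "\n")
df2 <- eval(parse(text = txt))

# The rebuilt data frame should match the original
all.equal(df, df2)
```

Pasting the `dput()` output into a question lets others recreate the exact object, including column types, without retyping the table.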

Stephen Henderson
0
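library(plyr)
# per household and age, count the children beyond the first of that age
almost <- ddply(df[df$relationship == 3, ], .(Id, age), function(x) nrow(x) - 1)

# sum those counts within each household
aggregate(almost$V1, list(almost$Id), FUN = sum)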
#  Group.1 x
#1    1001 0
#2    1002 2
#3    1003 1
user1317221_G