hhid psid year
1 1 1989
1 1 1991
1 1 1993
1 1 2000
1 2 1989
1 2 1991
1 2 1993
1 2 2000
2 1 1989
2 1 1991
2 1 1993
2 1 2000

... ... ...

hhid is the household ID and psid is the personal ID within a household. My question is how to create a personal ID (say, uid) that is unique across the whole panel dataset, so that it looks like:

hhid psid year uid
1 1 1989 1
1 1 1991 1
1 1 1993 1
1 1 2000 1
1 2 1989 2
1 2 1991 2
1 2 1993 2
1 2 2000 2
2 1 1989 3
2 1 1991 3
2 1 1993 3
2 1 2000 3

In Stata I would just do: egen uid = group(hhid psid)

Kara
user2898054

1 Answer


Here's a way, assuming that your data.frame is called df:

df$uid <- as.numeric(as.factor(paste(df$hhid, df$psid, sep=' ')))

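This works because a factor is stored internally as integer codes, with a distinct integer for each level. as.numeric() returns those codes, so every distinct hhid-psid pair gets its own ID. The separator matters: with sep='' the pairs (1, 11) and (11, 1) would both become "111". Note also that the levels are sorted as character strings, so with multi-digit IDs the numbering may not follow numeric order the way Stata's group() does, although each pair still gets a unique ID.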
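A minimal base R sketch of the approach above, run on the sample rows from the question. The extra household 10 is hypothetical, added only to show where string sorting and numeric sorting can differ. The uid_num column is also a hypothetical variant: it sorts the distinct pairs numerically by hhid, then psid, and numbers each row by its pair's position, which should mirror the sort order used by Stata's egen group().

```r
# Sample rows from the question plus a hypothetical household 10,
# added only to show how string sorting differs from numeric sorting
df <- data.frame(
  hhid = rep(c(1, 1, 2, 10), each = 4),
  psid = rep(c(1, 2, 1, 1), each = 4),
  year = rep(c(1989, 1991, 1993, 2000), times = 4)
)

# One string key per person; the space keeps the keys unambiguous
key <- paste(df$hhid, df$psid, sep = " ")

# factor() stores each distinct key as an integer code; as.integer() extracts it
df$uid <- as.integer(factor(key))
levels(factor(key))  # sorted as strings; the exact order depends on the locale

# Hypothetical numeric-order variant: sort the distinct pairs numerically
# (hhid first, then psid) and give each row the position of its pair
pairs <- unique(df[, c("hhid", "psid")])
pairs <- pairs[order(pairs$hhid, pairs$psid), ]
df$uid_num <- match(key, paste(pairs$hhid, pairs$psid, sep = " "))

unique(df[, c("hhid", "psid", "uid", "uid_num")])
```

Both columns identify persons uniquely; only uid_num is guaranteed to count 1, 2, 3, 4 in numeric hhid-then-psid order.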

Alternatively, you can use dplyr's group_indices() function.
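A minimal dplyr sketch, assuming dplyr 1.0.0 or later. In those versions, passing the grouping columns directly to group_indices() is deprecated, so this sketch swaps in the closely related cur_group_id() inside mutate(). Because group_by() orders groups by their sorted key values, the IDs follow hhid-then-psid order, similar to Stata's group().

```r
library(dplyr)  # assumes dplyr >= 1.0.0, where cur_group_id() is available

# Sample rows from the question (household 1 has two persons, household 2 has one)
df <- data.frame(
  hhid = rep(c(1, 1, 2), each = 4),
  psid = rep(c(1, 2, 1), each = 4),
  year = rep(c(1989, 1991, 1993, 2000), times = 3)
)

# cur_group_id() returns the index of the current hhid-psid group,
# counting 1, 2, 3, ... over the groups in sorted key order
df <- df %>%
  group_by(hhid, psid) %>%
  mutate(uid = cur_group_id()) %>%
  ungroup()

df$uid
```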

RoyalTS