I have dates of birth such as
01jan1986
05jan2001
07mar1983
and so on, and I need to get the exact age of each person. I'm trying to create age categories with the following code, but it gives an error:
gen agecat=1
if age 0-20==1
if age 21-40==2
if age 41-60==3
if age 61-64==4
Here's one way:
gen age_cat = cond(age <= 20, 1, cond(age <= 40, 2, cond(age <= 60, 3, cond(age <= 64, 4, .))))
You might also want to look into the cut() function of egen; see help egen.
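As a rough sketch of two alternatives to nested cond() calls, the block below builds the same four categories with irecode() and with egen's cut() function. The toy ages and the variable names age_cat and age_cat2 are made up for illustration, and the two approaches only agree when age is held in whole years, because cut() uses left-closed intervals.

```stata
clear
input age
0
20
21
40
41
60
61
64
70
end

* irecode() returns 0 if age <= 20, 1 if 20 < age <= 40, ..., 4 if age > 64
gen age_cat = 1 + irecode(age, 20, 40, 60, 64)
* ages above 64 fall outside the requested categories
replace age_cat = . if age_cat == 5

* cut() uses the intervals [0,21), [21,41), [41,61), [61,65);
* icodes numbers them 0, 1, 2, 3, and values outside the range become missing
egen age_cat2 = cut(age), at(0, 21, 41, 61, 65) icodes
replace age_cat2 = age_cat2 + 1

list
```

With fractional ages, such as 20.5, irecode() follows the cond() logic (category 2) while this cut() call would put the value in category 1, so pick cutpoints to match how age is measured.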
To build off of Wouter's answer, you could do something like this to calculate the age to the tenth of a year:
clear
set obs 12
set seed 12352
global today = date("18Jun2021", "DMY")
* Sample Data
gen dob = runiformint(0,17000) // random Dates
format dob %td
* Create Age
gen age = round((ym(year(${today}),month(${today})) - ym(year(dob), month(dob)))/ 12,0.1)
* Correct age if dob in current month, but after today's date
replace age = age - 0.1 if (month(${today}) == month(dob)) & (day(dob) > day(${today}))
* age category
gen age_cat = cond(age <= 20, 1, cond(age <= 40, 2, cond(age <= 60, 3, cond(age <= 64, 4, .))))
The penultimate step is important: it decrements the age when the date of birth falls in the same month as the comparison date but the birthday has not yet occurred.
     +----------------------------+
     |       dob    age   age_cat |
     |----------------------------|
  1. | 30jan2004   17.4         1 |
  2. | 14aug1998   22.8         2 |
  3. | 06aug1998   22.8         2 |
  4. | 31aug1994   26.8         2 |
  5. | 27mar1990   31.3         2 |
     |----------------------------|
  6. | 12jun1968     53         3 |
  7. | 05may1964   57.1         3 |
  8. | 06aug1994   26.8         2 |
  9. | 21jun1989   31.9         2 |
 10. | 10aug1984   36.8         2 |
     |----------------------------|
 11. | 22oct2001   19.7         1 |
 12. | 03may1972   49.1         3 |
     +----------------------------+
Note that the decimal is just approximate as it uses the month of the birthday and not the actual date.
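If age in completed years is all that is needed, one possible sketch is to subtract the years and then take off one year when the birthday has not yet occurred by the reference date. The reference date of 18 June 2021 is reused from the example above, the three dates of birth come from the question, and the local macro name ref is just an illustrative choice. Run it as a do-file so the local macro stays defined.

```stata
clear
input str9 sdate
"01jan1986"
"05jan2001"
"07mar1983"
end

* convert the string dates to Stata daily dates
gen dob = daily(sdate, "DMY")
format dob %td

* reference date (an assumption; use whatever date suits your data)
local ref = mdy(6, 18, 2021)

* difference in calendar years, minus 1 if the birthday is still to come
* in the reference year
gen age = year(`ref') - year(dob) - ///
    (month(`ref') < month(dob) | ///
    (month(`ref') == month(dob) & day(`ref') < day(dob)))

list
```

For these three dates the result is 35, 20 and 38. Someone born on 29 February is treated as having their birthday on 1 March in non-leap years under this rule.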
You got some good advice in other answers, but this can be as simple as you want.
Consider this example, noting that presenting data as code we can run is a really helpful detail.
* Example generated by -dataex-. For more info, type help dataex
clear
input str9 sdate float dob
"01jan1986" 9497
"05jan2001" 14980
"07mar1983" 8466
end
format %td dob
The age at the end of 2020 is just 2020 minus the year in which each person was born. Use any other year if it makes more sense.
. gen age = 2020 - year(dob)
. l
     +-----------------------------+
     |     sdate         dob   age |
     |-----------------------------|
  1. | 01jan1986   01jan1986    34 |
  2. | 05jan2001   05jan2001    19 |
  3. | 07mar1983   07mar1983    37 |
     +-----------------------------+
For 20-year bins, why not make them self-describing? With this code, 20, 40, etc. are the upper limits of each bin. (You might need to tweak that if you have children under 1 year old in your data, because an age of 0 would be assigned to a bin labelled 0.)
. gen age2 = 20 * ceil(age/20)
. l
     +------------------------------------+
     |     sdate         dob   age   age2 |
     |------------------------------------|
  1. | 01jan1986   01jan1986    34     40 |
  2. | 05jan2001   05jan2001    19     20 |
  3. | 07mar1983   07mar1983    37     40 |
     +------------------------------------+
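One way the tweak for children under 1 year old might look is sketched below, on made-up ages. It sends age 0 to the first bin explicitly, which is only one of several reasonable choices (binning by lower limits with floor() would be another).

```stata
clear
input age
0
1
19
20
21
34
end

* ceil(0/20) is 0, so age 0 would otherwise land in a bin labelled 0;
* cond() places it in the first bin instead, and missing ages stay missing
gen age2 = cond(age == 0, 20, 20 * ceil(age/20))

list
```

Here ages 0 to 20 end up in bin 20 and ages 21 and 34 in bin 40.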
This paper is a review of rounding and binning using Stata.