So judging by your example data and sample dummy
formula, your goal is to identify which countries have a full time series as of 1952 (i.e., a complete and balanced panel); please correct me if this is off base. Your example data always satisfies this condition, so I will add one country that violates this to show what the dummy is identifying.
clear
input str10 country year poverty_rate Sales
"Austria" 1950 0.54 142
"Austria" 1951 0.32 12441
"Austria" 1952 0.32 12441
"Bangladesh" 1950 0.11 142123123
"Bangladesh" 1951 0.52 1234
"Bangladesh" 1952 0.32 12441
"Sri Lanka" 1950 0.95 4215
"Sri Lanka" 1951 0.21 142421
"Sri Lanka" 1952 0.32 12441
"Canada" 1950 0.95 4215
"Canada" 1951 0.21 .
"Canada" 1952 0.32 12441
end
* TSSET SET ON COUNTRY (after making a country id) AND YEAR
egen country_id = group(country)
tsset country_id year
* Example dummy
gen dummy= 1 if year==1952&(Sales!=.&L1.Sales!=.&L2.Sales!=.)
+-------------------------------------------------------------+
| country year povert~e Sales countr~d dummy |
|-------------------------------------------------------------|
1. | Austria 1950 .54 142 1 . |
2. | Austria 1951 .32 12441 1 . |
3. | Austria 1952 .32 12441 1 1 |
4. | Bangladesh 1950 .11 1.421e+08 2 . |
5. | Bangladesh 1951 .52 1234 2 . |
|-------------------------------------------------------------|
6. | Bangladesh 1952 .32 12441 2 1 |
7. | Canada 1950 .95 4215 3 . |
8. | Canada 1951 .21 . 3 . |
9. | Canada 1952 .32 12441 3 . |
10. | Sri Lanka 1950 .95 4215 4 . |
|-------------------------------------------------------------|
11. | Sri Lanka 1951 .21 142421 4 . |
12. | Sri Lanka 1952 .32 12441 4 1 |
+-------------------------------------------------------------+
. tabdisp country if dummy == 1, c(year)
-----------------------
country | year
-----------+-----------
Austria | 1952
Bangladesh | 1952
Sri Lanka | 1952
-----------------------
Because Canada is missing sales in 1951, it does not have dummy == 1
.
Now let's look at what happens when we want to add more years. I will give Sri Lanka a missing sales year in addition to Canada. The general strategy will be to track the cumulative number of years, up to and including the current year, that had non-missing sales. Let's first make some example data:
* 10 year example Data
clear
set seed 1234
input str10 country
"Austria"
"Bangladesh"
"Sri Lanka"
"Canada"
end
egen country_id = group(country)
expand 10
bysort country: gen year = (1952 - _N ) + _n
gen poverty_rate = runiform(country_id/10, 1)
gen Sales = rnormal(10000 * country_id/10,500)
replace Sales = . if inlist(country, "Canada", "Sri Lanka") & mod(year, country_id + 3) == 0
tsset country_id year
. list
+------------------------------------------------------+
| country countr~d year poverty~e Sales |
|------------------------------------------------------|
1. | Austria 1 1943 .95250845 1247.2748 |
2. | Austria 1 1944 .14700104 868.84461 |
3. | Austria 1 1945 .97688645 1183.0619 |
4. | Austria 1 1946 .95117353 518.76747 |
5. | Austria 1 1947 .26708305 1126.6462 |
|------------------------------------------------------|
6. | Austria 1 1948 .95386004 2142.1245 |
7. | Austria 1 1949 .89428386 1161.1905 |
8. | Austria 1 1950 .94966985 797.04767 |
9. | Austria 1 1951 .18048327 689.44633 |
10. | Austria 1 1952 .77549004 1066.0907 |
|------------------------------------------------------|
11. | Bangladesh 2 1943 .95879865 3018.9243 |
12. | Bangladesh 2 1944 .28973012 2582.4464 |
13. | Bangladesh 2 1945 .58472512 2524.7572 |
14. | Bangladesh 2 1946 .9810758 2164.6962 |
15. | Bangladesh 2 1947 .30039802 2364.3507 |
|------------------------------------------------------|
16. | Bangladesh 2 1948 .81240204 2407.6086 |
17. | Bangladesh 2 1949 .22868748 2441.4124 |
18. | Bangladesh 2 1950 .25618875 1989.3041 |
19. | Bangladesh 2 1951 .36814293 2479.9563 |
20. | Bangladesh 2 1952 .72928052 2302.7052 |
|------------------------------------------------------|
21. | Canada 3 1943 .44079668 2673.9421 |
22. | Canada 3 1944 .9912414 . |
23. | Canada 3 1945 .50897682 1960.373 |
24. | Canada 3 1946 .92788352 3217.6927 |
25. | Canada 3 1947 .35683626 3401.9663 |
|------------------------------------------------------|
26. | Canada 3 1948 .76214979 3976.6976 |
27. | Canada 3 1949 .7398694 2350.2898 |
28. | Canada 3 1950 .31369335 . |
29. | Canada 3 1951 .44475408 3216.2126 |
30. | Canada 3 1952 .8668553 1833.1472 |
|------------------------------------------------------|
31. | Sri Lanka 4 1943 .86803337 3617.7311 |
32. | Sri Lanka 4 1944 .40923508 4508.0392 |
33. | Sri Lanka 4 1945 .7448494 4093.6019 |
34. | Sri Lanka 4 1946 .79308608 . |
35. | Sri Lanka 4 1947 .72141991 4233.8767 |
|------------------------------------------------------|
36. | Sri Lanka 4 1948 .6399412 4404.9189 |
37. | Sri Lanka 4 1949 .6140176 3711.7107 |
38. | Sri Lanka 4 1950 .84002398 3311.1258 |
39. | Sri Lanka 4 1951 .74770728 4021.116 |
40. | Sri Lanka 4 1952 .89887266 4581.6338 |
+------------------------------------------------------+
Now we will identify missing sales, take the cumulative sum, and flag in 1952.
gen year_has_sales = !missing(Sales)
bysort country (year): gen years_with_sales = sum(year_has_sales)
by country: gen dummy = (year == 1952) & years_with_sales == _n
tabdisp country if dummy == 1, c(year)
-----------------------
country | year
-----------+-----------
Austria | 1952
Bangladesh | 1952
-----------------------
As expected we see Austria and Bangladesh with dummy == 1
. Note that I have assumed you have a balanced panel; the code above could be adapted to leverage the min
and max
year within a country. If you wanted to check a narrower window, say 5 years, you could do something like
gen years_with_sales_5 = years_with_sales - L5.years_with_sales
and check whether it is equal to 5.