0

I have data regarding the years of birth and death of several people. I want to compute efficiently how many people are in each of a group of pre-defined epochs.

For example. If I have this list of data:

  • Paul 1920-1950
  • Sara 1930-1950
  • Mark 1960-2020
  • Lennard 1960-1970

and I define the epochs 1900-1980 and 1980-2023, I would want to compute the number of people alive in each period (not necessarily the whole range of the years). In this case, the result would be 4 people (Paul, Sara, Mark and Lennard) for the first epoch and 1 person (Mark) for the second epoch.

Is there any efficient routine out there? I would like to know, as the only way I can think of now is to create a huge loop with a lot of ifs to start categorizing.

I really appreciate any help you can provide.

J_H
  • 17,926
  • 4
  • 24
  • 44
Pablo
  • 87
  • 5

2 Answers2

1

So, you need to check that they have been born before the end of the period AND they have died after the start of the period.

One way could be:

# add columns for birth year and death year
df[['birth', 'death']] = df['birt/death'].str.split('-', expand=True)


# convert to numeric (https://stackoverflow.com/a/43266945/15032126)
cols = ['birth', 'death']
df[cols] = df[cols].apply(pd.to_numeric, errors='coerce', axis=1)
index name birt/death birth death
0 Paul 1920-1950 1920 1950
1 Sara 1920-1950 1920 1950
2 Mark 1960-2020 1960 2020
3 Lennard 1960-1970 1960 1970
def counts_per_epoch(df, start, end):
    return len(df[(df['birth'] <= end) & (df['death'] >= start)])

print(counts_per_epoch(df, 1900, 1980))
print(counts_per_epoch(df, 1980, 2023))
# Prints
4
1
Ignatius Reilly
  • 1,594
  • 2
  • 6
  • 15
0

Loop over all individuals. Expand "birth .. death" years into epochs. If epoch granularity was 12 months, then you would generate 30 rows for a 30-year old, and so on. Your granularity is much coarser, with valid epoch labels being just {1900, 1980}, so each individual will have just one or two rows. One of your examples would have a "1900, Mark" row, and a "1980, Mark" row, indicating he was alive for some portion of both epochs.

Now just sort values and group by, to count how many 1900 rows and how many 1980 rows there are. Report the per-epoch counts. Or report names of folks alive in each epoch, if that's the level of detail you need.

J_H
  • 17,926
  • 4
  • 24
  • 44