I would like to compute spell lengths in a pandas DataFrame, where a spell is a run of adjacent columns that hold the same value within a row. What is the best way to do this?
An example:
import pandas as pd
d1 = pd.DataFrame([['4', '4', '4', '5'], ['23', '23', '24', '24'], ['112', '112', '112', '112']],
                  index=['c1', 'c2', 'c3'], columns=[1962, 1963, 1964, 1965])
produces a dataframe that looks like:
   1962 1963 1964 1965
c1    4    4    4    5
c2   23   23   24   24
c3  112  112  112  112
I would like to return a dataframe like the one below, which numbers the spells that occur in each row. In this case c1 has 2 spells: the first runs from 1962 to 1964, and the second starts and finishes in 1965:
I would also like a dataframe that describes the length of each spell, as shown below. For example, c1 has one spell of 3 years and a second spell of 1 year.
This re-coding is useful in survival analysis.
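For reference, here is a minimal sketch of one possible way to get both results, not necessarily the best one. It assumes a spell starts wherever a cell differs from the cell to its left. The names `starts`, `spell_id`, `n_spells`, and `spell_len` are my own, and the layout of the two outputs (one spell number per cell, one column per spell for the lengths) is a guess at the desired format.

```python
import pandas as pd

d1 = pd.DataFrame(
    [['4', '4', '4', '5'], ['23', '23', '24', '24'], ['112', '112', '112', '112']],
    index=['c1', 'c2', 'c3'], columns=[1962, 1963, 1964, 1965])

# True wherever a cell differs from its left-hand neighbour.
# The first column is compared against NaN, so it always starts a spell.
starts = d1.ne(d1.shift(axis=1))

# A running count of spell starts along each row numbers the spells 1, 2, ...
spell_id = starts.cumsum(axis=1)

# Number of spells per row (assumed summary; equals the last spell number).
n_spells = starts.sum(axis=1)

# Length of each spell: count the cells carrying each spell number, row by row.
# Rows with fewer spells get 0 in the columns for spells they do not have.
spell_len = (spell_id.apply(pd.Series.value_counts, axis=1)
                     .fillna(0)
                     .astype(int))

print(spell_id)
print(spell_len)
```

With the example data, c1 gets spell numbers 1, 1, 1, 2 and spell lengths 3 and 1, which matches the description above. Note that the comparison is on the raw cell values, so missing values would each start a new spell under this approach.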