I'm looking to do a vintage analysis in pandas on some CSV data, and I'm wondering whether I can streamline the process rather than iterating through a dataframe to build it. For example, I have a dataset like the one below in a CSV file, which I've read into a dataframe named `default`.
Origination | Default Month | Default Amount |
---|---|---|
1Q20 | 6 | 3000 |
1Q20 | 8 | 2000 |
1Q20 | 4 | 1000 |
2Q20 | 6 | 3000 |
2Q20 | 8 | 2500 |
1Q20 | 6 | 3000 |
2Q20 | 0 | 0 |
3Q20 | 3 | 1000 |
2Q20 | 4 | 3000 |
3Q20 | 6 | 4000 |
Now, I know I can easily run `default.groupby(['Origination']).sum()`, which tells me what defaulted by quarter but doesn't give me the time dimension in months. Similarly, grouping by the `Default Month` column gives me defaults by month but doesn't differentiate by quarter.
I'm guessing that something like this question, grouping by two dimensions with `default.groupby(['Origination', 'Default Month']).sum()`, would let me split the data by both quarter and month, but is there a way to do that directly in a vintage format?
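
For reference, here is a minimal sketch of that two-key grouping, assuming the sample rows above are what ends up in `default` (the frame is rebuilt inline so the snippet runs on its own). Calling `unstack` on the grouped sums is one possible way to move `Default Month` into columns. Note that it only creates columns for months that actually appear in the data.

```python
import pandas as pd

# Sample rows copied from the table above; in practice this would come from read_csv.
default = pd.DataFrame({
    "Origination": ["1Q20", "1Q20", "1Q20", "2Q20", "2Q20",
                    "1Q20", "2Q20", "3Q20", "2Q20", "3Q20"],
    "Default Month": [6, 8, 4, 6, 8, 6, 0, 3, 4, 6],
    "Default Amount": [3000, 2000, 1000, 3000, 2500, 3000, 0, 1000, 3000, 4000],
})

# Sum default amounts for each (quarter, month) pair.
grouped = default.groupby(["Origination", "Default Month"])["Default Amount"].sum()

# Pivot the inner index level (Default Month) into columns; missing pairs become 0.
wide = grouped.unstack(fill_value=0)
print(wide)
```

The result has one row per origination quarter, but its columns are only the months present in the data (including month 0), not a fixed 1 to 8 range.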
I created a blank dataframe with the form:
Origination | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
---|---|---|---|---|---|---|---|---|
1Q20 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2Q20 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
3Q20 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
I'm trying to transform that dataset by summation into a vintage table like the one below. Note, for example, that 1Q20 had two loans default in month 6, so that cell is now $6,000.
Default | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
---|---|---|---|---|---|---|---|---|
1Q20 | 0 | 0 | 0 | 1000 | 0 | 6000 | 0 | 2000 |
2Q20 | 0 | 0 | 0 | 3000 | 0 | 3000 | 0 | 2500 |
3Q20 | 0 | 0 | 1000 | 0 | 0 | 4000 | 0 | 0 |
Is there an easy way to do this that I'm overlooking, or is it a two-key `groupby` followed by assigning the data by category into the blank dataframe?
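
One approach that might fit is `pivot_table`, sketched below. It assumes the column names match the sample table and that months 1 through 8 are the range of interest. The name `vintage` is just illustrative. The `reindex` step forces a column for every month in that range, so the blank dataframe would not need to be built by hand.

```python
import pandas as pd

# Sample rows copied from the question's table; normally loaded with read_csv.
default = pd.DataFrame({
    "Origination": ["1Q20", "1Q20", "1Q20", "2Q20", "2Q20",
                    "1Q20", "2Q20", "3Q20", "2Q20", "3Q20"],
    "Default Month": [6, 8, 4, 6, 8, 6, 0, 3, 4, 6],
    "Default Amount": [3000, 2000, 1000, 3000, 2500, 3000, 0, 1000, 3000, 4000],
})

vintage = (
    default.pivot_table(
        index="Origination",       # one row per origination quarter
        columns="Default Month",   # one column per month on book
        values="Default Amount",
        aggfunc="sum",             # add up multiple defaults in the same cell
        fill_value=0,
    )
    # Assumption: only months 1..8 matter. This adds empty months as 0
    # and drops any month outside the range (here, the month-0 row).
    .reindex(columns=range(1, 9), fill_value=0)
)
print(vintage)
```

With the sample rows, the 1Q20 row shows 6000 in month 6 because the two 3000 defaults are summed. If a cumulative vintage curve is wanted instead, `vintage.cumsum(axis=1)` would accumulate across months.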