0

Sample Data:

id start_to_end_date attribute val
1395287 2019-01-01 - 2019-12-31 CostOfGoodsAndServicesSold 49.369
1395288 2020-01-01 - 2020-12-31 CostOfGoodsAndServicesSold 75.476
1395289 2020-04-01 - 2020-06-30 CostOfGoodsAndServicesSold 17.69
1395290 2020-07-01 - 2020-09-30 CostOfGoodsAndServicesSold 19.056
1395291 2021-01-01 - 2021-12-31 CostOfGoodsAndServicesSold 137.292

If some of you are familiar with financial statements and Index Match in Excel, I would basically like to recreate what an Income Statement looks like.

I have all my unique column_names in a list. This current order is important:

column_names = ['2020-04-01 - 2020-06-30',
 '2020-07-01 - 2020-09-30',
 '2021-01-01 - 2021-03-31',
 '2021-04-01 - 2021-06-30',
 '2021-07-01 - 2021-09-30',
 '2022-01-01 - 2022-03-31',
 '2019-01-01 - 2019-12-31',
 '2020-01-01 - 2020-12-31',
 '2021-01-01 - 2021-12-31']

I also have all my unique row names in a list. This current order is important:

row_names = ['RevenueFromContractWithCustomerExcludingAssessedTax',
 'CostOfGoodsAndServicesSold',
 'GrossProfit',
 'ResearchAndDevelopmentExpense']

The final dataframe would have the above column_names going across, the above row_names going down (as the index). The values that would fill in dataframe would be from df['val'], retrieved by matching df['attribute'] == column_names[i] and df['start_to_end_date'] == row_names[i].

How would I do this?

  • I tried to create an empty Dataframe first with the 2 lists but there is no place to query for df['val']:

pd.DataFrame(row_names , columns=column_names, index=[i[0] for i in row_names ])

  • I'm imagining something in the end would look like:

df_new = np.where((df['attribute'] == column_names) & (df['start_to_end_date'] == row_names), df['val'], None).

Ideal End result:

2020-04-01 - 2020-06-30 2020-07-01 - 2020-09-30 2021-01-01 - 2021-03-31 2021-04-01 - 2021-06-30
RevenueFromContractWithCustomerExcludingAssessedTax ##.## from df[val] ##.## ##.## ##.##
CostOfGoodsAndServicesSold 17.69 19.056 ##.## ##.##
GrossProfit ##.## from df[val] ##.## ##.## ##.##
Katsu
  • 8,479
  • 3
  • 15
  • 16
  • 1
    `pivot`, then reindex with `row_names` to ensure having the desired order – mozway Jun 15 '22 at 04:57
  • Thank you. Just in case others had the same question here is the answer: # Source: https://stackoverflow.com/questions/47152691/how-can-i-pivot-a-dataframe # Source: https://stackoverflow.com/questions/43845059/how-to-reindex-a-pandas-pivot-table index_match_df = df_new.pivot_table( values='val', index='attribute', columns='start_to_end_date', fill_value=0) index_match_df = index_match_df.reindex(y_axis_titles) index_match_df – Katsu Jun 16 '22 at 00:24

0 Answers0