Create empty dataframe based on 2 lists (representing the row and column), then fill in based on values from another dataframe column

Question

Sample Data:

id	start_to_end_date	attribute	val
1395287	2019-01-01 - 2019-12-31	CostOfGoodsAndServicesSold	49.369
1395288	2020-01-01 - 2020-12-31	CostOfGoodsAndServicesSold	75.476
1395289	2020-04-01 - 2020-06-30	CostOfGoodsAndServicesSold	17.69
1395290	2020-07-01 - 2020-09-30	CostOfGoodsAndServicesSold	19.056
1395291	2021-01-01 - 2021-12-31	CostOfGoodsAndServicesSold	137.292

If some of you are familiar with financial statements and Index Match in Excel, I would basically like to recreate what an Income Statement looks like.

I have all my unique column_names in a list. This current order is important:

column_names = ['2020-04-01 - 2020-06-30',
 '2020-07-01 - 2020-09-30',
 '2021-01-01 - 2021-03-31',
 '2021-04-01 - 2021-06-30',
 '2021-07-01 - 2021-09-30',
 '2022-01-01 - 2022-03-31',
 '2019-01-01 - 2019-12-31',
 '2020-01-01 - 2020-12-31',
 '2021-01-01 - 2021-12-31']

I also have all my unique row names in a list. This current order is important:

row_names = ['RevenueFromContractWithCustomerExcludingAssessedTax',
 'CostOfGoodsAndServicesSold',
 'GrossProfit',
 'ResearchAndDevelopmentExpense']

The final dataframe would have the above column_names going across, the above row_names going down (as the index). The values that would fill in dataframe would be from df['val'], retrieved by matching df['attribute'] == column_names[i] and df['start_to_end_date'] == row_names[i].

How would I do this?

I tried to create an empty Dataframe first with the 2 lists but there is no place to query for df['val']:

pd.DataFrame(row_names , columns=column_names, index=[i[0] for i in row_names ])

I'm imagining something in the end would look like:

df_new = np.where((df['attribute'] == column_names) & (df['start_to_end_date'] == row_names), df['val'], None).

Ideal End result:

	2020-04-01 - 2020-06-30	2020-07-01 - 2020-09-30	2021-01-01 - 2021-03-31	2021-04-01 - 2021-06-30
RevenueFromContractWithCustomerExcludingAssessedTax	##.## from df[val]	##.##	##.##	##.##
CostOfGoodsAndServicesSold	17.69	19.056	##.##	##.##
GrossProfit	##.## from df[val]	##.##	##.##	##.##

`pivot`, then reindex with `row_names` to ensure having the desired order — mozway, Jun 15 '22 at 04:57
Thank you. Just in case others had the same question here is the answer: # Source: https://stackoverflow.com/questions/47152691/how-can-i-pivot-a-dataframe # Source: https://stackoverflow.com/questions/43845059/how-to-reindex-a-pandas-pivot-table index_match_df = df_new.pivot_table( values='val', index='attribute', columns='start_to_end_date', fill_value=0) index_match_df = index_match_df.reindex(y_axis_titles) index_match_df — Katsu, Jun 16 '22 at 00:24

Create empty dataframe based on 2 lists (representing the row and column), then fill in based on values from another dataframe column

0 Answers0