I am currently working on my master's thesis and have a problem with data organization in pandas. I downloaded multiple economic indicators that are each published once a month and consolidated them in one dataframe.
However, the indicators are released on different days each month. As a result, my dataframe has, for example, five different rows for January 2020 (January 1st, January 5th, January 13th, January 28th, January 31st) and many "NaN" values in each row.
I want to organize my data so that I have one row for each month, so for example one row for January 2020. However, I cannot figure out how to solve this problem in pandas.
A further complication is that an indicator is sometimes released twice within the same calendar month, for example on March 1st and again on March 31st. Collapsing everything into one row per month could therefore create new problems if the values are summed up.
The table below illustrates my problem. The dates are the index of the dataframe (format: day.month.year).

| Dates    | Indicator 1 | Indicator 2 |
| -------- | ----------- | ----------- |
| 01.01.20 | 1           | 1           |
| 08.01.20 | 2           | NaN         |
| 02.02.20 | 5           | 5           |
| 01.03.20 | 8           | 6           |
| 31.03.20 | 7           | 7           |
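
For reproducibility, here is a minimal sketch that builds the sample table as a dataframe. The variable names `dates` and `df` are only illustrative, and I am assuming the dates are parsed as day.month.year.

```python
import numpy as np
import pandas as pd

# Sample values copied from the table; dates are day.month.year
dates = pd.to_datetime(
    ["01.01.20", "08.01.20", "02.02.20", "01.03.20", "31.03.20"],
    format="%d.%m.%y",
)

df = pd.DataFrame(
    {
        "Indicator 1": [1, 2, 5, 8, 7],
        "Indicator 2": [1, np.nan, 5, 6, 7],
    },
    index=dates,
)
df.index.name = "Dates"

print(df)
```
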
I already tried `DataFrame.to_period` and `DataFrame.groupby`, but I could not solve the problem with either.
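
To make the goal concrete, below is a rough sketch of the kind of result I am after, using the sample values from the table. It groups the rows by calendar month and keeps the last non-missing value of each indicator. Taking the last release per month is only an assumption on my part; `first()` or `mean()` would be other options, whereas `sum()` would double count months with two releases such as March. The names `df` and `monthly` are illustrative.

```python
import numpy as np
import pandas as pd

# Sample values copied from the table; dates are day.month.year
dates = pd.to_datetime(
    ["01.01.20", "08.01.20", "02.02.20", "01.03.20", "31.03.20"],
    format="%d.%m.%y",
)
df = pd.DataFrame(
    {
        "Indicator 1": [1, 2, 5, 8, 7],
        "Indicator 2": [1, np.nan, 5, 6, 7],
    },
    index=dates,
)

# Convert each date to its calendar month and group on that.
# GroupBy.last() returns the last non-NaN value per column in each group,
# so the missing "Indicator 2" on 08.01.20 falls back to the 01.01.20 value.
monthly = df.groupby(df.index.to_period("M")).last()

print(monthly)
```

With this approach, March ends up with the values released on March 31st rather than a sum of both releases.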