Updated: This does not seem to work using wide_to_long
as the replies and subsequent question close imply. I may be wrong, but I can't seem to get wide_to_long
to work as the dataframe is missing an "ID" column (required). The potential ID is in the second half of the headers (called level below). I don't know how to make an ID column with OP1, OP2, etc.
Original:
I am having trouble with the following:
I have a dataframe (df_mini2) with the shape/entries below. I would like to achieve 3 things.
Split the header by the colon (:) into two parts WOPR and OP#
Pivot or Stack the df to have OP# in a column grouped by Date (index)
All entries into single column for each vector WOPR or WOPT in this case, example desired result at bottom.
WOPR:OP1 WOPR:OP2 WOPR:OP3 WOPT:OP1 WOPT:OP2 WOPT:OP3
DATE
2020-01-01 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 2020-02-01 513.437988 564.622498 492.789948 15767.968579 17581.923114 15551.30445 2020-03-01 500.612915 552.443481 490.402008 30323.487376 33638.626633 29779.99173
Here is where I have gotten to with my code, but I am too new at this and haven't figured it out successfully.
#first I tried to split the header for use later
#vector is WOPR or WOPT in this small version of df, level is OP#
vector = np.unique(df_mini2.columns.str.split(':').str[0])
level = df_mini2.columns.str.split(':').str[1]
print(vector, level)
print output: ['WOPR' 'WOPT'] Index(['OP1', 'OP2', 'OP3', 'OP1', 'OP2', 'OP3'], dtype='object')
My plan was to try to use the vector/level lists, but I tried many ways to no success. I also want to stack the data differently to look like the example all the way at the bottom, with WOPR and WOPT as separate columns.
I tried
s1 = df_mini2.stack(level=-1)
pd.DataFrame(s1)
s1.head(20)type here
output...not quite right:
DATE
2020-01-01 WOPR:OP1 0.000000
WOPR:OP2 0.000000
WOPR:OP3 0.000000
WOPT:OP1 0.000000
WOPT:OP2 0.000000
WOPT:OP3 0.000000
2020-02-01 WOPR:OP1 513.437988
WOPR:OP2 564.622498
WOPR:OP3 492.789948
WOPT:OP1 15767.968579
WOPT:OP2 17581.923114
WOPT:OP3 15551.304457
2020-03-01 WOPR:OP1 500.612915
WOPR:OP2 552.443481
WOPR:OP3 490.402008
WOPT:OP1 30323.487376
WOPT:OP2 33638.626633
WOPT:OP3 29779.991737
DESIRED RESULT WOULD BE
DATE Level WOPR WOPT
2020-01-01 OP1 0.000000 0.000000
2020-01-01 OP2 0.000000 0.000000
2020-01-01 OP3 0.000000 0.000000
2020-02-01 OP1 513.437988 15767.968579
2020-02-01 OP2 564.622498 17581.923114
2020-02-01 OP3 492.789948 15551.304457
2020-03-01 OP1 500.612915 30323.487376
2020-03-01 OP2 552.443481 33638.626633
2020-03-01 OP3 490.402008 29779.991737
Alternatively this could be grouped by Level and date (all OP1 entries, then all OP2 entries, etc). This could work for me too.