0

Updated: This does not seem to work using wide_to_long as the replies and subsequent question close imply. I may be wrong, but I can't seem to get wide_to_long to work as the dataframe is missing an "ID" column (required). The potential ID is in the second half of the headers (called level below). I don't know how to make an ID column with OP1, OP2, etc.


Original:

I am having trouble with the following:

I have a dataframe (df_mini2) with the shape/entries below. I would like to achieve 3 things.

  1. Split the header by the colon (:) into two parts WOPR and OP#

  2. Pivot or Stack the df to have OP# in a column grouped by Date (index)

  3. All entries into single column for each vector WOPR or WOPT in this case, example desired result at bottom.

    WOPR:OP1    WOPR:OP2    WOPR:OP3    WOPT:OP1    WOPT:OP2    WOPT:OP3
    

    DATE
    2020-01-01 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 2020-02-01 513.437988 564.622498 492.789948 15767.968579 17581.923114 15551.30445 2020-03-01 500.612915 552.443481 490.402008 30323.487376 33638.626633 29779.99173

Here is where I have gotten to with my code, but I am too new at this and haven't figured it out successfully.

#first I tried to split the header for use later
#vector is WOPR or WOPT in this small version of df, level is OP#

vector = np.unique(df_mini2.columns.str.split(':').str[0])
level = df_mini2.columns.str.split(':').str[1]
print(vector, level)

print output: ['WOPR' 'WOPT'] Index(['OP1', 'OP2', 'OP3', 'OP1', 'OP2', 'OP3'], dtype='object')

My plan was to try to use the vector/level lists, but I tried many ways to no success. I also want to stack the data differently to look like the example all the way at the bottom, with WOPR and WOPT as separate columns.

I tried

s1 = df_mini2.stack(level=-1)
pd.DataFrame(s1)
s1.head(20)type here

output...not quite right:

DATE                
2020-01-01  WOPR:OP1        0.000000
            WOPR:OP2        0.000000
            WOPR:OP3        0.000000
            WOPT:OP1        0.000000
            WOPT:OP2        0.000000
            WOPT:OP3        0.000000
2020-02-01  WOPR:OP1      513.437988
            WOPR:OP2      564.622498
            WOPR:OP3      492.789948
            WOPT:OP1    15767.968579
            WOPT:OP2    17581.923114
            WOPT:OP3    15551.304457
2020-03-01  WOPR:OP1      500.612915
            WOPR:OP2      552.443481
            WOPR:OP3      490.402008
            WOPT:OP1    30323.487376
            WOPT:OP2    33638.626633
            WOPT:OP3    29779.991737

DESIRED RESULT WOULD BE

DATE        Level       WOPR    WOPT                
2020-01-01      OP1        0.000000 0.000000
2020-01-01      OP2        0.000000 0.000000
2020-01-01      OP3        0.000000 0.000000
2020-02-01      OP1      513.437988 15767.968579
2020-02-01      OP2      564.622498 17581.923114
2020-02-01      OP3      492.789948 15551.304457
2020-03-01      OP1      500.612915 30323.487376
2020-03-01      OP2      552.443481 33638.626633
2020-03-01      OP3      490.402008 29779.991737

Alternatively this could be grouped by Level and date (all OP1 entries, then all OP2 entries, etc). This could work for me too.

Timothy G.
  • 6,335
  • 7
  • 30
  • 46
Geewhiz
  • 3
  • 2

0 Answers0