0

I have a data frame that is looking like this (DATA is the year and month of the order) :

CUSTOMER_ID NAME DATA COFFEE_SOLD(KG) WATER_SOLD(L)
10000 ALEX 2022 - 01 3 4
10000 ALEX 2022 - 01 5 6
10000 ALEX 2022 - 02 7 8
10001 JOE 2022 - 02 1 1
10001 JOE 2022 - 03 1 0

I pivoted the df with :

df_rap = df_rap.pivot_table(index=["CUSTOMER_ID",'NAME',],columns=["DATA"], values=['COFFEE_SOLD(KG)','WATER_SOLD(L)'], aggfunc='sum').reset_index()

The result :

CUSTOMER_ID NAME COFFEE_SOLD(KG) COFFEE_SOLD(KG) COFFEE_SOLD(KG) WATER_SOLD(L) WATER_SOLD(L) WATER_SOLD(L)
DATA 2022 - 01 2022 - 02 2022 - 03 2022 - 01 2022 - 02 2022 - 03
0 10000 ALEX 8 7 0 10 8 0
1 10001 JOE 0 1 1 0 1 0

The format is ok but I want to export it to excel. For that I need the data frame to look like this :

COFFEE_SOLD(KG) COFFEE_SOLD(KG) COFFEE_SOLD(KG) WATER_SOLD(L) WATER_SOLD(L) WATER_SOLD(L)
DATA CUSTOMER_ID NAME 2022 - 01 2022 - 02 2022 - 03 2022 - 01 2022 - 02 2022 - 03
0 10000 ALEX 8 7 0 10 8 0
1 10001 JOE 0 1 1 0 1 0

In other words, i would like to lower the level of the first 2 column ( in header ), to save it in excel properly.

I tried :

df.reset_index()

And it dosen't work.

EDIT :

With :

display( df_copy.columns)

I saw the format of the columns :

MultiIndex([('CUSTOMER_ID', ''),
            ('NAME',        ''),
            ('COFFEE_SOLD(KG)', '2022 - 01'),
            ('COFFEE_SOLD(KG)', '2022 - 02'),
            ('COFFEE_SOLD(KG)', '2022 - 03'),
            ('WATER_SOLD(L)', '2022 - 01'),
            ('WATER_SOLD(L)', '2022 - 02'),
            ('WATER_SOLD(L)', '2022 - 03'),],
           names=[None, 'DATA'])

I expected to be :

MultiIndex([('', 'CUSTOMER_ID'),
            ('',        'NAME'),
            ('COFFEE_SOLD(KG)', '2022 - 01'),
            ('COFFEE_SOLD(KG)', '2022 - 02'),
            ('COFFEE_SOLD(KG)', '2022 - 03'),
            ('WATER_SOLD(L)', '2022 - 01'),
            ('WATER_SOLD(L)', '2022 - 02'),
            ('WATER_SOLD(L)', '2022 - 03'),],
           names=[None, 'DATA'])

Thank you !

  • It's unclear.. what are those `empty sapces?` Are those are `empty values` or `mutli-Index`?... – Bhargav - Retarded Skills Nov 11 '22 at 12:29
  • Hello, I think it's multi-Index because i used pivot function on the data frame. – Radu Iordache Nov 11 '22 at 12:36
  • Your question needs a minimal reproducible example consisting of sample input, expected output, actual output, and only the relevant code necessary to reproduce the problem. See [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) for best practices related to Pandas questions. – itprorh66 Nov 11 '22 at 14:45

1 Answers1

1

A possible approach is to overwrite the column values:

cols = [('', 'CUSTOMER_ID'), ('', 'NAME'),]
for t in df_rap.columns[2:]:
    cols.append(t)
    
df_rap.columns = pd.MultiIndex.from_tuples(cols)

This leads to a data frame without the word DATA in it. Which somehow makes sense, as DATA has lost some of it's meaning - now being just the name of the index column. If you nevertheless need to keep DATA, you could create a new column with the corresponding values, rename all columns and move the DATA column to the front (and save the data frame without the new index column) :

# create a new DATA column
df_rap['DATA'] = df_rap.index
# set new values for the column headers (this time including the new DATA column)
cols = [('', 'CUSTOMER_ID'), ('', 'NAME'),]
for t in df_rap.columns[2:-1]:
    cols.append(t)
cols.append(('', 'DATA'))
df_rap.columns = pd.MultiIndex.from_tuples(cols)
# reorder columns ('DATA' to the front)
cols = cols[-1:] + cols[:-1]
df_rap = df_rap[cols]
rosa b.
  • 1,759
  • 2
  • 15
  • 18