0

I wanted to transpose data in one sheet as below using python:: AS IS

enter image description here

To be

enter image description here

when i try to use the below code , it is giving me the output as below

import numpy as np
import pandas as pd
import openpyxl
dfs = pd.read_excel('test.xlsx', sheet_name=None,header=None)
tester=dfs['data'].values.tolist()
keys = list(zip(*tester))[0]
seen = set()
seen_add = seen.add
keysu= [x for x in keys if not (x in seen or seen_add(x))]
values = list(zip(*tester))[1]
a = np.array(values).reshape(int(len(values)/len(keysu)),len(keysu))
list1=[keysu]
for i in a:
 list1.append(list(i))
 df=pd.DataFrame(list1)
 df.to_excel('test1.xlsx',index=False,header=False)

enter image description here

1 Answers1

1

Actually, you need to do pivot, not transpose.

Try: df.pivot(index='Hierarchy', columns='Fiscal_Mo', values='Amount').fillna('').

The final fillna('') is required to replace NaN values for non-existing combinations of Fiscal_Mo and Hierarchy, e.g. 201802 / A.

Full code:

import pandas as pd

df = pd.DataFrame( {'Fiscal_Mo': ['201801', '201802', '201803', '201801'],
    'Hierarchy': ['A', 'B', 'C', 'B'], 'Amount': [ 100, 20, 300, 400 ] } )
df.pivot(index='Hierarchy', columns='Fiscal_Mo',values='Amount').fillna('')

A remark concerning your comment: It contains hierarchy written with lower case h. Maybe this is the source of your problem?

Edit

Maybe Hierarchy is the index of your DataFrame.

In this case, the index of pivot table is to bo created just from the index of the source DataFrame, so index parameter should be omitted.

The corresponding code to create the DataFrame is:

import pandas as pd

ind = pd.Index(data=['A', 'B', 'C', 'B'], name='Hierarchy')
df = pd.DataFrame( data={'Fiscal_Mo': ['201801', '201802', '201803', '201801'],
    'Amount': [ 100, 20, 300, 400 ] }, index=ind)

and the command to create the pivot table is:

df.pivot(columns='Fiscal_Mo', values='Amount').fillna('')

Edit 2

Investigation concerning header=None

The first attempt: I executed:

df = pd.read_excel('Report.xlsx', sheet_name='Report', header=None)

and printed the df, the result was:

           0          1       2
0  Fiscal_Mo  Hierarchy  Amount
1     201801          A     100
2     201802          B      20
3     201803          C     300
4     201801          B     400

As you can see:

  1. Column names in the DataFrame are consecutive numbers (wrong).
  2. Column names from the source file (Fiscal_Mo, Hierarchy, Amount) are values of the first row (wrong).

Here, any attempt to call any Pandas function and passing "textual" column names will fail.

The second attempt: I executed:

df = pd.read_excel('Report.xlsx', sheet_name='Report')

Note that this time there is no header parameter, so the function assumes header=0 - read column names from row 0 (initial).

When I printed the df, the result was:

   Fiscal_Mo Hierarchy  Amount
0     201801         A     100
1     201802         B      20
2     201803         C     300
3     201801         B     400

As you can see:

  1. Column names in the DataFrame are taken from row number 0 (OK).
  2. Data rows are taken from "further" rows (OK).

Now you can call e.g.:

df.pivot(index='Hierarchy', columns='Fiscal_Mo', values='Amount').fillna('')

without any errors.

Note that:

  1. Parameter values must exactly match respective column names (lower / upper case letters).
  2. I called df.pivot(...) not pd.pivot(df, ...). Apparently you failed to read carefully my instructions and copy all parameters to your code.

Conclusion: Delete header=None from read_excel call and your program should work.

Valdi_Bo
  • 30,023
  • 4
  • 23
  • 41
  • Thanks Valdi for the answer , but i am getting below error while trying to use code return self._engine.get_loc(self._maybe_cast_indexer(key)) File "pandas\_libs\index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc File "pandas\_libs\index.pyx", line 164, in pandas._libs.index.IndexEngine.get_loc KeyError: 'hierarchy' – Rajyalakshmi Gangisetty Nov 18 '18 at 11:19
  • i tried with lowercase - "h" as well, i am trying to read excel file and trying to pivot the data in the sheet using -- df = pd.read_excel('Report_test1_ 111818_15_32_27.xlsx', sheet_name="test",header=None) d = pd.pivot_table(df,index=["hierarchy"]) – Rajyalakshmi Gangisetty Nov 18 '18 at 11:40
  • Note that in the example code I have written *index='Hierarchy'* (without square parentheses) and there is also *columns='Fiscal_No'*. – Valdi_Bo Nov 18 '18 at 14:34
  • Apparently the first row contains just column names. Why did you write *header=None*? In such case the first row (column names) is actually the first data row and column names are consecutive numbers. I think, this is not what you want. To check this detail, add *print(df)* to your code. – Valdi_Bo Nov 18 '18 at 14:39
  • Thanks Valdi .. Scenario 2 worked perfectly.. but i have a doubt here on how to do dynamic changing index where as columns and values column remain constant – Rajyalakshmi Gangisetty Nov 19 '18 at 07:23