I have an excel file with over 100 sheets. I need to import into pandas and create a dataframe. The problem is some columns headings contain spaces so I get a dataframe with some duplicate columns.
Is it possible to strip a space in columns in OrderedDict? I know I can use str.strip()
in the dataframe but can't find anything for OrderedDict.
I have the following structure in excel:
sheet TEST:
'ID1' 'ID2' 'ID3' 'ID4'
1 A 2016 val val
2 B 2017 val val
3 C 2018 val val
sheet TEST2:
'ID1' 'ID2' 'ID3 ' 'ID4'
1 A 2016 val val
2 B 2017 val val
3 C 2018 val val
sheet TEST3:
'ID1' 'ID2' 'ID3' 'ID4 '
1 A 2016 val val
2 B 2017 val val
3 C 2018 val val
At the moment I do the following:
df = pd.read_excel (File location, sheet_name = ['TEST', 'TEST2', 'TEST3'])
df = pd.concat(df, axis=0, sort=False)
df = df.reset_index()
This is what I get:
'ID1' 'ID2' 'ID3' 'ID4' 'ID3 ' 'ID4 '
1 A 2016 val val NaN NaN
2 B 2017 val val NaN NaN
3 C 2018 val val NaN NaN
4 A 2016 NaN val val NaN
5 B 2017 NaN val val NaN
6 C 2018 NaN val val NaN
7 A 2016 val NaN NaN val
8 B 2017 val NaN NaN val
9 C 2018 val NaN NaN val
This is what I need:
'ID1' 'ID2' 'ID3' 'ID4'
1 A 2016 val val
2 B 2017 val val
3 C 2018 val val
4 A 2016 val val
5 B 2017 val val
6 C 2018 val val
7 A 2016 val val
8 B 2017 val val
9 C 2018 val val
Many thanks