0

I'm importing into a dataframe an excel sheet which has its headers split into two rows:

Colour | NaN   | Shape | Mass | NaN
NaN    | width | NaN   | NaN  | Torque

green  | 33    | round | 2    | 6
etc

I want to collapse the first two rows into one header:

Colour | width | Shape | Mass | Torque

green  | 33    | round | 2    | 6
...

I tried merged_header = df.loc[0].combine_first(df.loc[1]) but I'm not sure how to get that back into the original dataframe.

I've tried:

# drop top 2 rows
df = df.drop(df.index[[0,1]])
# then add the merged one in:
res = pd.concat([merged_header, df], axis=0)

But that just inserts merged_header as a column. I tried some other combinations of merge from this tutorial but without luck.

merged_header.append(df) gives a similar wrong result, and res = df.append(merged_header) is almost right, but the header is at the tail end:

green  | 33    | round | 2    | 6
...
Colour | width | Shape | Mass | Torque

To provide more detail this is what I have so far:

df = pd.read_excel(ltro19, header=None, skiprows=9)
# delete all empty columns & rows
df = df.dropna(axis = 1, how = 'all')
df = df.dropna(axis = 0, how = 'all')

in case if affects the next step.

Massagran
  • 1,781
  • 1
  • 20
  • 29
  • Is it just 5 columns you have? – Jon Clements Mar 27 '20 at 03:02
  • ('cos a quick although not generic way to do it would be `df = pd.read_excel('filename', skiprows=2, header=None, names=['colour', 'width', 'shape', 'mass', 'torque'])`... – Jon Clements Mar 27 '20 at 03:04
  • @JonClements, no. I have 20 columns with very long titles, and many excel sheets (equally formatted but with other headers). It was just a simplified example. – Massagran Mar 27 '20 at 13:03

3 Answers3

1

Let's use list comprehension to flatten multiindex column header:

df.columns = [f'{j}' if str(i)=='nan' else f'{i}' for i, j in df.columns]

Output:

['Colour', 'width', 'Shape', 'Mass', 'Torque']
Scott Boston
  • 147,308
  • 15
  • 139
  • 187
  • `df.columns` is `Int64Index([0, 2, 3, 5, 7, 8, 10, 13, 14, 16, 17, 20, 21, 23, 24, 25], dtype='int64')`, so iterating over it leads to `TypeError: 'int' object is not iterable` – Massagran Mar 27 '20 at 14:19
  • 1
    Then you do don't have a multiIndex colum headers. You just need to move the first two rows of your dataframe into column header. – Scott Boston Mar 27 '20 at 14:24
  • You need to create code that will produce your input dataframe. – Scott Boston Mar 27 '20 at 14:25
  • Sorry, I'm afraid I don't have the right vocabulary yet. I don't know what a multiIndex dataframe is. All I have is an excel sheet with very strange formatting and I'm trying to turn it into a more reasonable table. Where can I learn about "moving rows into column header"? Thank you – Massagran Mar 27 '20 at 19:23
0

This should work for you:

df.columns = list(df.columns.get_level_values(0))
Dharman
  • 30,962
  • 25
  • 85
  • 135
Chadee Fouad
  • 2,630
  • 2
  • 23
  • 29
  • `df.columns.get_level_values(0)` returns `Int64Index([0, 2, 3, 5, 7, 8, 10, 13, 14, 16, 17, 20, 21, 23, 24, 25], dtype='int64')` but calling list on it gives the error `TypeError: 'DataFrame' object is not callable` – Massagran Mar 27 '20 at 13:07
0

Probably due to my ignorance of the terms, the suggestions above did not lead me directly to a working solution. It seemed I was working with a dataframe

>>> print(type(df))
>>> <class 'pandas.core.frame.DataFrame'>

but, I think, without headers.

This solution worked, although it involved jumping out of the dataframe and into a list to then put it back as the column headers. Inspired by Merging Two Rows (one with a value, the other NaN) in Pandas

df = pd.read_excel(name_of_file, header=None, skiprows=9)
# delete all empty columns & rows
df = df.dropna(axis = 1, how = 'all')
df = df.dropna(axis = 0, how = 'all')

# merge the two headers which are weirdly split over two rows
merged_header = df.loc[0].combine_first(df.loc[1])
# turn that into a list
header_list = merged_header.values.tolist()
# load that list as the new headers for the dataframe
df.columns = header_list
# drop top 2 rows (old split header)
df = df.drop(df.index[[0,1]])
Massagran
  • 1,781
  • 1
  • 20
  • 29