1

I'm struggling with something I thought would be fairly trivial. I have a spreadsheet that provides data in the format below, unfortunately this can't be changed, this is the only way it can be provided:

Spreadsheet format

I load the file in pandas in a jupyter notebook, I can read it, specifying the header has 3 rows, so far so good. The point is that because some of the headers in the second level repeat themselves (teachers, students, other), I want to combine the 3 levels into one, so I can easily identify which columns does what. The data in the top left corner changes every day, hence I renamed that one column with nothing (''). The output I'm looking for should have the following columns: country, region, teachers_present, ..., perf_teachers_score, ..., count_teachers etc.

For some reason, pandas renders this table like this:

data rendered by pandas

It doesn't add any Unnamed column name placeholders on level 0, but it does that on level 1 and 2. If I concatenate the names, I get some very ugly column names. I need to concatenate them but ignore the Unnamed ones in the process. My code is:

df = pd.read_excel(src, header=[0,1,2])

# to get rid of the date, works as intended
df.columns.set_levels(['', 'perf', 'count'], level=0, inplace=True)
# doesn't work, tells me str has no str method, despite successfully using this function elsewhere
df.columns.set_levels(['' if x.str.contains('unnamed', case=False, na=False) else x for x in df.columns.levels[1].values], level=1, inplace=True)

In conclusion, what am I doing wrong and how do I get my column names concatenated without the Unnamed (and unwanted) labels?

Thank you!

Chris
  • 303
  • 4
  • 16
  • 1
    In the for loop inside `set_levels` you are iterating only on level 1, so you should do the same on level 0 to get rid of the unnamed index there. – DavideBrex May 21 '20 at 11:56
  • I'm a bit confused. level 0 doesn't have any column that contain Unnamed. it's the column names I'm looking to change (from Unnamed... to '') so that when I concatenate the names, I get predictable and usable results. Or is my understanding of columns flawed? – Chris May 21 '20 at 12:00
  • 1
    My bad, i switched the levels' order (so with 0 i meant level 2) – DavideBrex May 21 '20 at 12:07
  • but shouldn't my code just replace the column names on level 1 only? that's what I'm aiming for, I will then add another line to do the same on level 2. if my aim was to replace the column names in level 1 with '', shouldn't that be the correct code? – Chris May 21 '20 at 12:09
  • 1
    Check this question: [Pandas dataframe with multiindex column - merge levels](https://stackoverflow.com/questions/24290297/pandas-dataframe-with-multiindex-column-merge-levels) – DavideBrex May 21 '20 at 12:13
  • alright, I'm starting to get the hang of things, but I'm stuck writing a rule that combines them based on more than one condition. I can successfully combine level 0 and level 1 with this, but how do I add another rule? df.columns = [f'{x}{y}{z}' if not 'unnamed' in z.lower() else '' for x, y, z in df.columns] I guess my question would be how do I write an entire if elif else inside that line? – Chris May 21 '20 at 13:27

1 Answers1

1

Got it...

df.columns = [f'{x}{z}' if 'unnamed' in y.lower() else f'{x}{y}' if 'unnamed' in z.lower() else f'{x}{y}{z}' for x, y, z in df.columns]

Thank you David, you've been helpful!

Chris
  • 303
  • 4
  • 16