I'm struggling with something I thought would be fairly trivial. I have a spreadsheet that provides data in the format below, unfortunately this can't be changed, this is the only way it can be provided:
I load the file in pandas in a jupyter notebook, I can read it, specifying the header has 3 rows, so far so good. The point is that because some of the headers in the second level repeat themselves (teachers, students, other), I want to combine the 3 levels into one, so I can easily identify which columns does what. The data in the top left corner changes every day, hence I renamed that one column with nothing (''). The output I'm looking for should have the following columns: country, region, teachers_present, ..., perf_teachers_score, ..., count_teachers etc.
For some reason, pandas renders this table like this:
It doesn't add any Unnamed column name placeholders on level 0, but it does that on level 1 and 2. If I concatenate the names, I get some very ugly column names. I need to concatenate them but ignore the Unnamed ones in the process. My code is:
df = pd.read_excel(src, header=[0,1,2])
# to get rid of the date, works as intended
df.columns.set_levels(['', 'perf', 'count'], level=0, inplace=True)
# doesn't work, tells me str has no str method, despite successfully using this function elsewhere
df.columns.set_levels(['' if x.str.contains('unnamed', case=False, na=False) else x for x in df.columns.levels[1].values], level=1, inplace=True)
In conclusion, what am I doing wrong and how do I get my column names concatenated without the Unnamed (and unwanted) labels?
Thank you!