How to fix reading a CSV where column MultiIndex header row(s) have missing values?

Question

I'm trying to read several CSV files with an unfortunate structure; here's a simplified example:

[empty], A, A, B, B
time   , X, Y, X, Y
0.0    , 0, 0, 0, 0
1.0    , 2, 5, 7, 0
...    , ., ., ., .

Using pandas.read_csv with the header=[0,1] argument, I can access the values fine:

>>> df = pd.read_csv('file.csv', header=[0,1])
>>> df.A.X
0 0
1 2
...

But the empty field above the time header results in an ugly Unnamed: 0_level_0 level:

>>> df.columns
MultiIndex(levels=[['Unnamed: 0_level_0', 'A', 'B'], ...

Is there any way to fix this so that I can access the time data with df.Time again?
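The behaviour described above can be reproduced with the simplified data inlined via io.StringIO as a stand-in for 'file.csv' (the padding spaces from the example are dropped, which is an assumption about the real file). This is only a sketch of the problem, not a fix:

```python
from io import StringIO

import pandas as pd

# Simplified example from the question, inlined instead of reading 'file.csv'
temp = """,A,A,B,B
time,X,Y,X,Y
0.0,0,0,0,0
1.0,2,5,7,0"""

df = pd.read_csv(StringIO(temp), header=[0, 1])

# Values under the named groups are reachable as expected ...
print(df.A.X)
# ... but the blank cell above 'time' becomes an auto-generated level label
print(df.columns[0])
```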

EDIT:

This is a snippet of the actual data set:

,,Bone,Bone,Bone
,,Skeleton1_Hip,Skeleton1_Hip,Skeleton1_Hip
,,"1","1","1"
,,Rotation,Rotation,Rotation
Frame,Time,X,Y,Z
0,0.000000,0.009332,0.999247,0.021044
1,0.008333,0.009572,0.999217,0.020468
3,0.016667,0.009871,0.999183,0.019797

(see also: https://gist.github.com/fhaust/25ba612f99420d366f0597b15dbf43e7 for a more complete example)

I read it via:

pd.read_csv(file, skiprows=2, header=[0,1,3,4], index_col=[1])

I don't really care about the Frame column, as it's given implicitly with the row index.
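One possible way to handle the real layout, sketched under the assumption that the snippet above is representative and corresponds to what remains of the file after skiprows=2 (so it is inlined here without skiprows): read header rows 0, 1, 3 and 4, blank out the generated 'Unnamed: ...' labels, drop Frame, and move Time into the row index with set_index instead of index_col. The callable passed to rename is applied to the labels on every level of the column MultiIndex; the empty-string replacement is an arbitrary choice.

```python
from io import StringIO

import pandas as pd

# Snippet from the question, assumed to be the file content after skiprows=2
snippet = """,,Bone,Bone,Bone
,,Skeleton1_Hip,Skeleton1_Hip,Skeleton1_Hip
,,"1","1","1"
,,Rotation,Rotation,Rotation
Frame,Time,X,Y,Z
0,0.000000,0.009332,0.999247,0.021044
1,0.008333,0.009572,0.999217,0.020468
3,0.016667,0.009871,0.999183,0.019797
"""

# Header rows 0, 1, 3 and 4 form the column levels; row 2 ("1") is skipped
df = pd.read_csv(StringIO(snippet), header=[0, 1, 3, 4])

# Replace every generated 'Unnamed: <col>_level_<lvl>' label with ''
df = df.rename(columns=lambda lbl: '' if str(lbl).startswith('Unnamed:') else lbl)

# Frame is implied by the row position, so drop it; use Time as the row index
df = df.drop(columns=[('', '', '', 'Frame')])
df = df.set_index(('', '', '', 'Time'))
df.index.name = 'Time'

# The hierarchical column structure is kept for the remaining data
print(df['Bone']['Skeleton1_Hip']['Rotation'])
```

Pandas may emit a PerformanceWarning when dropping from an unsorted MultiIndex; it does not affect the result.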

Possible duplicate of [Rename MultiIndex columns in Pandas](https://stackoverflow.com/questions/41221079/rename-multiindex-columns-in-pandas) — Georgy, Oct 18 '18 at 09:01
Not really a duplicate IMHO: their question is more about renaming the columns, while mine is about how to correctly read in the data while preserving its layout. — fho, Oct 18 '18 at 11:04
The title is overly general; strictly this **isn't** an "unbalanced column MultiIndex", **it's only a CSV file where the first 1/2 columns of the two header rows are missing**. Those can easily be fixed or kludged. The general case (which this isn't) is infinitely harder. Fixed the title. — smci, Jul 16 '22 at 19:46

score 1 · Answer 1 · answered Oct 18 '18 at 08:24


Add the index_col parameter to convert the first column into the index:

import pandas as pd
from io import StringIO

temp = """,A,A,B,B
time,X,Y,X,Y
0.0,0,0,0,0
1.0,2,5,7,0"""
# after testing, replace 'StringIO(temp)' with 'filename.csv'
# (pd.compat.StringIO is not available in newer pandas; io.StringIO is used instead)
df = pd.read_csv(StringIO(temp), header=[0,1], index_col=[0])

print (df)
      A     B   
time  X  Y  X  Y
0.0   0  0  0  0
1.0   2  5  7  0

Or rename the column:

df = df.rename(columns={'Unnamed: 0_level_0':'val'})
print (df)
   val  A     B   
  time  X  Y  X  Y
0  0.0  0  0  0  0
1  1.0  2  5  7  0
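If the number of blank header cells differs between files, naming each 'Unnamed: ...' label by hand gets tedious. A minimal sketch of a more general variant (not part of the original answer): pass a callable to rename, which pandas applies to the labels on every level of a column MultiIndex, and map any auto-generated label to an empty string. The empty string is an arbitrary choice; a placeholder such as 'val' would work the same way.

```python
from io import StringIO

import pandas as pd

temp = """,A,A,B,B
time,X,Y,X,Y
0.0,0,0,0,0
1.0,2,5,7,0"""

df = pd.read_csv(StringIO(temp), header=[0, 1])

# Replace every auto-generated 'Unnamed: <col>_level_<lvl>' label, on any level,
# instead of listing each one in a dict
df = df.rename(columns=lambda lbl: '' if str(lbl).startswith('Unnamed:') else lbl)

# The time column is now addressed by a clean tuple; the A/B hierarchy is unchanged
print(df[('', 'time')])
print(df['A']['X'])
```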


jezrael


But this removes the hierarchical part of the index: `df.columns -> Index([('A','X'), ('A','Y'), ('B','X'), ('B','Y')])` – fho Oct 18 '18 at 10:57
@fho - so the data are different from the data in my answer? I get `print (df.index) Float64Index([0.0, 1.0], dtype='float64')` – jezrael Oct 18 '18 at 10:58
Apparently so; I've added a snippet of my file to the question – fho Oct 18 '18 at 11:09
@fho - hmm, is it possible to upload the file (gdocs, dropbox, ...)? Copying from text can change the data, which makes it hard to reproduce the same output. – jezrael Oct 18 '18 at 11:10
