4

Hello I have csv (tab separated) with header and I would like to load it into Pandas DataFrame in Python 3.5. Problem is, that some headers missing (there are columns with data but without header - as in example below). I tried to load it by read_csv function but if I dont skip first line, it will merge first data lines to match number of headers and number of data columns.

Is there any way, how to read this csv and automatically add missing headers? Like Header4 and Header 5 in example below?

Thank you

| Header1 | Header2 | Header3 |        |        |
|---------|---------|---------|--------|--------|
| value1  | value2  | value3  | value4 | value5 |
| value1  | value2  | value3  | value4 | value5 |
stanedav
  • 447
  • 1
  • 5
  • 12
  • Possible duplicate of [read\_csv with missing/incomplete header or irregular number of columns](https://stackoverflow.com/questions/34358196/read-csv-with-missing-incomplete-header-or-irregular-number-of-columns) – Shaido Oct 03 '17 at 08:09

2 Answers2

4

You can rename columns in post processing by dict:

print (df)
  Header1 Header2 Header3 Unnamed: 3 Unnamed: 4
0  value1  value2  value3     value4     value5
1  value1  value2  value3     value4     value5


df = df.rename(columns = {'Unnamed: 3':'Header4','Unnamed: 4':'Header5'})
print (df)
  Header1 Header2 Header3 Header4 Header5
0  value1  value2  value3  value4  value5
1  value1  value2  value3  value4  value5

Or by custom function:

f = lambda x: 'Header{}'.format(int(x.split()[1])+1) if 'Unnamed' in x else x
df = df.rename(columns = f)
print (df)
  Header1 Header2 Header3 Header4 Header5
0  value1  value2  value3  value4  value5
1  value1  value2  value3  value4  value5

Or use parameters names and header or names and skiprows:

df = pd.read_csv('file', names=['Header1','Header2','Header3','Header4','Header5'], header=0)
print (df)
  Header1 Header2 Header3 Header4 Header5
0  value1  value2  value3  value4  value5
1  value1  value2  value3  value4  value5

Or:

df = pd.read_csv('file',
                 names=['Header1','Header2','Header3','Header4','Header5'],
                 skiprows=1)
print (df)
  Header1 Header2 Header3 Header4 Header5
0  value1  value2  value3  value4  value5
1  value1  value2  value3  value4  value5
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
1

You can achieve that as follows:

df.columns = df.columns[:3].tolist()  + ['Header4', 'Header5']

output:

    Header1    Header2    Header3    Header4   Header5
0   value1     value2     value3     value4    value5 
1   value1     value2     value3     value4    value5 
Mohamed Ali JAMAOUI
  • 14,275
  • 14
  • 73
  • 117