Drop columns with no header on csv import with pandas

Question

Here is a sample csv:

|  Header A |      | Unnamed: 1 |  Header D |
|-----------|------|------------|-----------|
| a1        | b1   | c1         | d1        |
| a2        | b2   | c2         | d2        |

If I import it with pandas.read_csv, it turns into this:

  Header A Unnamed: 1 Unnamed: 1.1 Header D
0      a1         b1           c1       d1
1      a2         b2           c2       d2

My goal is dropping all the columns with empty headers, in this case the second column, but I cannot use the assigned column names by pandas to filter them, because there might also be non-empty columns starting with Unnamed, like the third column in the example.

Columns are not known before hand, so I do not have any control over them.

I have tried the following args with read_csv, but have not had any luck with them:

prefix: it just does not work!
usecols: Empty headers already have a name when they are passed to usecols, which makes it unusable to me.

I have looked at some other answers on SO, like the ones below, but none of them cover my case:

How to get rid of `Unnamed:` column in a pandas dataframe

Remove Unnamed columns in pandas dataframe

You have a column in the CSV file with a name `Unnamed: 1` that you want to keep? Are you writing this CSV file beforehand? — roganjosh, Mar 21 '19 at 22:01
The might be a column in the csv starting with `Unnamed`m but I do not know before hand. I would like to cover all possible cases. — kaveh, Mar 21 '19 at 22:03
@Wen-Ben I didn't. It's an example that shows it's possible to have such column names! — kaveh, Mar 21 '19 at 22:23

score 2 · Accepted Answer · answered Mar 21 '19 at 22:08

2

The only way I can think of is to "peek" at the headers beforehand and get the indices of non-empty headers. Then it's not a case of dropping them, but not including them in the original df.

import csv

import pandas as pd

with open('test.csv') as infile:
    reader = csv.reader(infile)
    headers = next(reader)

header_indices = [i for i, item in enumerate(headers) if item]

df = pd.read_csv('test.csv', usecols=header_indices)

answered Mar 21 '19 at 22:08

roganjosh

12,594
4
29
46

1

Thanks. That was short and efficient. – kaveh Mar 21 '19 at 22:26
@Bazingaa thanks mate, took a while to get there :) – roganjosh Mar 23 '19 at 10:43

score 0 · Answer 2 · answered Mar 21 '19 at 22:09

0

Read your columns to list with df.columns
create a tf_list with True/False based on your logic (search for None, Unnamed etc)
filter_df = df.loc[:, tf_list]

answered Mar 21 '19 at 22:09

Lior Cohen

5,570
2
14
30

This wouldn't work. The damage would already have been done because Pandas would have filled the blank columns with "Unnamed: x" and it would be indistinguishable from a column that also shared that style of name in the file – roganjosh Mar 21 '19 at 22:18

Drop columns with no header on csv import with pandas

2 Answers2