0

Here is a sample csv:

|  Header A |      | Unnamed: 1 |  Header D |
|-----------|------|------------|-----------|
| a1        | b1   | c1         | d1        |
| a2        | b2   | c2         | d2        |

If I import it with pandas.read_csv, it turns into this:

  Header A Unnamed: 1 Unnamed: 1.1 Header D
0      a1         b1           c1       d1
1      a2         b2           c2       d2

My goal is dropping all the columns with empty headers, in this case the second column, but I cannot use the assigned column names by pandas to filter them, because there might also be non-empty columns starting with Unnamed, like the third column in the example.

Columns are not known before hand, so I do not have any control over them.

I have tried the following args with read_csv, but have not had any luck with them:

  • prefix: it just does not work!
  • usecols: Empty headers already have a name when they are passed to usecols, which makes it unusable to me.

I have looked at some other answers on SO, like the ones below, but none of them cover my case:

How to get rid of `Unnamed:` column in a pandas dataframe

Remove Unnamed columns in pandas dataframe

kaveh
  • 2,046
  • 1
  • 21
  • 33
  • You have a column in the CSV file with a name `Unnamed: 1` that you want to keep? Are you writing this CSV file beforehand? – roganjosh Mar 21 '19 at 22:01
  • The might be a column in the csv starting with `Unnamed`m but I do not know before hand. I would like to cover all possible cases. – kaveh Mar 21 '19 at 22:03
  • why you named your csv columns like this ...`Unnamed: 1` – BENY Mar 21 '19 at 22:07
  • @Wen-Ben I didn't. It's an example that shows it's possible to have such column names! – kaveh Mar 21 '19 at 22:23

2 Answers2

2

The only way I can think of is to "peek" at the headers beforehand and get the indices of non-empty headers. Then it's not a case of dropping them, but not including them in the original df.

import csv

import pandas as pd

with open('test.csv') as infile:
    reader = csv.reader(infile)
    headers = next(reader)

header_indices = [i for i, item in enumerate(headers) if item]

df = pd.read_csv('test.csv', usecols=header_indices)
roganjosh
  • 12,594
  • 4
  • 29
  • 46
0
  1. Read your columns to list with df.columns
  2. create a tf_list with True/False based on your logic (search for None, Unnamed etc)
  3. filter_df = df.loc[:, tf_list]
Lior Cohen
  • 5,570
  • 2
  • 14
  • 30
  • This wouldn't work. The damage would already have been done because Pandas would have filled the blank columns with "Unnamed: x" and it would be indistinguishable from a column that also shared that style of name in the file – roganjosh Mar 21 '19 at 22:18