0

Trying to output only certain columns of a .txt file to .csv

PANDAS documentation and this answer got me this far:

import pandas as pd

read_file = pd.read_csv (r'death.txt')

header = ['County', 'Crude Rate']

read_file.to_csv (r'death.csv', columns=header, index=None)

But I receive an error:

KeyError: "None of [Index(['County', 'Crude Rate'], dtype='object')] are in the [columns]"

This is confusing as the .txt file I'm using is the following for hundreds of rows (from a government database):

"Notes" "County"    "County Code"   Deaths  Population  Crude Rate
    "Autauga County, AL"    "01001" 7893    918492  859.3
    "Baldwin County, AL"    "01003" 30292   3102984 976.2
    "Barbour County, AL"    "01005" 5197    499262  1040.9

I notice the first three columns have titles enclosed in quotes, and the last three do not. I have experimented with including quotes in my columns sequence (e.g. ""County"") but no luck. Based upon the error, I realize there is some discrepancy between column titles as I have typed them and how they are read in this script.

Any help in understanding this discrepancy is appreciated.

  • 2
    Try `print(df.columns)` to see the actual column names as pandas reads them – RichieV Aug 19 '20 at 02:42
  • you can try getting the index values also: `df = read_file; df.index`. My guess is there is an import issue and those are not column names as expected. – anon01 Aug 19 '20 at 02:57
  • Printing column names as pandas reads them: Index(['Notes\t"County"\t"County Code"\tDeaths\tPopulation\tCrude Rate'], dtype='object') – Elroy Berdahl Aug 19 '20 at 14:52

1 Answers1

0

You are reading the file with default options

read_file = pd.read_csv (r'death.txt')

Change it to

read_file = pd.read_csv (r'death.txt', sep="\t")

Check this

df.columns
Index(['Notes', 'County', 'County Code', 'Deaths', 'Population', 'Crude Rate'], dtype='object')

and the....
You should filter your columns first, an then save.
Now, if your columns are ok:

read_file[['County', 'Crude Rate']].to_csv (r'death.csv', index=None)
Ricardo Rendich
  • 666
  • 5
  • 6