
I have a csv file of fish occurrences and need to trim out any fish that show up only once, and then output this as a 'trimmed' csv. However, the function I am using adds a headerless column to the trimmed csv, which messes up further calculations I need to do with the trimmed file.

The column contains the row numbers of the rows that were kept, and I believe it is created as a result of this line: return df[df[colname].isin(to_keep)]. I would like this script to simply not create the column; otherwise I have to manually delete it from every single csv file I trim!

import pandas as pd

def trim_single_entries(fn, colname):
    # remove all entries where colname's entry is unique to one row across the whole file
    df = pd.read_csv(fn)
    if colname in df.columns:
        counts = df[colname].value_counts()
        to_keep = [counts.index[i] for i in range(0,len(counts)) if counts.values[i] > 1]
        return df[df[colname].isin(to_keep)]
    else:
        return False

x = trim_single_entries('fish_data.csv', 'catalognumber')

x.to_csv('trimmed_fish_data.csv')
spops

1 Answer


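Pass index=False to the to_csv call: x.to_csv('trimmed_fish_data.csv', index=False). By default, to_csv writes the DataFrame's index as the first column with an empty header, which is the extra column you are seeing. The filter in trim_single_entries is not creating it; it only leaves gaps in the existing row labels.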
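A minimal sketch of the difference, using a small made-up table in place of fish_data.csv (the species column and the sample rows are assumptions for illustration only). Calling to_csv without a path returns the CSV text as a string, which makes the two outputs easy to compare:

```python
import io

import pandas as pd

# Hypothetical sample data standing in for fish_data.csv.
raw = io.StringIO(
    "catalognumber,species\n"
    "A1,trout\n"
    "A1,trout\n"
    "B2,pike\n"
    "C3,bass\n"
    "C3,bass\n"
)
df = pd.read_csv(raw)

# Keep only rows whose catalognumber appears more than once.
counts = df["catalognumber"].value_counts()
trimmed = df[df["catalognumber"].isin(counts[counts > 1].index)]

# Default behaviour: the surviving row labels are written as a
# headerless first column.
with_index = trimmed.to_csv()

# index=False: only the data columns are written.
without_index = trimmed.to_csv(index=False)

print(with_index)
print(without_index)
```

The first printout starts with a header line that begins with a comma (the empty header of the index column); the second starts directly with catalognumber. If the row labels are needed later, an alternative is to keep them and skip them on load with pd.read_csv(..., index_col=0).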

Brian Pendleton