I am trying to sort zip codes into various files, but I keep getting

ValueError: cannot reindex from a duplicate axis

I've read through other answers on Stack Overflow, but I haven't been able to figure out why it's complaining about a duplicate axis.

import csv
import pandas as pd
from pandas import DataFrame as df
fp = '/Users/User/Development/zipcodes/file.csv'
file1 = open(fp, 'rb').read()
df = pd.read_csv(fp, sep=',')

df = df[['VIN', 'Reg Name', 'Reg Address', 'Reg City', 'Reg ST', 'ZIP',
         'ZIP', 'Catagory', 'Phone', 'First Name', 'Last Name', 'Reg NFS',
         'MGVW', 'Make', 'Veh Model','E Mfr', 'Engine Model', 'CY2010',
         'CY2011', 'CY2012', 'CY2013', 'CY2014', 'CY2015', 'Std Cnt', 
        ]]
#reader.head(1)
df.head(1)
zipBlue = [65355, 65350, 65345, 65326, 65335, 64788, 64780, 64777, 64743,
64742, 64739, 64735, 64723, 64722, 64720]

The script also defines zipGreen, zipRed, zipYellow, and zipLightBlue, but I did not include them in this example.

def IsInSort():
    blue = df[df.ZIP.isin(zipBlue)]
    green = df[df.ZIP.isin(zipGreen)]
    red = df[df.ZIP.isin(zipRed)]
    yellow = df[df.ZIP.isin(zipYellow)]
    LightBlue = df[df.ZIP.isin(zipLightBlue)]
def SaveSortedZips():
    blue.to_csv('sortedBlue.csv')
    green.to_csv('sortedGreen.csv')
    red.to_csv('sortedRed.csv')
    yellow.to_csv('sortedYellow.csv')
    LightBlue.to_csv('SortedLightBlue.csv')
IsInSort()
SaveSortedZips()

   1864         # trying to reindex on an axis with duplicates
   1865         if not self.is_unique and len(indexer):
-> 1866             raise ValueError("cannot reindex from a duplicate axis")
   1867
   1868     def reindex(self, target, method=None, level=None, limit=None):

ValueError: cannot reindex from a duplicate axis

Rob
icomefromchaos
  • On what line exactly are you getting your error? How is your example different from: df = pd.DataFrame({'A':[1,1,2,4]},index=[1,1,2,2]); df[df.A.isin([1,2])] – Gecko Jun 11 '15 at 18:18
  • The exact line ValueError Traceback (most recent call last) in () 11 yellow.to_csv('sortedYellow.csv') 12 LightBlue.to_csv('SortedLightBlue.csv') ---> 13 IsInSort() 14 SaveSortedZips() – icomefromchaos Jun 11 '15 at 19:28
  • Not sure what this is: `from pandas import DataFrame as df` but not a good idea. df is, by convention, an instance of `pandas.DataFrame`. You should just delete that line. If you want to bring DataFrame into the namespace without having to precede it with `pd`, you can, but leave out the `as df`. – JohnE Jun 11 '15 at 20:05
  • Thanks JohnE. I made the change. – icomefromchaos Jun 11 '15 at 20:35
  • Not sure what is happening, but it's hard to figure out without the data. If you can reproduce the error with a small sample data set, that would help. – JohnE Jun 11 '15 at 23:35

1 Answer

I'm pretty sure your problem is related to your column selection:

df = df[['VIN', 'Reg Name', 'Reg Address', 'Reg City', 'Reg ST', 'ZIP',
         'ZIP', 'Catagory', 'Phone', 'First Name', 'Last Name', 'Reg NFS',
         'MGVW', 'Make', 'Veh Model','E Mfr', 'Engine Model', 'CY2010',
         'CY2011', 'CY2012', 'CY2013', 'CY2014', 'CY2015', 'Std Cnt', 
        ]]

'ZIP' is in there twice. Removing one of them should solve the problem.

The error ValueError: cannot reindex from a duplicate axis is one of those very cryptic pandas errors that simply do not tell you what is actually wrong.

The error is often related to two columns being named the same either before or after (internally in) the operation.
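
If you want to check for this yourself, here is a minimal sketch. It assumes a small made-up DataFrame (raw, with invented values) in place of your CSV, and the names dup, clean and zip_blue are only for illustration. It lists the column names that occur more than once, shows that selecting a duplicated name gives back a DataFrame instead of a Series, and then keeps only the first copy of each column before filtering.

```python
import pandas as pd

# Hypothetical sample data standing in for the real CSV file.
raw = pd.DataFrame({
    "VIN": ["A1", "B2", "C3"],
    "ZIP": [65355, 64788, 10001],
})

# Selecting 'ZIP' twice, as in the question, produces two columns named 'ZIP'.
dup = raw[["VIN", "ZIP", "ZIP"]]

# List the column names that appear more than once.
dup_names = dup.columns[dup.columns.duplicated()].tolist()
print(dup_names)

# With a duplicated label, dup["ZIP"] is a DataFrame, not a Series,
# so boolean filtering built on it no longer behaves as intended.
print(type(dup["ZIP"]).__name__)

# Keep only the first occurrence of each column name.
clean = dup.loc[:, ~dup.columns.duplicated()]

# The isin filter now works on a single 'ZIP' column.
zip_blue = [65355, 64788]  # shortened, illustrative list
blue = clean[clean["ZIP"].isin(zip_blue)]
print(blue)
```

In your case it is simpler to delete the second 'ZIP' from the list, but the columns.duplicated() check is handy when the duplicate comes from the file itself or from a merge.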

firelynx