I am iterating over pandas data frames and appending them to a CSV file. Is it possible to write a function that avoids writing the same rows to the CSV (i.e. if a row already exists in the file, skip it and move on to the next row)?

    import os

    output_path = "C:\\Users\\y.Israfilbayov\\Desktop\\AGS\\agstocsv.csv"
    # Append to the CSV; write the header only if the file does not exist yet
    geolDf.to_csv(output_path, mode='a', index=False, header=not os.path.exists(output_path))

Thanks in advance!

YIF99
  • Does this answer your question? [Pandas/Python: How to concatenate two dataframes without duplicates?](https://stackoverflow.com/questions/21317384/pandas-python-how-to-concatenate-two-dataframes-without-duplicates) –  Feb 02 '22 at 11:39
  • You can try using `drop_duplicates`. Does that solve it? – NITISH PANDEY Feb 02 '22 at 12:34

1 Answer

When I tried to remove the duplicates in pandas itself, I got stuck, because I am iterating over an "ags" file (a specific geotechnical document format) and getting a new data frame on each iteration. So the best option was to remove the duplicates from the CSV itself. The code below works for my case.

    # Remove duplicate lines (repeated header lines are dropped too)
    with open("C:\\Users\\y.Israfilbayov\\Desktop\\AGS\\agstocsv.csv", "r") as in_file, \
         open("C:\\Users\\y.Israfilbayov\\Desktop\\AGS\\agstocsv_noduplicate.csv", "w") as out_file:
        seen = set()  # set for fast O(1) amortized lookup

        for line in in_file:
            if line in seen:
                continue  # skip duplicate

            seen.add(line)
            out_file.write(line)
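
For anyone who would rather filter before writing, as the question originally asked, here is a minimal sketch and not a tested drop-in. It follows the `drop_duplicates` hint from the comments. The function name `append_unique_rows` is made up for illustration. It assumes every data frame has the same columns as the CSV, that the existing file starts with a header row, and that comparing values as strings is good enough.

```python
import os

import pandas as pd


def append_unique_rows(df: pd.DataFrame, path: str) -> int:
    """Append to `path` only the rows of `df` that are not already in it.

    Hypothetical helper; returns the number of rows actually written.
    """
    # Compare everything as strings, since that is how rows look in the CSV.
    new_rows = df.astype(str).drop_duplicates()
    file_exists = os.path.exists(path)

    if file_exists:
        # Assumption: the file has a header row matching df's columns.
        existing = pd.read_csv(path, dtype=str, keep_default_na=False)
        # A left merge with indicator=True marks rows already in the file.
        merged = new_rows.merge(
            existing.drop_duplicates(),
            how="left",
            on=list(new_rows.columns),
            indicator=True,
        )
        new_rows = merged.loc[merged["_merge"] == "left_only"].drop(columns="_merge")

    # Write the header only when creating the file.
    new_rows.to_csv(path, mode="a", index=False, header=not file_exists)
    return len(new_rows)
```

Inside the loop it would be called as `append_unique_rows(geolDf, output_path)` in place of the `to_csv` line. It re-reads the whole CSV on every call, so for large files the line-based clean-up above may still be the cheaper option.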
YIF99