
I am importing CSV data into PostgreSQL using Python, and my CSV file seems to contain duplicates. The file has five columns, one of which is username. How can I get Python to show me the rows that share the same username? I am new to programming, so please bear with me. If this is not possible with the code below, how can I change it so that I can find the duplicates in the CSV file?

import psycopg2
import csv

# open() works in both Python 2 and 3; file() exists only in Python 2
csv_data = csv.reader(open('SampleData2.csv'))
Jacob_Cortese
  • "I seem to have duplicates in my CSV file. " Why? Does PostgreSQL give you an error? Which error? –  Nov 01 '16 at 17:22
  • When I run a query in PostgreSQL with count(username) > 1, it returns lots of usernames. username is supposed to be a unique field, so it shouldn't have duplicates. I want to see those duplicates in Python and then, once I have seen them, delete them. I want to do all of this in Python. – Jacob_Cortese Nov 01 '16 at 17:26
  • So you can edit your question and delete all the PostgreSQL-related code that your question is not about. Instead, include the code you've tried to remove duplicates from the CSV input. –  Nov 01 '16 at 17:28
  • Thanks Lutz, I have edited the original post to reflect my concern. – Jacob_Cortese Nov 01 '16 at 17:31
  • Now please add a sample of the input that does contain duplicates. Please also explain *why* they are duplicates. Don't forget to include the Python code you have tried to detect and remove them. –  Nov 01 '16 at 17:32
  • @Pythonlearner, you have to use sets in Python: https://docs.python.org/2/library/sets.html –  Nov 01 '16 at 18:03
  • Thanks Developer, yes I have to use Sets but I don't know any code regarding sets in Python. – Jacob_Cortese Nov 01 '16 at 18:04
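
The set-based approach suggested in the comments could look roughly like the sketch below. It assumes the username is in the first column (`key_index=0`) and that the file has no header row; neither is stated in the question, so adjust both as needed. The function names are made up for illustration.

```python
import csv


def find_duplicates(rows, key_index=0):
    """Return every row whose username was already seen in an earlier row."""
    seen = set()          # usernames encountered so far
    duplicates = []
    for row in rows:
        if not row:       # skip blank lines in the CSV
            continue
        username = row[key_index]
        if username in seen:
            duplicates.append(row)
        else:
            seen.add(username)
    return duplicates


def drop_duplicates(rows, key_index=0):
    """Keep only the first row for each username."""
    seen = set()
    unique = []
    for row in rows:
        if not row:
            continue
        username = row[key_index]
        if username not in seen:
            seen.add(username)
            unique.append(row)
    return unique


# Possible usage with the file from the question:
#
#     with open('SampleData2.csv') as f:
#         rows = list(csv.reader(f))
#     for row in find_duplicates(rows):
#         print(row)
#     clean_rows = drop_duplicates(rows)
```

Note that "duplicate" here means "same username", even if the other four columns differ; which of the rows to keep in that case is a decision the sketch makes arbitrarily (it keeps the first).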

1 Answer


If what you want is to import the file into the database without creating duplicates, you can do an "UPSERT" of sorts. For each row, this either updates the existing record (which won't matter, since it's a duplicate) or creates a new one.

See this SO answer: Insert, on duplicate update in PostgreSQL?
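
As a rough sketch of what such an upsert might look like from Python: the statement below uses `INSERT ... ON CONFLICT`, which needs PostgreSQL 9.5 or later and a unique constraint on `username`. The table name `users` and the `email` column are hypothetical, since the question does not say what the five columns are; `upsert_rows` is likewise just an illustrative helper that accepts any DB-API cursor, such as one from psycopg2.

```python
# Hypothetical table and columns; replace with the real ones.
UPSERT_SQL = """
INSERT INTO users (username, email)
VALUES (%s, %s)
ON CONFLICT (username) DO UPDATE
SET email = EXCLUDED.email
"""


def upsert_rows(cursor, rows):
    """Run the upsert once per row using a DB-API cursor."""
    cursor.executemany(UPSERT_SQL, rows)


# Possible usage with psycopg2 (connection details omitted):
#
#     conn = psycopg2.connect(...)
#     with conn, conn.cursor() as cur:
#         upsert_rows(cur, [("alice", "alice@example.com")])
```

If you would rather keep the first row and silently skip later duplicates, `ON CONFLICT (username) DO NOTHING` is the alternative.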

Joshua Hunter