Python columns match CSV

Question

I am trying to read in a CSV file, base_list.csv, which has two columns.

Then I read in file_1.csv, remove any values that match the base_list.csv file, and write those out to a new CSV file called Dups.csv.

When I run this I am getting the following error:

emails = set(emails) #"set" removes duplicates in a list
TypeError: unhashable type: 'list'

Sample code below:

import csv
#gather emails from base_list:
with open("H:\\Python Backups\\DeDup\\ByCSV\\base_list.csv", "rU") as base_file:
    read_base_file = csv.reader(base_file, delimiter=",")
    duplicates_list = []
    rows = [row for row in read_base_file]
    for row in rows:
        duplicates_list.extend(row)
    #extract emails from other csv files (csv_files) from multiple
    #columns in those csv files (email_columns):
    emails = []
    with open("H:\\Python Backups\\DeDup\\ByCSV\\file_1.csv", "rU") as csvfile:
        read_csv = csv.reader(csvfile, delimiter=",")     
        email_rows = [r for r in read_csv]
        emails.extend(email_rows)
    #find duplicates from base_list and remove them:
    duplicates = [e for e in emails if e in duplicates_list]
    for dupe in duplicates:
        emails.remove(dupe)
    emails = set(emails) #"set" removes duplicates in a list
    #write the emails to a csv:
    writer = csv.writer(open("H:\\Python Backups\\DeDup\\ByCSV\\Dups.csv", "ab"))
    for email in zip(emails):
        writer.writerow(email)

Your question is about removing duplicate lists. This can be achieved by converting them to tuples, using a set, and converting back to a list. It's a dupe. – Jean-François Fabre, Oct 26 '18 at 19:00
Actually, I think his real problem is that he's trying to add a list to a set, and he's wondering why he's getting the "unhashable" TypeError. Key words: I think. – Julian, Oct 26 '18 at 19:06

Julian · Answer 1 · 2018-10-26T19:52:47.493

The error occurs because you are trying to store a list in a set. This isn't possible: set elements must be hashable, and lists, being mutable, are not hashable in Python.

>>> list_of_lists = [[1,2,3], ['a','b','c']]
>>> set(list_of_lists)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'

Change email_rows to be

email_rows = (tuple(r) for r in read_csv)

which creates a generator that yields each row as a tuple. Tuples of strings are hashable, so the rows can then be stored in a set.
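A minimal sketch of the change, using in-memory CSV text rather than the files from the question. The sample rows and the names `sample_csv` and `unique_emails` are made up for illustration, and Python 3 is assumed:

```python
import csv
import io

# Hypothetical sample data standing in for file_1.csv
sample_csv = "a@example.com,Alice\nb@example.com,Bob\na@example.com,Alice\n"

reader = csv.reader(io.StringIO(sample_csv))

# csv.reader yields each row as a list; convert each one to a tuple
# so that it can be hashed.
email_rows = (tuple(r) for r in reader)

emails = []
emails.extend(email_rows)  # extend() consumes the generator

# This no longer raises TypeError, because tuples of strings are hashable.
unique_emails = set(emails)
print(sorted(unique_emails))
```

Note that a set does not preserve row order, so sort or otherwise order the result if that matters for the output file.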

edited Oct 26 '18 at 19:52

answered Oct 26 '18 at 19:05

Changed as suggested above, still get the same error, unhashable type list: email_rows = (r for r in read_csv) – John T Oct 26 '18 at 19:15
Ahh, that's probably because the row itself is a list. Change it to `(tuple(r) for r in read_csv)`. I'll update my answer. – Julian Oct 26 '18 at 19:52
Thank you, that fixed the list/tuple error. Now the issue is that the logic does not work: I am getting just the results from file_1. – John T Oct 26 '18 at 20:04
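On the follow-up about the logic: in the question's code, `duplicates_list.extend(row)` flattens the base file into individual cell strings, while each entry in `emails` is a whole row, so `e in duplicates_list` compares a row against strings and never matches. One possible sketch follows, assuming the goal is to compare whole rows and write the matching ones to Dups.csv. The function name `split_rows` and the sample data are hypothetical, and Python 3 is assumed:

```python
import csv
import io

def split_rows(base_rows, other_rows):
    """Split other_rows into (matches, remaining), comparing whole rows.

    Repeated rows in other_rows are kept only once.
    """
    base = {tuple(r) for r in base_rows}  # hashable rows for fast lookup
    seen = set()
    matches, remaining = [], []
    for r in other_rows:
        row = tuple(r)
        if row in seen:
            continue  # skip repeats within other_rows
        seen.add(row)
        if row in base:
            matches.append(row)
        else:
            remaining.append(row)
    return matches, remaining

# Hypothetical sample data standing in for base_list.csv and file_1.csv
base_text = "a@example.com,Alice\nc@example.com,Carol\n"
file_1_text = "a@example.com,Alice\nb@example.com,Bob\na@example.com,Alice\n"

matches, remaining = split_rows(
    csv.reader(io.StringIO(base_text)),
    csv.reader(io.StringIO(file_1_text)),
)

# Write the matching rows; with a real file, open it with newline=""
out = io.StringIO()
csv.writer(out).writerows(matches)
print(out.getvalue())
```

If only one column (for example the email address) should decide a match, build the `base` set from that column instead of the whole row.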
