I have a CSV file that looks something like this:

file.csv

name,apptype
AppABC,python
appabc,python
AppABB,python
AppABA,python
Appaba,python

I need a way to determine whether any "name" appears more than once when case is ignored, and to report the matching rows.

In this case I should know that the following are duplicates:

AppABC,python
appabc,python
AppABA,python
Appaba,python

This is what I was trying, but it's not working.

with open(appcsv_path) as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')

    for name in csv_reader:
        re.findall(name, csv_reader, flags=re.IGNORECASE)

This results in an error:

TypeError: unhashable type: 'list'

Using the pandas method from the answer below, edited to use "Name" instead of "name":

df = pd.read_csv(appcsv_path)
out = df[df.Name.str.strip().str.lower().duplicated(keep=False)].loc[0:0]
print(out.to_string(index=False))

Results in:

Empty DataFrame
Columns: [Name, Type]
Index: []
ddevalco
  • What have you tried so far based on your own research, and what went wrong with your attempts? For example, to compare strings without case sensitivity, you generally [lowercase the strings](https://stackoverflow.com/questions/6797984/how-do-i-lowercase-a-string-in-python) – G. Anderson Aug 05 '22 at 15:32
  • Updated the question with what I tried in hopes of solving it, but it was a long shot. I don't fully understand what I'm doing with Python; I just know enough to copy and paste things from Stack Overflow and sometimes get lucky – ddevalco Aug 05 '22 at 15:38

1 Answer

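Here is a pandas solution using Series.duplicated. Stripping whitespace and lowercasing the names first makes the comparison case-insensitive, and keep=False flags every occurrence of a repeated name rather than only the later ones: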

import pandas as pd
df = pd.read_csv(appcsv_path)
out = df[df.name.str.strip().str.lower().duplicated(keep=False)].loc[:,'name']

Printing it gives the expected output:

print(out.to_string(index=False))
AppABC
appabc
AppABA
Appaba

Or, to keep both columns, you can do:

out = df[df.name.str.strip().str.lower().duplicated(keep=False)]
print(out.to_string(index=False))

which gives you:

  name apptype
AppABC  python
appabc  python
AppABA  python
Appaba  python
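
If pandas is not an option, the same idea can be sketched with only the standard library. This is a minimal illustration, not part of the original answer: the function name find_case_insensitive_duplicates and the in-memory sample are hypothetical, and it assumes the file has a header row with a "name" column as in the question. It uses str.casefold rather than str.lower, which is a slightly more aggressive normalization for non-ASCII text.

```python
import csv
import io
from collections import defaultdict


def find_case_insensitive_duplicates(csv_file, column="name"):
    """Return rows whose `column` value repeats when case is ignored."""
    rows = list(csv.DictReader(csv_file))

    # Group rows under a normalized key: surrounding whitespace removed,
    # case folded, so "AppABC" and "appabc" land in the same group.
    groups = defaultdict(list)
    for row in rows:
        groups[row[column].strip().casefold()].append(row)

    # Keep every row that belongs to a group of two or more,
    # preserving the original file order.
    return [
        row for row in rows
        if len(groups[row[column].strip().casefold()]) > 1
    ]


# Hypothetical in-memory stand-in for file.csv from the question.
sample = io.StringIO(
    "name,apptype\n"
    "AppABC,python\n"
    "appabc,python\n"
    "AppABB,python\n"
    "AppABA,python\n"
    "Appaba,python\n"
)

duplicates = find_case_insensitive_duplicates(sample)
for row in duplicates:
    print(f"{row['name']},{row['apptype']}")
```

To run it against a real file, pass an open file object instead of the sample, for example `with open(appcsv_path, newline="") as f: find_case_insensitive_duplicates(f)`. If the header is "Name" rather than "name", pass column="Name".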
Himanshu Poddar