-1

I have a data set in CSV format that looks like this:

[image: screenshot of the CSV table with columns quarter, type, value]

1st question: I am trying to find duplicate rows in the table created in Python below. I tried using the set function on the rows, but the output says there are no duplicates even though the data set contains a duplicate row.

2nd question: Is it possible to reference this table (I realized it becomes a table when I print it) so that I can use it in the next step for calculations?

COL_1_WIDTH = 10
COL_2_WIDTH = 35
for row in data:
    IC1 = len(str(row[0]))
    IC2 = len(str(row[1]))
    print(str(row[0]) + (COL_1_WIDTH - IC1) * ' ' +
          str(row[1]) + (COL_2_WIDTH - IC2) * ' ' +
          str(row[2]))

for row in data:
    if len(set(row)) != len(row):
        print('duplicates: ', row)
    else:
        print('no duplicates:', row)

P.S. I am only permitted to use built-in functions and NumPy.

Grateful for any ideas. Thank you!

user149635
  • 65
  • 2
  • 12
  • `len(set(data)) != len(data)` will tell you if dupes, leaving you still some work to find out what are the dupes. (You're only checking one item at a time so len is always going to be one for set and non-set.) – Scott Carpenter Jun 23 '18 at 14:10
  • Hi, thanks! I just ran it and changed it to data. Got an error: TypeError: unhashable type: 'writeable void-scalar' – user149635 Jun 23 '18 at 14:20
  • What kind of table is this (e.g. are you using Pandas)? Please provide a [mcve]. – jpp Jun 23 '18 at 14:41
  • Hi, I am not using pandas, only built-in functions. – user149635 Jun 23 '18 at 14:49

1 Answer

1

You don't really explain what kind of object 'data' is, so I assumed it was a list of strings. Here's how I created mine from a CSV file:

with open('/home/sebastien/Documents/answerSO.csv') as f:
    data = f.read()         # a string

data = data.split('\n')     # a list of strings
data.pop()                  # delete the last element, an empty string

(note that using the csv module may be a better idea)
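For example, a minimal sketch with the csv module (the sample rows are hypothetical, based on the quarter/type/value columns mentioned in the comments; in practice you would pass an open file to csv.reader instead of a StringIO):

```python
import csv
import io

# Hypothetical stand-in for the CSV file; the column layout
# (quarter, type, value) comes from the question's comments,
# the numbers are made up.
sample = "2007-Q1,1-Small,100\n2007-Q2,2-Medium,200\n2007-Q1,1-Small,100\n"

reader = csv.reader(io.StringIO(sample))
data = [tuple(row) for row in reader]   # tuples are hashable, so set() works on them
print(data[0])   # ('2007-Q1', '1-Small', '100')
```

Because each row becomes a hashable tuple, `len(set(data)) != len(data)` correctly detects the duplicate row.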

Now, to look for duplicates, I used the method explained here: How do I find the duplicates in a list and create another list with them?

seen = set()
uniq = []
for row in data:
    if row not in seen:
        uniq.append(row)
        seen.add(row)
    else:
        print("found a duplicate:",row)

And about referencing it: well, it's in 'data'.
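To the second question in the post: once the rows are stored in a list, you can reuse them directly for calculations. A minimal sketch, assuming each row is a 'quarter,type,value' string (the column names come from the comments; the values are hypothetical):

```python
# Hypothetical deduplicated rows in the question's quarter,type,value layout.
uniq = ["2007-Q1,1-Small,100", "2007-Q2,2-Medium,200"]

# Sum the value column by splitting each row on commas.
total = sum(int(row.split(',')[2]) for row in uniq)
print(total)   # 300
```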

Seb
  • 71
  • 3
  • Apologies, 'data' refers to the picture table with headers: quarter, type, value – user149635 Jun 23 '18 at 15:56
  • Okay, but I meant in Python. If you just run this script, it will raise a `NameError: name 'data' is not defined` – Seb Jun 23 '18 at 16:03
  • Thank you! I tried running the script and encountered an error on this row: `if row not in seen:` TypeError: unhashable type: 'writeable void-scalar', which I did not understand. What I did was place the block of scripts after I load the data; before that, I excluded the newly created table from the question. – user149635 Jun 23 '18 at 16:12
  • I think I get it. Could you print 'data' and send the result? My guess is there was a problem splitting it and you ended up with an unhashable object in row, so not the string I have. Likely a difference in the end-of-line convention for CSV: Mac uses '\r', and Windows '\r\n'. – Seb Jun 23 '18 at 18:07
  • I know why. I used the 2nd block of your script and continued with the question's script above. I use data = np.loadtxt to open the file and am puzzled that I'm unable to use it. The question's original script outputs: 2007-Q1 1-Small1. I modified the question's script output into 2007-Q11-Small1 and followed up with your 2nd block script again. A trial with set() gives no result, and set(data) raises the unhashable error again. Grateful for any ideas. – user149635 Jun 23 '18 at 20:20
  • Using numpy to process things other than numbers may not be a good idea. But if you just run my whole script, it should do what you're asking for. – Seb Jun 23 '18 at 22:16
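For reference, the `TypeError: unhashable type: 'writeable void-scalar'` in the comments happens because `np.loadtxt` with a structured dtype yields `numpy.void` row scalars, which cannot go into a set. A sketch of a workaround, assuming a structured dtype for the quarter/type/value columns (the sample values are made up), is to hash a plain tuple of each row instead:

```python
import io

import numpy as np

# Hypothetical CSV contents in the question's quarter,type,value layout.
text = io.StringIO("2007-Q1,1-Small,100\n2007-Q2,2-Medium,200\n2007-Q1,1-Small,100\n")
data = np.loadtxt(text, delimiter=',',
                  dtype=[('quarter', 'U10'), ('type', 'U10'), ('value', 'i4')])

seen = set()
uniq = []
for row in data:
    key = tuple(row)          # np.void rows are unhashable; plain tuples are
    if key not in seen:
        uniq.append(row)
        seen.add(key)
    else:
        print("found a duplicate:", row)
```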