I have a CSV file that contains one column (column1). I want to check whether the value in each cell repeats and how many times (occcurance_count), and then write the occurrence count into the same CSV file using Python.
In the example below, "241682-27638-USD-OCOF" does not repeat, so its count is 1; "241942-37190-USD-DIV" occurs twice, so its count is 2; and so on.

I want the output as below, in CSV format:

column1                  ,occcurance_count

1682-27638-USD-OGGCOF ,1

241682-27638-USD-OGGINT ,1

241682-27638-USD-CIGGNT ,1

241682-27638-USD-OCGGINT ,1

241942-37190-USD-GGDIV ,2

241942-37190-USD-CHYOF ,1

241942-37190-USD-EQPL ,1

241942-37190-USD-INT ,1

242066-15343-USD-CYJOF ,3

242066-15343-USD-CYJOF ,3

242066-15343-USD-CYJOF ,3

242066-15343-USD-ETHQPL ,1

242066-15343-USD-INFRT ,1

241942-37190-USD-GGDIV ,2

242066-33492-USD-CJHOF ,1
fredtantini
Rohit

4 Answers

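As the count is repeated on every row, a normal dict is all you need:

d = {}
with open(infile) as f:  # infile: path to the CSV that already has the count column
    next(f)  # skip the header row
    for line in f:
        spl = line.rstrip().split(",")
        d[spl[0].strip()] = spl[1].strip()  # map each value to its count

for k, v in d.items():
    print("key = {} count = {}".format(k, v))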

If the file you posted is actually the expected output, and you are trying to count the occurrences of each line in a file with a single string per line and then write each line with its count:

from collections import Counter

d = Counter()
with open("i.csv") as f, open("out.csv","w") as out:
    for line in f:
        d.update([line.rstrip()])  # get counts
    f.seek(0)  # go back to the start of the file
    out.write("column1, occcurance_count\n")
    for line in f:
        out.write("{}, {}\n".format(line.rstrip(), d[line.rstrip()]))  # write line plus count of that line
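
The question asks for the counts to end up in the same CSV file, while the snippet above writes to a second file. A minimal sketch of one way to do that, assuming the file fits in memory and its first line is a header: the rows are read first, the result is written to a temporary file in the same directory, and that file then replaces the original. The function name `add_counts_in_place` and its `count_header` parameter are hypothetical, not part of the answer above.

```python
import os
import tempfile
from collections import Counter


def add_counts_in_place(path, count_header="occcurance_count"):
    """Append a per-row occurrence count to a one-column CSV, in place."""
    # Read every non-blank line; assumes the whole file fits in memory
    with open(path, newline="") as f:
        lines = [line.strip() for line in f if line.strip()]
    if not lines:
        return
    header, values = lines[0], lines[1:]  # assumes the first line is a header
    counts = Counter(values)

    # Write to a temporary file next to the original, then swap it in,
    # so a failure halfway through does not leave a truncated CSV behind
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", newline="") as out:
        out.write("{},{}\n".format(header, count_header))
        for value in values:
            out.write("{},{}\n".format(value, counts[value]))
    os.replace(tmp_path, path)
```

Calling `add_counts_in_place("i.csv")` would rewrite `i.csv` with the extra column; the file name is only the one used in the snippet above.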
Padraic Cunningham

I think the code below is what you are looking for. The logic is simple, if somewhat lengthy: first open the CSV file for reading and collect all elements in a list, then use the list's count method to find the number of occurrences of each item, and finally open a new CSV file and write each item together with its count.

There is surely a more optimized way of doing the same thing, but here is the code that came to mind quickly.

    import csv
    import sys

    try:
        fr = open("mycsv.csv", newline="")
        fw = open("mscsv_counter.csv", "w", newline="")
    except IOError:
        print("Couldn't open the file")
        sys.exit(1)  # stop here instead of failing later on an undefined file

    # Read the first cell of every non-empty row into a list
    reader = csv.reader(fr)
    counterlist = list()
    for row in reader:
        if len(row) > 0:
            counterlist.append(row[0])

    # Write each item together with its number of occurrences
    writer = csv.writer(fw)
    data = ["column 1", "counter"]
    writer.writerow(data)
    for item in counterlist:
        rowdata = [item, counterlist.count(item)]
        writer.writerow(rowdata)

    fr.close()
    fw.close()
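
Calling `counterlist.count(item)` for every row rescans the whole list each time, so the work grows quadratically with the number of rows. One possible version of the "more optimized way" this answer alludes to is sketched below, using `collections.Counter` to tally everything in a single pass. The function name `write_counts` is hypothetical. Like the code above, it treats every row, including any header row, as data.

```python
import csv
from collections import Counter


def write_counts(in_path, out_path):
    """Copy column 1 of in_path to out_path, adding how often each value occurs."""
    with open(in_path, newline="") as fr:
        # Keep only the first cell of each non-empty row
        items = [row[0].strip() for row in csv.reader(fr) if row]
    counts = Counter(items)  # one pass over the data builds all the tallies

    with open(out_path, "w", newline="") as fw:
        writer = csv.writer(fw)
        writer.writerow(["column 1", "counter"])
        for item in items:
            # Dictionary lookup instead of rescanning the list
            writer.writerow([item, counts[item]])
```

With the file names from the code above, the call would be `write_counts("mycsv.csv", "mscsv_counter.csv")`.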
Sujal Sheth

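You could use Counter:

>>> from collections import Counter
>>> values = open('i.csv')  # the input file; the name is assumed here
>>> counter = Counter(line.strip() for line in values)

>>> counter['242066-15343-USD-CYJOF']
3

>>> counter['241682-27638-USD-OGGINT']
1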
Peter Wood

Here is some simple code. I hope it helps:

>>> import numpy as np
>>> data=np.loadtxt('a.csv', dtype=str)
>>> data
array(['241682-27638-USD-OCOF', '241682-27638-USD-OINT',
       '241682-27638-USD-CINT', '241682-27638-USD-OCINT',
       '241942-37190-USD-DIV', '241942-37190-USD-COF',
       '241942-37190-USD-EQPL', '241942-37190-USD-INT',
       '242066-15343-USD-COF', '242066-15343-USD-COF',
       '242066-15343-USD-COF', '242066-15343-USD-EQPL',
       '242066-15343-USD-INT', '241942-37190-USD-DIV',
       '242066-33492-USD-COF'], 
      dtype='|S22')
>>> count = [len(np.where(data==i)[0]) for i in data]
>>> count
[1, 1, 1, 1, 2, 1, 1, 1, 3, 3, 3, 1, 1, 2, 1]
>>> fp = open('a.csv', 'w')
>>> for i in range(data.shape[0]):
...     fp.write(str(data[i]) + ' , ' + str(count[i]) + '\n')
...
>>> fp.close()
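
The list comprehension above compares the whole array against each element in turn. As a possible alternative, `np.unique` can return the counts and an index that maps each element back to its unique value, which gives the same per-row counts in one call. This is a sketch that assumes a 1-D array like `data` above; the function name `occurrence_counts` is hypothetical.

```python
import numpy as np


def occurrence_counts(data):
    """Return, for each element of a 1-D array, how many times it occurs."""
    data = np.asarray(data)
    # inverse[i] is the position of data[i] among the sorted unique values,
    # and counts[j] is how often the j-th unique value occurs
    _, inverse, counts = np.unique(data, return_inverse=True, return_counts=True)
    return counts[inverse]
```

The result can take the place of `count` in the write loop above, for example `count = occurrence_counts(data)`.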
Irshad Bhat