
I checked similar questions, but the answers there didn't help.

I have a file like this:

S1_22   45317082    31  0   9   22  1543
S1_23   3859606 40  3   3   34  2111
S1_24   48088383    49  6   1   42  2400
S1_25   43387855    39  1   7   31  2425
S1_26   39016907    39  2   7   30  1977
S1_27   57612149    23  0   0   23  1843
S1_28   42505824    23  1   1   21  1092
S1_29   54856684    18  0   2   16  1018
S1_29   54856684    18  0   2   16  1018
S1_29   54856684    18  0   2   16  1018
S1_29   54856684    18  0   2   16  1018

I want to count the occurrences of each word in the first column and, based on that, write an output file with an additional field: uniq if the count == 1 and multi if the count > 1.

I wrote this code:

import csv
import collections

infile = 'Results'

names = collections.Counter()

with open(infile) as input_file:
    for row in csv.reader(input_file, delimiter='\t'):
        names[row[0]] += 1
    print names[row[0]],row[0]

But it doesn't work properly.

I can't load everything into a list, since the file is too big.

Selcuk
Irek

2 Answers


The print statement at the end does not do what you want. Because of its indentation it sits outside the for loop, so it runs only once, after the loop has finished. It prints the count and name for S1_29, since that is the value of row[0] in the last iteration of the loop.

You're on the right track. Instead of that print statement, iterate through the keys and values of the counter and check whether each value is greater than 1.
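A minimal Python 3 sketch of that check, assuming the counts have already been collected into a Counter called names as in the question. The helper name label_counts is hypothetical, not part of any library:

```python
import collections

def label_counts(names):
    """Map each name to 'uniq' if it occurs once, otherwise 'multi'."""
    labels = {}
    for name, count in names.items():
        # Every key in the Counter has a count of at least 1,
        # so "more than once" means count > 1.
        labels[name] = 'multi' if count > 1 else 'uniq'
    return labels

names = collections.Counter(['S1_28', 'S1_29', 'S1_29'])
print(label_counts(names))  # {'S1_28': 'uniq', 'S1_29': 'multi'}
```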

RexE

If you want the print statement to run for every row, indent it so that it is inside the loop:

    names[row[0]] += 1
    print names[row[0]],row[0]
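Note that the indented version prints a running count, not the final one. A small Python 3 sketch illustrates this. An in-memory string (assumed here purely for illustration) stands in for the real tab-separated file:

```python
import collections
import csv
import io

# Stand-in for the real file: one S1_28 row and three S1_29 rows.
sample = "S1_28\t42505824\nS1_29\t54856684\nS1_29\t54856684\nS1_29\t54856684\n"

names = collections.Counter()
running = []
for row in csv.reader(io.StringIO(sample), delimiter='\t'):
    names[row[0]] += 1
    # Same as the indented print: the count *so far*, not the total.
    running.append((names[row[0]], row[0]))

for count, name in running:
    print(count, name)
```

This prints 1 S1_28, 1 S1_29, 2 S1_29 and 3 S1_29. The per-row count is only final at the last occurrence of each name, so labelling rows correctly requires the loop to finish first.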

But what you actually want is:

import csv
import collections

infile = 'Results'

names = collections.Counter()

# First read the whole file, only counting the names
with open(infile) as input_file:
    for row in csv.reader(input_file, delimiter='\t'):
        names[row[0]] += 1

# The counts are only final after the loop, so print them here
for name, count in names.iteritems():
    print name, count

Edit: To show the rest of the row, you can use a second dict, as in:

names = collections.Counter()
rows = {}  # name -> the last row seen with that name

with open(infile) as input_file:
    for row in csv.reader(input_file, delimiter='\t'):
        rows[row[0]] = row
        names[row[0]] += 1

for name, count in names.iteritems():
    print rows[name], count
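To get what the question actually asks for (every row written back out with an extra uniq/multi field), you can avoid holding the rows in memory by reading the file twice: once to count, once to write. Only the Counter is kept, so memory grows with the number of distinct names, not the number of rows. Below is a minimal Python 3 sketch that assumes tab-separated input. The function name label_rows and the output file name are hypothetical:

```python
import collections
import csv

def label_rows(infile, outfile):
    """Copy infile to outfile, appending 'uniq' or 'multi' to each row
    depending on how often its first-column value occurs."""
    # Pass 1: count first-column values; only the Counter stays in memory.
    names = collections.Counter()
    with open(infile, newline='') as f:
        for row in csv.reader(f, delimiter='\t'):
            if row:  # skip blank lines
                names[row[0]] += 1

    # Pass 2: re-read the file and write each row with its label.
    with open(infile, newline='') as f, open(outfile, 'w', newline='') as out:
        writer = csv.writer(out, delimiter='\t')
        for row in csv.reader(f, delimiter='\t'):
            if row:
                label = 'uniq' if names[row[0]] == 1 else 'multi'
                writer.writerow(row + [label])
    return names
```

For the question's file, a call like label_rows('Results', 'Results.labeled') would write the labelled copy. The output name is only a placeholder.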
Selcuk