1

I have a file with a few columns, like:

PAIR 1MFK 1 URANIUM 82 HELIUM 112 2.5506  
PAIR 2JGH 2 PLUTONIUM 98 POTASSIUM 88 5.3003  
PAIR 345G 3 SODIUM 23 CARBON 14 1.664  
PAIR 4IG5 4 LITHIUM 82 ARGON 99 2.5506  
PAIR 234G 5 URANIUM 99 KRYPTON 89 1.664  

What I want to do is read the last column, count how many times each value is repeated, and generate an output file containing two columns, 'VALUE' and 'NO OF TIMES REPEATED'.

I have tried like:

inp = ('filename'.'r').read().strip().replace('\t',' ').split('\n')
from collections import defaultdict
D = defaultdict(line)

for line in map(str.split,inp):
     k=line[-1]
     D[k].append(line)

I'm stuck here.
Please help!

diffracteD
  • `[v for k, v in D.items() while count != -1: count += 1]` is not valid Python, so you get an error. What were you hoping it meant? What is the `count` for? – Karl Knechtel May 06 '12 at 04:52
  • Look at this question http://stackoverflow.com/questions/5505891/using-while-in-list-comprehension-or-generator-expressions on using `while` in list comprehensions – Elvis D'Souza May 06 '12 at 04:54

2 Answers

2

There are a number of issues with the code as posted. A while-loop isn't allowed inside a list comprehension. The argument to `defaultdict` should be `list`, not `line`. Here is a fixed-up version of your code:

from collections import defaultdict
D = defaultdict(list)

for line in open('filename', 'r'):
    k = line.split()[-1]
    D[k].append(line)

print 'VALUE    NO TIMES REPEATED'
print '-----    -----------------'
for value, lines in D.items():
    print '%-6s           %d'  % (value, len(lines))

Another way to do it is to use collections.Counter to conveniently sum the number of repetitions. That lets you simplify the code a bit:

from collections import Counter
D = Counter()

for line in open('filename', 'r'):
    k = line.split()[-1]
    D[k] += 1

print 'VALUE    NO TIMES REPEATED'
print '-----    -----------------'
for value, count in D.items():
    print '%-6s           %d'  % (value, count)
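
The snippets above use the Python 2 print statement. As a rough sketch only, here is how the same Counter approach might look under Python 3. The helper names `count_last_column` and `format_report` are made up for illustration, and skipping blank lines is an addition that is not part of the code above.

```python
from collections import Counter


def count_last_column(lines):
    """Count how often each last-column value occurs in an iterable of lines."""
    counts = Counter()
    for line in lines:
        fields = line.split()   # split on any run of spaces or tabs
        if fields:              # skip blank lines (an addition, not in the answer)
            counts[fields[-1]] += 1
    return counts


def format_report(counts):
    """Return the two-column report as a list of text lines."""
    rows = ['VALUE    NO TIMES REPEATED', '-----    -----------------']
    for value, count in counts.items():
        rows.append('%-6s           %d' % (value, count))
    return rows


# Sample rows copied from the question.
sample = [
    'PAIR 1MFK 1 URANIUM 82 HELIUM 112 2.5506',
    'PAIR 2JGH 2 PLUTONIUM 98 POTASSIUM 88 5.3003',
    'PAIR 345G 3 SODIUM 23 CARBON 14 1.664',
    'PAIR 4IG5 4 LITHIUM 82 ARGON 99 2.5506',
    'PAIR 234G 5 URANIUM 99 KRYPTON 89 1.664',
]
sample_counts = count_last_column(sample)
print('\n'.join(format_report(sample_counts)))
```

Because an open file is itself an iterable of lines, `count_last_column(open('filename'))` should work the same way on real input.
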
Raymond Hettinger
0

What I want to do is read the last column, count how many times each value is repeated, and generate an output file containing two columns, 'VALUE' and 'NO OF TIMES REPEATED'.

So use collections.Counter to count the number of times each value appears, not a defaultdict. (It's not at all clear what you're trying to do with the defaultdict, and your initialization won't work, anyway; defaultdict is constructed with a callable that will create a default value. In your case, the default value you apparently had in mind is an empty list, so you would use list to initialize the defaultdict.) You don't need to store the lines to count them. The Counter counts them for you automatically.
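
To make the point about the callable concrete, here is a small illustrative sketch, assuming Python 3. The `last_column` list stands in for values already pulled from the file, and all variable names are invented for the example.

```python
from collections import Counter, defaultdict

# Stand-in for the last-column values of the question's five rows.
last_column = ['2.5506', '5.3003', '1.664', '2.5506', '1.664']

# defaultdict calls its factory with no arguments to build a missing value,
# so a missing key starts out as list(), i.e. an empty list.
groups = defaultdict(list)
for position, value in enumerate(last_column):
    groups[value].append(position)

# Counting via the stored groups works, but keeps data that is never used.
counts_from_groups = {value: len(items) for value, items in groups.items()}

# Counter reaches the same counts without storing anything per line.
counts = Counter(last_column)

# A factory that is not callable is rejected when the defaultdict is built.
try:
    defaultdict('not callable')
    factory_error = None
except TypeError as exc:
    factory_error = exc
```

Both dictionaries end up with the same counts; the Counter version simply skips the intermediate lists.
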

Also, reading and processing the entire file ahead of time is a bit ugly, since you can iterate over the file object directly to get its lines, which does part of the processing for you. You can even let the Counter do that iteration itself by passing it a generator expression when you create it.

Here is a complete solution:

from collections import Counter
with open('input', 'r') as data:
    histogram = Counter(line.split('\t')[-1].strip() for line in data)
with open('output', 'w') as result:
    for item in histogram.iteritems():
        result.write('%s\t%s\n' % item)
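
The code above targets Python 2 (`iteritems`) and tab-separated input. As a hedged variation, the sketch below assumes Python 3 and whitespace-separated columns, and orders the output with `Counter.most_common()`. The function name `write_histogram` and the temporary files are illustrative only.

```python
import os
import tempfile
from collections import Counter


def write_histogram(in_path, out_path):
    """Write 'value<TAB>count' lines, most frequent value first."""
    with open(in_path) as data:
        # split() with no argument handles both tabs and spaces;
        # blank lines are skipped so that [-1] never fails.
        histogram = Counter(line.split()[-1] for line in data if line.strip())
    with open(out_path, 'w') as result:
        for value, count in histogram.most_common():
            result.write('%s\t%d\n' % (value, count))
    return histogram


# Demo in a throwaway directory, using rows from the question.
with tempfile.TemporaryDirectory() as workdir:
    in_path = os.path.join(workdir, 'input')
    out_path = os.path.join(workdir, 'output')
    with open(in_path, 'w') as handle:
        handle.write('PAIR 1MFK 1 URANIUM 82 HELIUM 112 2.5506\n'
                     'PAIR 2JGH 2 PLUTONIUM 98 POTASSIUM 88 5.3003\n'
                     '\n'
                     'PAIR 4IG5 4 LITHIUM 82 ARGON 99 2.5506\n')
    write_histogram(in_path, out_path)
    with open(out_path) as handle:
        report = handle.read()
```
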
Karl Knechtel