I want to compute a word count over about 7,500 files, with a condition on which words get counted. The program looks like this:

import glob
import mincemeat

text_files = glob.glob('../fldr/2/*')
def file_contents(file_name):
    f = open(file_name)
    try:
        return f.read()
    finally:
        f.close()

source = dict((file_name, file_contents(file_name))
          for file_name in text_files)

def mapfn(key, value):
  for line in value.splitlines():
    list2 = [ ]
    for temp in line.split("::::"):
        list2.append(temp)
    if (list2[0] == '5'):
        for review in list2[1].split():
            yield [review.lower(),1]

def reducefn(key, value):
  return key, len(value)

s = mincemeat.Server()
s.datasource = source
s.mapfn = mapfn
s.reducefn = reducefn

results = s.run_server(password="wola")
print results

When I run this program I get the following error:

error: uncaptured python exception, closing channel <__main__.Client connected at 0x250f990> 
(<type 'exceptions.IndexError'>:list index out of range 
 [C:\Python27\lib\asyncore.py|read|83] 
 [C:\Python27\lib\asyncore.py|handle_read_event|444] 
 [C:\Python27\lib\asynchat.py|handle_read|140] 
 [mincemeat.py|found_terminator|96]
 [mincemeat.py|process_command|194] 
 [mincemeat.py|call_mapfn|170] 
 [projminc2.py|mapfn|21])
senshin
amian

1 Answer

Take a look at what's in list2 just before the failing line, e.g. by doing

print(list2)

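or by stepping through with a debugger. If you do, you'll see that for the failing line list2 has only one element, so list2[1] raises the IndexError. (list2[0] is always safe, because str.split returns at least one element even when the separator is absent.)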

(You don't really want to split on "::::"; that's a typo in your script.)
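Independently of which separator is correct, it may help to make mapfn tolerate lines that don't contain it. The following is only a sketch: the `delimiter` value is a placeholder that I'm assuming you will replace with whatever your files really use, and it is defined inside the function because mincemeat ships the function body to the clients, so module-level names may not be available there.

```python
def mapfn(key, value):
    # Assumption: placeholder separator; replace it with the one your data uses.
    delimiter = "::::"
    for line in value.splitlines():
        fields = line.split(delimiter)
        # A line without the separator yields a one-element list,
        # so skip it instead of indexing fields[1].
        if len(fields) < 2:
            continue
        if fields[0] == '5':
            for word in fields[1].split():
                yield word.lower(), 1
```

For example, `list(mapfn("f", "5::::Great Movie\n5\n3::::bad"))` yields only the two words from the first line; the bare `5` line is skipped rather than raising.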

Anna