I am new to Python and am trying to run the following in a loop:

text = open('filename.txt', 'rU').read()  
splitter = Splitter()
postagger = POSTagger()
splitted_sentences = splitter.split(text)
pos_tagged_sentences = postagger.pos_tag(splitted_sentences)
dicttagger = DictionaryTagger([ 'dicts/positive.yml', 'dicts/negative.yml'])
dict_tagged_sentences = dicttagger.tag(pos_tagged_sentences)
scoreposneg = sentiment_score(dict_tagged_sentences)
dicttagger = DictionaryTagger([ 'dicts/positive.yml', 'dicts/negative.yml', 'dicts/inc.yml', 'dicts/dec.yml', 'dicts/inv.yml'])
dict_tagged_sentences = dicttagger.tag(pos_tagged_sentences)
scoretotal = sentiment_total(dict_tagged_sentences)
print scoretotal

Prior to this, I set up Splitter, POSTagger, DictionaryTagger and sentiment_total, and also created 5 different dictionaries. This works when I run it in Python on 1 file (I got a number from print scoretotal). However, when I tried to create a loop and print all the outputs (I have 650 files in my directory), it didn't work: nothing was printed, and the scoretext.txt file contained only 0.0 values.

path = '/mydirectory'
files = glob.glob(path)
for file in files:
    text = open(file, 'rU').read()
    splitter = Splitter()
    postagger = POSTagger()
    splitted_sentences = splitter.split(text)
    pos_tagged_sentences = postagger.pos_tag(splitted_sentences)
    dicttagger = DictionaryTagger([ 'dicts/positive.yml', 'dicts/negative.yml'])
    dict_tagged_sentences = dicttagger.tag(pos_tagged_sentences)
    scoreposneg = sentiment_score(dict_tagged_sentences)
    dicttagger = DictionaryTagger([ 'dicts/positive.yml', 'dicts/negative.yml', 'dicts/inc.yml', 'dicts/dec.yml', 'dicts/inv.yml'])
    dict_tagged_sentences = dicttagger.tag(pos_tagged_sentences)
    scoretotal = sentiment_total(dict_tagged_sentences)
    print scoretotal

scoretotal = np.zeros((1,650))
scoretotal_no = 0
scoretotal_no = scoretotal_no + 1

np.savetxt("scoretext.txt", scoretotal, delimiter=" ", fmt="%s")

I would really appreciate it if someone could provide some insight on this. Thank you!

  • I'm not sure I understand what this is supposed to do (`scoretotal` is reset outside the loop?) but what do you get from `print(files)` before `for file in files:`? Also, `file` is a builtin in Python 2.x so I would choose another name. – roganjosh Feb 01 '17 at 11:33
  • Hi Roganjosh, what I got was `[]`. I've changed `file` to `filename`. I thought that because I'm saving the 650 scores into an array, it has to be outside the loop – Tiffany Tang Feb 01 '17 at 11:40
  • Ok, so firstly the loop itself doesn't work because `path = '/mydirectory'` is giving you an empty list in `files = glob.glob(path)` (there is nothing to iterate through). That isolates one issue. But `scoretotal = np.zeros((1,650))` completely destroys any data you might have accumulated during the loops for `scoretotal`. In fact, each loop will just destroy data from the previous iteration (I assume, but I don't know what `sentiment_total()` does). – roganjosh Feb 01 '17 at 11:43
  • I've defined `sentiment_total` as `def sentiment_total(review): return sum([sentence_score(sentence, None, 0.0) for sentence in review])` to return a score for the text. Would you recommend looping with `i` and `range` instead? – Tiffany Tang Feb 01 '17 at 12:17
  • It's not really possible to know what you should be doing, but your loop cannot work because `files = glob.glob(path)` is returning an empty list. Your first problem is _before_ your `for` loop anyway, so that needs fixing before you get to the loop. – roganjosh Feb 01 '17 at 12:58
  • @roganjosh thanks, I understand better now – Tiffany Tang Feb 01 '17 at 14:16
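
The two issues raised in these comments (an empty list from `glob.glob` and the scores being overwritten) can be illustrated with a minimal sketch. It is written for Python 3, although the `glob` calls behave the same way in Python 2. `score_text` is a hypothetical stand-in for the Splitter/POSTagger/DictionaryTagger/`sentiment_total` pipeline from the question, and the `*.txt` pattern and the temporary demo directory are assumptions made only for illustration.

```python
import glob
import os
import tempfile


def score_text(text):
    # Hypothetical stand-in for the question's tagging and scoring pipeline;
    # here it simply counts words so the example can run on its own.
    return float(len(text.split()))


def score_directory(directory, pattern="*.txt"):
    # glob matches patterns, so a wildcard is needed to list the files
    # inside the directory rather than testing the directory path itself.
    filenames = sorted(glob.glob(os.path.join(directory, pattern)))
    scores = []  # created once, before the loop, so nothing is overwritten
    for filename in filenames:  # `filename` avoids shadowing the Python 2 builtin `file`
        with open(filename) as handle:
            scores.append(score_text(handle.read()))
    return filenames, scores


# Small demonstration on a temporary directory holding two files.
demo_dir = tempfile.mkdtemp()
for name, body in [("a.txt", "good film"), ("b.txt", "bad bad film")]:
    with open(os.path.join(demo_dir, name), "w") as handle:
        handle.write(body)

demo_files, demo_scores = score_directory(demo_dir)
print(demo_files)
print(demo_scores)
```

Collecting the scores in a list that is created before the loop keeps one value per file. If an array is needed afterwards, the list could be passed to `np.savetxt` once the loop has finished.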

1 Answer

Your problem may be in how you access the files in the directory. See the question "Use a Glob() to find files recursively in Python?".

Nick H
  • Hi Nick, I've also tried the following, but it didn't work either: `directory = os.path.join("/textanalysisfiles","path")`, then `for root,dirs,files in os.walk(directory):` with `for file in files:` inside it – Tiffany Tang Feb 01 '17 at 11:37
  • Agree with @roganjosh, `file` is a builtin name in Python 2, so use something else, even: `for files in file_list` – Nick H Feb 01 '17 at 11:52
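
The `os.walk` attempt mentioned in the comments could be completed roughly as follows. This is only a sketch: the helper name `find_text_files`, the `*.txt` pattern and the temporary demo tree are assumptions for illustration, not part of the original code. One detail worth noting is that `os.walk` yields bare file names, so each one has to be joined with `root` before it can be opened.

```python
import fnmatch
import os
import tempfile


def find_text_files(directory, pattern="*.txt"):
    # Walk the directory tree and collect the full path of every file
    # whose name matches the pattern.
    matches = []
    for root, dirs, names in os.walk(directory):
        for name in fnmatch.filter(names, pattern):
            # os.walk yields bare names; join with root to get a usable path.
            matches.append(os.path.join(root, name))
    return sorted(matches)


# Small demonstration tree: one file at the top level, one in a
# subdirectory, and one file that should not match the pattern.
demo_root = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_root, "sub"))
for relative in ["top.txt", os.path.join("sub", "nested.txt"), "notes.md"]:
    with open(os.path.join(demo_root, relative), "w") as handle:
        handle.write("sample")

found = find_text_files(demo_root)
print([os.path.relpath(path, demo_root) for path in found])
```

If the directory passed in does not exist, `os.walk` yields nothing and the function returns an empty list, which would look the same as the symptom described above, so printing the result before looping is a quick check.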