I have a set of sentences in a text file, and the verbs from those sentences are the column headers of a CSV file. For each sentence, I need to put a '1' in the cell under the matching column if that verb is present in the sentence. For example:

If my sentence is: I like this movie.

My csv file has the headers: like, hate and loathe.

Then I need my csv file to look like

  like       hate       loathe
   1

Thanks in advance.

Here's the code I have tried:

with open('verb.csv', 'wb') as csvn:
    cwriter = csv.writer(csvn)
    cwriter.writerow([d for d in verbs])

where verbs is my list of verbs. This writes the verbs as column headers in the CSV file.

for l, label in file:
    t = nltk.word_tokenize(l)
    tt = nltk.pos_tag(t)
    for pos in tt:
        for p in pos[1]:
            c = 0
            if(p == 'V'):
                w = pos[0]
                for l in verbs:
                    if w == l:
                        print(c)
                        continue
                    else:
                        c+=1

Now w contains the verb and I can search for a matching word in the list of verbs and obtain its location, but I don't have a clue how I could mark the corresponding location in the csv file as 1. My python version is 2.7.

GobSmack
  • You should try something first and then ask; your Python version is also needed for this question. For a good starting point, please refer to the Python [docs](http://docs.python.org/2/library/csv.html) and to this [post](http://stackoverflow.com/a/14693848/1982962), which has a good example. – Kobi K Oct 20 '13 at 08:09
  • I'm sorry, I'll edit my post. – GobSmack Oct 20 '13 at 08:23

2 Answers

I would recommend incremental steps as you work on the code. Get certain parts working, then build in the rest. For example, from what we can see here, there should be a ValueError on the first line of your loop, where each item of file is unpacked into l and label, unless file has already been parsed by csv or something similar.

You should also generate all of the output that you are going to put into the results file before you actually write anything to it. I believe building up a dictionary of results would work; then, at the end, write it all out in the format you want. You can't really go back and write characters arbitrarily into certain columns of a file. You could append, but it is probably better to build up the output and write it at the end.
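
A minimal sketch of the "build everything first, write once at the end" idea. It uses a plain list of rows rather than a dictionary, and the names build_rows, write_table and verbs_per_sentence are hypothetical, introduced only for illustration. It assumes the verbs of each sentence have already been extracted.

```python
import csv

def build_rows(verbs_per_sentence, verbs):
    """Return one row per sentence: 1 under each verb found, '' elsewhere."""
    rows = []
    for found in verbs_per_sentence:
        found = set(found)
        rows.append([1 if verb in found else "" for verb in verbs])
    return rows

def write_table(fileobj, verbs, rows):
    """Write the header and then every row to an already-open file object."""
    writer = csv.writer(fileobj)
    writer.writerow(verbs)
    writer.writerows(rows)

verbs = ["like", "hate", "loathe"]
# Hypothetical input: the verbs already extracted from each sentence.
verbs_per_sentence = [["like"], ["hate", "loathe"], []]
rows = build_rows(verbs_per_sentence, verbs)

# On Python 2.7 open the file with 'wb'; on Python 3 use 'w' and newline=''.
# with open('verb.csv', 'wb') as csvn:
#     write_table(csvn, verbs, rows)
```

Because nothing is written until every row exists, there is never a need to go back and change a cell in the file.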

It is not clear whether you want one line for each sentence, a total, or something else. Should the cells be blank if a verb is absent, or should there be a zero in each column?

Is your goal to become familiar with nltk, or to just get the desired output?

It seems that it would be more efficient to just test whether a word is in the list (if w in verbs will be much more efficient than nested loops).
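
As a rough illustration of replacing the inner loop with a lookup, the sketch below maps each verb to its column position once, so that finding the column for a word is a single dictionary lookup. The names column_of and column_for are hypothetical.

```python
verbs = ["like", "hate", "loathe"]

# Map each verb to its column index once, up front.
column_of = {verb: i for i, verb in enumerate(verbs)}

def column_for(word):
    """Return the column index for word, or None if it is not a known verb."""
    return column_of.get(word)

print(column_for("hate"))
print(column_for("movie"))
```

If only a yes/no answer is needed, w in verbs is enough; the dictionary is useful when the column position is needed as well.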

You are also changing the value of l within the loop. Use a different name.

When you write out the header, you don't need to break out the list and assemble it again with a list comprehension. cwriter.writerow(verbs) should be fine if verbs is already a list.

There are too many problems here to fix in one answer, so I would reiterate: baby steps. Get things working one feature at a time before you try to write out the whole chunk of code. Use a lot of print statements to see what values are being loaded.

Good luck!

beroe
  • I need a new row for each sentence and a 0 or blank space would do. I need to get this output in this format so that I can use it for further processing. I'm relatively new to python actually. But I'll definitely follow your advice and take baby steps and thank you. – GobSmack Oct 20 '13 at 09:21

I'd create an empty list right after you create your counter, with the same length as your list of verbs.

c = 0
emptylist = [""] * len(verbs)

Then, when you run through your verb list, use the counter c as the position in the empty list (by the way, are you sure it isn't print(l) that you want in your code?). I'd change the following part:

if w == l:
    print(c)
    emptylist[c] = 1
    # then write emptylist to the csv with cwriter.writerow(emptylist)
    continue
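
Put together, the idea might look like the sketch below, which fills one fresh row per sentence. The function name mark_row is hypothetical, and the (word, tag) pairs are a hand-written stand-in for what nltk.pos_tag would return, so that the example runs without nltk. It also checks the tag with startswith('V') instead of looping over the characters of the tag.

```python
def mark_row(tagged, verbs):
    """Return a row with 1 under every verb of one tagged sentence."""
    row = [""] * len(verbs)  # fresh, empty row for this sentence
    for word, tag in tagged:
        # Verb tags are assumed to start with 'V' (e.g. VB, VBP, VBD).
        if tag.startswith("V") and word in verbs:
            row[verbs.index(word)] = 1
    return row

verbs = ["like", "hate", "loathe"]
# Hand-written stand-in for nltk.pos_tag(nltk.word_tokenize("I like this movie."))
tagged = [("I", "PRP"), ("like", "VBP"), ("this", "DT"), ("movie", "NN"), (".", ".")]

row = mark_row(tagged, verbs)
print(row)
```

Each finished row can then be passed to cwriter.writerow(row), one call per sentence.
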
TTNor