Tokenizing and POS tagging in Python from CSV file

Question

I am new to Python and would like to do POS tagging after importing a CSV file from my local machine. I looked up some resources online and found that the following code works.

from nltk import tokenize
from nltk.tag import pos_tag
from nltk.tokenize import TreebankWordTokenizer

# Adjacent string literals inside parentheses are joined into one string.
text = ('Senator Elizabeth Warren from Massachusetts announced her support of '
        'Social Security in Washington, D.C. on Tuesday. Warren joined other '
        'Democrats in support.')

# Split the text into sentences.
sentences = tokenize.sent_tokenize(text)

# Split each sentence into word tokens.
tokenizer = TreebankWordTokenizer()
texttokens = []
for sent in sentences:
    texttokens.append(tokenizer.tokenize(sent))

# Tag each tokenized sentence with part-of-speech labels.
taggedsentences = []
for sentencetokens in texttokens:
    taggedsentences.append(pos_tag(sentencetokens))

print(taggedsentences)

The printed result from the code above looks like this:

[[('Senator', 'NNP'), ('Elizabeth', 'NNP'), ('Warren', 'NNP'), ('from', 
'IN'), ('Massachusetts', 'NNP'), ('announced', 'VBD'), ('her', 'PRP$'), 
('support', 'NN'), ('of', 'IN'), ('Social', 'NNP'), ('Security', 'NNP'), 
('in', 'IN'), ('Washington', 'NNP'), (',', ','), ('D.C.', 'NNP'), ('on', 
'IN'), ('Tuesday', 'NNP'), ('.', '.')], [('Warren', 'NNP'), ('joined', 
'VBD'), ('other', 'JJ'), ('Democrats', 'NNPS'), ('in', 'IN'), ('support', 
'NN'), ('.', '.')]]

This is the result I want, but I would like to get it after importing a CSV file that contains several rows, each holding several sentences. For example, the CSV file looks like this:

---------------------------------------------------------------
I like this product. This product is beautiful. I love it. 
---------------------------------------------------------------
This product is awesome. It have many convenient features.
---------------------------------------------------------------
I went this restaurant three days ago. The food is too bad.
---------------------------------------------------------------

In the end, I would like to save the POS tagging results shown above after importing the CSV file, writing each POS-tagged sentence from each row out in CSV format.

Two formats might be possible. The first might be as follows (no header, one POS-tagged sentence per row):

----------------------------------------------------------------------------
[[('I', 'PRON'), ('like', 'VBD'), ('this', 'PRON'), ('product', 'NN')]]
----------------------------------------------------------------------------
[[('This', 'PRON'), ('product', 'NN'), ('is', 'VERB'), ('beautiful', 'ADJ')]]
---------------------------------------------------------------------------
[[('I', 'PRON'), ('love', 'VERB'), ('it', 'PRON')]]
----------------------------------------------------------------------------
...

The second format might look like this (no header, each token and POS tag pair saved in one cell):

----------------------------------------------------------------------------
('I', 'PRON')    | ('like', 'VBD')   | ('this', 'PRON') | ('product', 'NN')
----------------------------------------------------------------------------
('This', 'PRON') | ('product', 'NN') | ('is', 'VERB')   | ('beautiful', 'ADJ')
---------------------------------------------------------------------------
('I', 'PRON')    | ('love', 'VERB')  | ('it', 'PRON')   |
----------------------------------------------------------------------------
...

I prefer the second format to the first one.

The Python code above works perfectly, but I would like to do the same thing for a CSV file and save the result on my local machine.

The final purpose is to extract only the noun-type words (e.g., NN, NNP) from the sentences.

Can somebody help me fix the Python code?

I'm curious: how do you plan on using the resulting CSV? I only ask because trying to reload this into Python with the parentheses might become a headache. – Tony, Sep 01 '17 at 19:58

score -1 · Answer 1 · answered Sep 01 '17 at 21:07

Please refer to the question already answered here (SO link). You can tag the text and then filter out just the nouns, as described in that post.
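As a minimal sketch of that filtering step (not the code from the linked post): assuming the tagged sentences have the list-of-(word, tag) shape printed in the question, and that the tagger emits Penn Treebank tags, every noun tag (NN, NNS, NNP, NNPS) starts with "NN". The helper name `extract_nouns` is made up for this illustration.

```python
def extract_nouns(tagged_sentences):
    """Keep only the words whose tag starts with 'NN' (NN, NNS, NNP, NNPS).

    tagged_sentences is a list of sentences, each a list of (word, tag) pairs,
    i.e. the same shape that pos_tag produces in the question.
    """
    return [
        [word for word, tag in sentence if tag.startswith("NN")]
        for sentence in tagged_sentences
    ]


# Second sentence of the output shown in the question.
sample = [[('Warren', 'NNP'), ('joined', 'VBD'), ('other', 'JJ'),
           ('Democrats', 'NNPS'), ('in', 'IN'), ('support', 'NN'), ('.', '.')]]
print(extract_nouns(sample))  # [['Warren', 'Democrats', 'support']]
```

Swapping the prefix for "JJ" would pick out adjectives the same way.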

Sincole Brans


If this question is a duplicate of your link, then vote to close it as a duplicate. Link-only answers are discouraged. – Mark Tolonen Sep 02 '17 at 05:37
@Sincole Brans, I need nouns now but will need adjectives in the future too, so I would like to extract all word and tag pairs, not just nouns. The question was more about how to import a CSV file and save the results in CSV form instead of using 'print', given that the code above is working. I am really new to Python, so sorry for the basic question and the confusion. In addition, I don't need parentheses in the results. I just need the word and tag pairs in a distinguishable format, meaning that both I and a program (R or Python) can tell which tag belongs to which word. – Emily Sep 03 '17 at 12:25
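One possible way to do what this comment describes, as a rough sketch under several assumptions: the text sits in the first column of the input CSV, the NLTK tokenizer and tagger data are already downloaded, and each output cell holds one `word/TAG` pair with no parentheses. The function names, the file names, and the `word/TAG` cell format are choices made for this illustration, not something from the thread.

```python
import csv


def nltk_tag_sentences(text):
    """Split text into sentences and POS-tag each one, as in the question.

    NLTK is imported lazily so the CSV logic below can be tried with any
    other tagger. Assumes the required NLTK data is already installed.
    """
    from nltk.tag import pos_tag
    from nltk.tokenize import TreebankWordTokenizer, sent_tokenize

    tokenizer = TreebankWordTokenizer()
    return [pos_tag(tokenizer.tokenize(sent)) for sent in sent_tokenize(text)]


def tag_csv(in_path, out_path, tag_sentences=nltk_tag_sentences, column=0):
    """Read text from one column of in_path and write tagged rows to out_path.

    Each output row is one sentence; each cell is a 'word/TAG' string.
    """
    with open(in_path, newline="", encoding="utf-8") as src, \
            open(out_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            if not row:
                continue  # skip blank lines in the input file
            for tagged in tag_sentences(row[column]):
                writer.writerow([f"{word}/{tag}" for word, tag in tagged])


# Hypothetical usage; both file names are placeholders:
# tag_csv("reviews.csv", "reviews_tagged.csv")
```

To read the result back, splitting each cell with `cell.rsplit("/", 1)` recovers the word and the tag, even when the word itself contains a slash.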
