I am new to Python and would like to do POS tagging on text imported from a CSV file on my local machine. I looked at some online resources and found that the following code works.

import nltk
from nltk import tokenize
from nltk.tag import pos_tag
from nltk.tokenize import TreebankWordTokenizer

# Adjacent string literals inside parentheses are joined into one string.
text = ('Senator Elizabeth Warren from Massachusetts announced her support of '
        'Social Security in Washington, D.C. on Tuesday. Warren joined other '
        'Democrats in support.')

# Split the text into sentences.
sentences = tokenize.sent_tokenize(text)

# Split each sentence into word tokens.
tokenizer = TreebankWordTokenizer()
texttokens = []
for sent in sentences:
    texttokens.append(tokenizer.tokenize(sent))

# Tag each tokenized sentence with part-of-speech labels.
taggedsentences = []
for sentencetokens in texttokens:
    taggedsentences.append(pos_tag(sentencetokens))

print(taggedsentences)

The printed result looks like this:

[[('Senator', 'NNP'), ('Elizabeth', 'NNP'), ('Warren', 'NNP'), ('from', 
'IN'), ('Massachusetts', 'NNP'), ('announced', 'VBD'), ('her', 'PRP$'), 
('support', 'NN'), ('of', 'IN'), ('Social', 'NNP'), ('Security', 'NNP'), 
('in', 'IN'), ('Washington', 'NNP'), (',', ','), ('D.C.', 'NNP'), ('on', 
'IN'), ('Tuesday', 'NNP'), ('.', '.')], [('Warren', 'NNP'), ('joined', 
'VBD'), ('other', 'JJ'), ('Democrats', 'NNPS'), ('in', 'IN'), ('support', 
'NN'), ('.', '.')]]

This is the result I want, but I would like to get it from a CSV file that contains several rows, each holding several sentences. For example, the CSV file looks like this:

---------------------------------------------------------------
I like this product. This product is beautiful. I love it. 
---------------------------------------------------------------
This product is awesome. It have many convenient features.
---------------------------------------------------------------
I went this restaurant three days ago. The food is too bad.
---------------------------------------------------------------

In the end, I would like to produce the POS tagging results shown above for the imported CSV file and write each POS-tagged sentence from each row out in CSV format.

Two formats might be possible. The first might be as follows (no header, one POS-tagged sentence per row).

----------------------------------------------------------------------------
[[('I', 'PRON'), ('like', 'VBD'), ('this', 'PRON'), ('product', 'NN')]]
----------------------------------------------------------------------------
[[('This', 'PRON'), ('product', 'NN'), ('is', 'VERB'), ('beautiful', 'ADJ')]]
---------------------------------------------------------------------------
[[('I', 'PRON'), ('love', 'VERB'), ('it', 'PRON')]]
----------------------------------------------------------------------------
...

The second format might look like this (no header, each token/tag pair saved in its own cell):

----------------------------------------------------------------------------
('I', 'PRON')    | ('like', 'VBD')   | ('this', 'PRON') | ('product', 'NN')
----------------------------------------------------------------------------
('This', 'PRON') | ('product', 'NN') | ('is', 'VERB')   | ('beautiful', 'ADJ')
---------------------------------------------------------------------------
('I', 'PRON')    | ('love', 'VERB')  | ('it', 'PRON')   |
----------------------------------------------------------------------------
...

I prefer the second format to the first one.

The Python code I wrote here works on a hard-coded string, but I would like to do the same thing for a CSV file and save the result on my local machine.
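
One way to apply the same pipeline to a CSV file is sketched below. It assumes that NLTK and its tokenizer and tagger data are installed, that the input file has no header, and that the text sits in the first column. The helper names `tag_text` and `tag_csv` and the `tagger` and `text_column` parameters are made up for illustration. Each output row holds one sentence, with one token/tag pair per cell (the second format above).

```python
import csv


def tag_text(text):
    """Return a list of tagged sentences, each a list of (token, tag) pairs."""
    # Imported here so the CSV helper below can be used without loading NLTK.
    from nltk import pos_tag
    from nltk.tokenize import TreebankWordTokenizer, sent_tokenize

    tokenizer = TreebankWordTokenizer()
    return [pos_tag(tokenizer.tokenize(sent)) for sent in sent_tokenize(text)]


def tag_csv(in_path, out_path, tagger=tag_text, text_column=0):
    """Read texts from in_path and write one tagged sentence per output row."""
    with open(in_path, newline='', encoding='utf-8') as src, \
            open(out_path, 'w', newline='', encoding='utf-8') as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            if not row:
                continue  # skip blank lines in the input file
            for tagged_sentence in tagger(row[text_column]):
                # str(pair) produces cells such as ('I', 'PRP').
                writer.writerow([str(pair) for pair in tagged_sentence])
```

A call such as `tag_csv('reviews.csv', 'tagged.csv')`, with hypothetical file names, would then write the tagged sentences to disk. Because sentences differ in length, the output rows have different numbers of cells, which `csv.writer` allows.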

My final goal is to extract only the noun-type words (e.g., NN, NNP) from the sentences.

Can somebody help me adapt the Python code?

Emily
    I'm curious how you plan to use the resulting CSV. I only ask because reloading this into Python with the parentheses might become a headache. – Tony Sep 01 '17 at 19:58

1 Answer

Please refer to the question already answered here. You can filter the tagged tokens to keep just the nouns, as described in that post. SO Link
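
For the noun-filtering step mentioned in this answer, a minimal sketch might look like the following. It relies on the fact that the Penn Treebank noun tags produced by `pos_tag` (NN, NNS, NNP, NNPS) all begin with `NN`. The helper name `extract_by_prefix` and its `prefixes` parameter are hypothetical, and this is not necessarily the approach taken in the linked post.

```python
def extract_by_prefix(tagged_sentences, prefixes=('NN',)):
    """Keep only the tokens whose tag starts with one of the given prefixes."""
    return [
        [token for token, tag in sentence if tag.startswith(tuple(prefixes))]
        for sentence in tagged_sentences
    ]


# Sample taken from the tagged output shown in the question.
tagged = [[('Warren', 'NNP'), ('joined', 'VBD'), ('other', 'JJ'),
           ('Democrats', 'NNPS'), ('in', 'IN'), ('support', 'NN'), ('.', '.')]]

nouns = extract_by_prefix(tagged)
print(nouns)  # [['Warren', 'Democrats', 'support']]
```

Passing `prefixes=('JJ',)` would select adjectives instead, so the same helper covers other word classes without changing the tagging step.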

Sincole Brans
    If this question is a duplicate of your link, then vote to close it as a duplicate. Link-only answers are discouraged. – Mark Tolonen Sep 02 '17 at 05:37
  • @Sincole Brans, I need nouns now but will need adjectives in the future too, so I would like to extract all word/tag pairs, not just nouns. The question is really about how to import a CSV file and save the results as CSV instead of printing them, given that the code above works. I am new to Python, so sorry for the basic question and for the confusion. Also, I don't need the parentheses in the results. I just need the word/tag pairs in a format in which both I and a program (R or Python) can tell which tag belongs to which word. – Emily Sep 03 '17 at 12:25
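
A parenthesis-free layout that both R and Python can read back easily is a "long" table with one token per row. The sketch below assumes that layout. The column names `row`, `sentence`, `token`, and `tag` and the helper name `write_long_format` are illustrative choices, not anything from the posts above. It expects data that is already tagged: one entry per input row, each a list of sentences, each a list of (token, tag) pairs as returned by `pos_tag`.

```python
import csv


def write_long_format(tagged_rows, out_path):
    """Write one (row, sentence, token, tag) record per line of the CSV."""
    with open(out_path, 'w', newline='', encoding='utf-8') as dst:
        writer = csv.writer(dst)
        writer.writerow(['row', 'sentence', 'token', 'tag'])
        # Number input rows and sentences from 1 so they are easy to trace back.
        for row_id, sentences in enumerate(tagged_rows, start=1):
            for sent_id, sentence in enumerate(sentences, start=1):
                for token, tag in sentence:
                    writer.writerow([row_id, sent_id, token, tag])
```

A file in this shape can be loaded with `read.csv` in R or `csv.DictReader` in Python, and selecting nouns or adjectives becomes a simple comparison on the `tag` column.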