I want to extract the names of products being sold from English text.

For example:

"I'm selling my xbox brand new"

"Selling rarely used 27 inch TV"

Should give me "xbox" and "27 inch TV"

The only thing I can think of at the moment is to hardcode in a giant list of important nouns and important adjectives: ['tv', 'fridge', 'xbox', 'laptop', etc]

Is there a better approach?

Razor Storm
  • [NLP](http://en.wikipedia.org/wiki/Natural_language_processing) isn't easy. – NullUserException Jan 24 '13 at 20:23
  • Seriously lanzz? Is the point of this site not to ask questions even when you have no clue where to start? Are algorithmic questions against the rules? – Razor Storm Jan 24 '13 at 21:00
  • Unfortunately, "no clue where to start" is often also, pretty much by definition, too broad and a recommendation question. [Quick googling](https://google.com/search?q=text+extract+product-names) should reveal that this is still a research problem. – tripleee Jul 02 '18 at 08:49
  • Possible duplicate of [Text mining - extract name of band from unstructured text](https://stackoverflow.com/questions/6670498/text-mining-extract-name-of-band-from-unstructured-text) – tripleee Jul 02 '18 at 08:51

1 Answer

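It looks like nltk's pos_tag will give you a list of words paired with their part-of-speech tags. Since you are mainly interested in nouns, you can then keep only the tokens whose tags are NN, NNS, NNP, or NNPS. Tagging a sentence looks like this: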

>>> from nltk import pos_tag, word_tokenize
>>> pos_tag(word_tokenize("John's big idea isn't all that bad.")) 
[('John', 'NNP'), ("'s", 'POS'), ('big', 'JJ'), ('idea', 'NN'), ('is',
'VBZ'), ("n't", 'RB'), ('all', 'DT'), ('that', 'DT'), ('bad', 'JJ'),
('.', '.')]
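
To get from tagged tokens to something like "27 inch TV", one possible next step is to group runs of adjectives or numbers followed by nouns into phrases. The sketch below is only an illustration, not part of nltk: `extract_noun_phrases` is a hypothetical helper, and the tags in the two example lists are hand-written assumptions about what a tagger might return for the sentences in the question, in the same `(word, tag)` format that `pos_tag` produces.

```python
# Penn Treebank tags: noun tags start with "NN"; "JJ" is an adjective,
# "CD" a cardinal number. These are the tags pos_tag emits for English.
NOUN_PREFIX = "NN"
MODIFIER_TAGS = {"JJ", "CD"}


def extract_noun_phrases(tagged):
    """Hypothetical helper: collect runs of optional modifiers followed by
    one or more nouns from a list of (word, tag) pairs."""
    phrases = []
    current = []       # words of the candidate phrase being built
    has_noun = False   # True once the candidate contains a noun
    for word, tag in tagged:
        if tag.startswith(NOUN_PREFIX):
            current.append(word)
            has_noun = True
        elif tag in MODIFIER_TAGS and not has_noun:
            # Leading modifier, e.g. "27" in "27 inch TV".
            current.append(word)
        else:
            # Anything else ends the candidate; keep it only if it has a noun.
            if has_noun:
                phrases.append(" ".join(current))
            current, has_noun = [], False
            # A modifier right after a noun may start a new candidate.
            if tag in MODIFIER_TAGS:
                current.append(word)
    if has_noun:
        phrases.append(" ".join(current))
    return phrases


# Hand-written tags (an assumption, not real tagger output).
tv_tagged = [("Selling", "VBG"), ("rarely", "RB"), ("used", "VBN"),
             ("27", "CD"), ("inch", "NN"), ("TV", "NN")]
xbox_tagged = [("I", "PRP"), ("'m", "VBP"), ("selling", "VBG"),
               ("my", "PRP$"), ("xbox", "NN"), ("brand", "JJ"),
               ("new", "JJ")]

print(extract_noun_phrases(tv_tagged))    # ['27 inch TV']
print(extract_noun_phrases(xbox_tagged))  # ['xbox']
```

A real tagger may well label "brand" as a noun, in which case this would return "xbox brand", so some filtering against a list of known product words is probably still needed on top of this.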
dm03514