
I'm a newbie Python programmer here, I suppose.

I want to extract only the relevant keywords from any list/array I'm given in Python.

Example: extract only the food-related words from a list of strings. A user could input, for instance:

[0] I want to buy some apple.
[1] Oranges are good for the health.
[2] I bought 2 blueberries yesterday.
[3] John is eating some grapes.
[4] My crush did not like me back.

Expected output would be:

[0] apple
[1] oranges
[2] blueberries
[3] grapes
[4] None

I would appreciate it if anyone could point out how I could achieve something like this. I'm still new to programming and recently found out how to extract words, but ONLY if I have a list of categorized foods for reference. Then I can just compare and extract. Yaayyy!! :D

But assuming the user can input any items he/she wants, how would I approach a solution for this? Would it be practical to copy all the foods in the whole world and store them in a list for reference? I've been looking for a solution, but maybe I missed it for some reason. If this is a duplicate, it would be great if anyone could point me to a link or topic!
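For the case where a reference list already exists, the compare-and-extract step could look roughly like this. It is a minimal sketch that assumes a hypothetical hand-made dict `CATEGORIES` mapping a category name to a set of known words (the word lists here are illustrative only), and a hypothetical helper `extract_category` that returns the first matching word per sentence, or `None`, to mirror the expected output above.

```python
import re

# Hypothetical reference data: category name -> set of known lowercase words.
# A real program would load far larger lists than these.
CATEGORIES = {
    'food': {'apple', 'oranges', 'blueberries', 'grapes'},
    'drink': {'coffee', 'tea', 'juice'},
}


def extract_category(sentences, category):
    """Return, for each sentence, the first word found in the category, or None."""
    known = CATEGORIES[category]
    results = []
    for sentence in sentences:
        # Lowercase and keep only runs of letters, so punctuation and digits drop out.
        words = re.findall(r'[a-z]+', sentence.lower())
        results.append(next((w for w in words if w in known), None))
    return results


sentences = [
    'I want to buy some apple.',
    'Oranges are good for the health.',
    'I bought 2 blueberries yesterday.',
    'John is eating some grapes.',
    'My crush did not like me back.',
]

for i, word in enumerate(extract_category(sentences, 'food')):
    print(f'[{i}] {word}')
```

This only ever finds words that are already in the reference set, which is exactly the limitation the question is asking about.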

Btw, a shoutout and thank you to all the people on Stack Overflow who have helped me a lot! :)

deedzM
  • You really need to import a word list of "foods" by yourself. This is unavoidable. – iBug Jan 30 '18 at 12:01
  • The program can't just *know* what names are used for food. As @iBug put it, there needs to be a reference list somewhere. I don't know what you mean by the list being categorized, but just a list, or perhaps a dict, will do the trick. – Felix Jan 30 '18 at 12:49
  • But to get closer to an actual solution: if you don't want to import a list of EVERY food ever, a long-term solution might be to import a basic list of common foods and add an option for the user to add his/her own foods when needed. And maybe just use the bodies (stems) of words, e.g. 'blueberr' for blueberry, blueberries etc. – Felix Jan 30 '18 at 12:53
  • Hi there! Thank you for all your ideas, @Felix and @iBug. I came back searching for more ideas about this. I may have found a clue to solving this problem, but I guess it's not yet at my programming level. I saw Natural Language Processing and Word2Vec stuff on YouTube and in Google results... but yeah, I guess they are still too complex for me to learn HAHAHAH – deedzM Jan 30 '18 at 16:36
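Felix's suggestion (a basic starter list that the user can extend, matched on word bodies) could be sketched like this. The stems in `food_stems` and the helpers `add_food` and `find_food` are hypothetical names chosen for illustration. Prefix matching is a crude stand-in for real stemming: it also produces false positives, e.g. 'apply' matches 'appl' and 'grapevine' matches 'grape'.

```python
import re

# Hypothetical starter list of word bodies (stems), as suggested in the comment:
# 'blueberr' matches both 'blueberry' and 'blueberries'.
food_stems = {'appl', 'orange', 'blueberr', 'grape'}


def add_food(stem):
    """Let the user extend the reference list at runtime."""
    food_stems.add(stem.lower())


def find_food(sentence):
    """Return the first word that starts with a known stem, or None."""
    for word in re.findall(r'[a-z]+', sentence.lower()):
        if any(word.startswith(stem) for stem in food_stems):
            return word
    return None


print(find_food('I bought 2 blueberries yesterday.'))  # blueberries
print(find_food('I had a mango for lunch.'))           # None
add_food('mango')
print(find_food('I had a mango for lunch.'))           # mango
```

Keeping the list user-extendable avoids copying "all the foods in the whole world" up front; the program simply learns new entries as users supply them.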

2 Answers


This is one way. It handles the general algorithm, but not the problem of singular/plural forms of specific foods.

import string

lst = ['I want to buy some apple.',
       'Oranges are good for the health.',
       'I bought 2 blueberries yesterday.',
       'John is eating some grapes.',
       'My crush did not like me back.']

foods = {'apple', 'oranges', 'blueberries', 'grapes'}

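# Translation table that deletes every punctuation character
translation = str.maketrans('', '', string.punctuation)

# For each sentence: strip punctuation, lowercase, split on whitespace,
# then keep only the words that also appear in the foods set
lst2 = [set(i.translate(translation).lower().split()) & foods for i in lst]

print(lst2)
# [{'apple'}, {'oranges'}, {'blueberries'}, {'grapes'}, set()]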
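One rough way to cover the singular/plural gap is to store only singular forms and generate candidate singulars for each word in the sentence. This is a sketch under a loud assumption: the hypothetical helper `singular_candidates` applies only the simple English -s/-es/-ies rules, so irregular plurals (e.g. 'mice', 'geese') are missed. The hypothetical `first_food` returns the first matching word or `None`, like the question's expected output.

```python
import string

foods = {'apple', 'orange', 'blueberry', 'grape'}  # singular forms only

translation = str.maketrans('', '', string.punctuation)


def singular_candidates(word):
    """Very rough singularisation (an assumption, not a real stemmer)."""
    candidates = {word}
    if word.endswith('ies'):
        candidates.add(word[:-3] + 'y')  # blueberries -> blueberry
    if word.endswith('es'):
        candidates.add(word[:-2])        # tomatoes -> tomato
    if word.endswith('s'):
        candidates.add(word[:-1])        # grapes -> grape
    return candidates


def first_food(sentence):
    """Return the first word whose singular form is a known food, else None."""
    for word in sentence.translate(translation).lower().split():
        if singular_candidates(word) & foods:
            return word
    return None


lst = ['I want to buy some apple.',
       'Oranges are good for the health.',
       'I bought 2 blueberries yesterday.',
       'John is eating some grapes.',
       'My crush did not like me back.']

print([first_food(s) for s in lst])
# ['apple', 'oranges', 'blueberries', 'grapes', None]
```

Generating several candidates per word (rather than picking one rule) keeps the lookup a cheap set intersection while tolerating cases like 'oranges', where the -es rule gives 'orang' but the -s rule gives 'orange'.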
jpp
  • Hi there @jp_data_analysis! Yep, this worked for the first version of what I was trying to do, but I was trying something where the user's inputs would be dynamic. Thanks again for your answer :) – deedzM Jan 30 '18 at 16:47
  • @deedzM, no problem - feel free to accept or up-vote if it helped. – jpp Jan 30 '18 at 16:50
  • Yes, I tried voting for both yours and @Nestor Yanchuk's answers, but it says "Vote casts with less than 15 reputation are recorded, but do not display it publicly" – deedzM Jan 30 '18 at 16:54
  • That's fine. Glad my answer helped, though. – jpp Jan 30 '18 at 16:56

Basically, there is no magic tool for this. You need to build the list of words yourself. I advise you to check out the nltk library; it'll help you split text into words and sentences correctly. Then you should check each word separately (like if my_word in my_food_list: blablabla...).
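Building the word list yourself usually means keeping it in a plain text file, one word per line. Below is a minimal sketch of loading such a list and then checking each word separately. The file name 'foods.txt', the '#' comment convention, and the helpers `load_word_list`, `tokenize` and `foods_in` are hypothetical. The regex tokenizer is a simple stand-in; if nltk is installed, `nltk.word_tokenize` could replace it, though it may need its tokenizer data fetched first via `nltk.download`.

```python
import io
import re


def load_word_list(lines):
    """Build a set of lowercase words from an iterable of lines, e.g. an open file.

    Blank lines and lines starting with '#' are skipped.
    """
    words = set()
    for line in lines:
        line = line.strip().lower()
        if line and not line.startswith('#'):
            words.add(line)
    return words


def tokenize(sentence):
    # Simple stand-in for nltk.word_tokenize: lowercase runs of letters only.
    return re.findall(r'[a-z]+', sentence.lower())


def foods_in(sentence, my_food_list):
    # Check each word separately against the reference set.
    return [word for word in tokenize(sentence) if word in my_food_list]


# In practice: with open('foods.txt') as f: my_food_list = load_word_list(f)
# Here an in-memory file stands in for the hypothetical 'foods.txt'.
sample_file = io.StringIO('# common foods\napple\noranges\nGrapes\n\n')
my_food_list = load_word_list(sample_file)
print(foods_in('John is eating some grapes and an apple.', my_food_list))
# ['grapes', 'apple']
```

Using a set for `my_food_list` keeps each `word in my_food_list` check fast even when the list grows to thousands of entries.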

You can also check out this similar question.

Nestor Yanchuk
  • Ha! Thanks for this @Nestor Yanchuk... yep, I came across this as I continued searching! Sadly, I'm not yet confident enough to work through this, or maybe I found it a bit complex for my level of programming, but thanks again! Btw, I tried to vote this up but my reputation is still low – deedzM Jan 30 '18 at 16:45