I apologize if this question has already been answered; I can't find it here. My problem is as follows: I have a pre-curated list of words, and I also have some unstructured text like this.

2 1/2 cups all-purpose flour
1 cup rolled oats
1/2 teaspoon ground ginger
1 cup chopped walnuts
1 teaspoon vanilla extract
1 teaspoon pumpkin pie spice
1 teaspoon celery oil

My list contains the following matching keywords: [flour,oat,ginger,walnut,celery,celery_oil,....]. I want to convert each line of the unstructured text into the matching keyword from my list, as in the following.

flour
oat
ginger
walnut
vanilla 
pumpkin
celery_oil

Can anyone suggest a method to convert those items using Python? I currently have some experience with pandas. Thank you very much!
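
One possible way to frame the conversion, as a minimal sketch rather than a definitive method: tokenize each line, strip a trailing "s" when the singular form is a known keyword token, and return the first keyword whose tokens appear contiguously, trying multi-word keywords such as celery_oil before single words. The helper names (`build_patterns`, `tokenize`, `match_keyword`) are hypothetical, and "vanilla" and "pumpkin" are assumed to be in the elided part of the keyword list because the expected output contains them.

```python
import re

# Keywords from the question. "vanilla" and "pumpkin" are ASSUMED entries:
# the posted list is elided, but the expected output contains them.
KEYWORDS = ["flour", "oat", "ginger", "walnut", "celery", "celery_oil",
            "vanilla", "pumpkin"]


def build_patterns(keywords):
    """Return (keyword, token tuple) pairs, keywords with more tokens first."""
    pairs = [(kw, tuple(kw.split("_"))) for kw in keywords]
    return sorted(pairs, key=lambda pair: len(pair[1]), reverse=True)


def tokenize(line, vocab):
    """Lowercase, keep alphabetic runs, and naively singularize known tokens."""
    tokens = re.findall(r"[a-z]+", line.lower())
    # Strip a trailing "s" only when the result is a known keyword token,
    # so "oats" becomes "oat" but "cups" is left alone.
    return [t[:-1] if t.endswith("s") and t[:-1] in vocab else t
            for t in tokens]


def match_keyword(line, patterns, vocab):
    """Return the first keyword whose tokens occur contiguously in the line."""
    tokens = tokenize(line, vocab)
    for keyword, parts in patterns:
        n = len(parts)
        for i in range(len(tokens) - n + 1):
            if tuple(tokens[i:i + n]) == parts:
                return keyword
    return None  # no keyword found in this line


PATTERNS = build_patterns(KEYWORDS)
VOCAB = {part for _, parts in PATTERNS for part in parts}

LINES = [
    "2 1/2 cups all-purpose flour",
    "1 cup rolled oats",
    "1/2 teaspoon ground ginger",
    "1 cup chopped walnuts",
    "1 teaspoon vanilla extract",
    "1 teaspoon pumpkin pie spice",
    "1 teaspoon celery oil",
]

matched = [match_keyword(line, PATTERNS, VOCAB) for line in LINES]
for line, keyword in zip(LINES, matched):
    print(f"{line!r} -> {keyword}")
```

The plural handling here is deliberately crude; irregular plurals or misspellings would need a stemmer or fuzzy matching instead.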

alvas
Isura Nirmal
  • Try WordNet similarity: http://www.nltk.org/howto/wordnet.html. But this is a bad question otherwise. It's almost a homework problem. – Software Mechanic Aug 26 '15 at 14:08
  • I don't understand... what's the expected output? – taesu Aug 26 '15 at 14:21
  • If you are on Linux, consider using grep. – Ali SAID OMAR Aug 26 '15 at 14:30
  • Can you please be more specific? Maybe provide a sample input and expected output – Alfie Aug 26 '15 at 14:57
  • @AnandJeyahar Actually, this is more than a homework question. I just want this data set cleaned. Thank you for the reference. @Alfie, the sample input is given above and the expected output is right below it. – Isura Nirmal Aug 26 '15 at 15:56
  • @IsuraNirmal Given that you post the list somewhere, maybe we can write something simple to clean it up for you. – alvas Aug 26 '15 at 16:53
  • take a look at http://stackoverflow.com/questions/27234280/how-to-parse-sentences-based-on-lexical-content-phrases-with-python-nltk – alvas Aug 26 '15 at 16:54
  • I was testing this with pandas in a very simple way (no NLTK). Is there any way to turn this problem into a pandas lambda function and write the result to a new frame? – Isura Nirmal Aug 27 '15 at 13:19
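
For the pandas-only route raised in the last comment, a vectorized regex may do the job without a lambda. The following is a rough sketch, not a tested solution for the full data set: it builds one alternation from the keyword list (longest alternative first, so "celery oil" wins over "celery"), allows an optional plural "s", and writes the match to a new column. The column names `raw` and `keyword` are made up for illustration, and "vanilla" and "pumpkin" are assumed to be in the elided part of the keyword list.

```python
import re

import pandas as pd

# Keywords from the question. "vanilla" and "pumpkin" are ASSUMED entries:
# the posted list is elided, but the expected output contains them.
keywords = ["flour", "oat", "ginger", "walnut", "celery", "celery_oil",
            "vanilla", "pumpkin"]

# Underscores stand for spaces in the raw text. Longest alternatives go
# first so that "celery oil" is tried before plain "celery".
alternatives = sorted((kw.replace("_", " ") for kw in keywords),
                      key=len, reverse=True)
pattern = r"\b(" + "|".join(re.escape(a) for a in alternatives) + r")s?\b"

# Hypothetical column name "raw" holding the unstructured lines.
df = pd.DataFrame({"raw": [
    "2 1/2 cups all-purpose flour",
    "1 cup rolled oats",
    "1/2 teaspoon ground ginger",
    "1 cup chopped walnuts",
    "1 teaspoon vanilla extract",
    "1 teaspoon pumpkin pie spice",
    "1 teaspoon celery oil",
]})

# Extract the first matching keyword per row, then restore the underscore
# form used in the keyword list. Rows without a match become NaN.
df["keyword"] = (
    df["raw"].str.lower()
    .str.extract(pattern, expand=False)
    .str.replace(" ", "_", regex=False)
)

print(df)
```

If a lambda is preferred, the same pattern could be applied per row with `df["raw"].apply(...)` and `re.search`, at the cost of losing the vectorized string methods.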

0 Answers