I know I can do this in Python, but I wondered whether there is a way to do it in PHP.

I have split a paragraph into sentences, but some of the results are not really sentences and I would like to reject them. I guess this requires some kind of sentence recognition. I know 'Punkt' can do this in NLTK, but I really need to be able to do the equivalent in PHP. For example:

'I like to run tap water.' which should be accepted

'Ewing J. R Nat Gen 133;324;pp123-456.' which should be rejected

Thanks

Sebastian Zeki
  • possible duplicate of [How to Split a Paragraph into Sentences](http://stackoverflow.com/questions/2158296/how-to-split-a-paragraph-into-sentences) – Beka Jun 27 '14 at 09:21
  • Are you sure Punkt gives you this information? This is something I've been looking for for a while :) I think you should use a language model (http://en.wikipedia.org/wiki/Language_model) with a fixed or variable threshold. That way you can calculate the probability of "I like to run tap water" and of "I blabla lalala" and see that the second is unlikely to appear in English text. I'd guess you'll get two separable clusters of probabilities if you run this on your examples. – Yasen Jun 27 '14 at 16:39
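
The language-model idea from the comment above can be sketched in plain PHP. This is only a minimal illustration under loud assumptions: it uses a unigram (single-word) model with add-one smoothing rather than a proper n-gram model, the training text `$toyCorpus` is a tiny made-up stand-in for a real English corpus, the function names `tokenize`, `trainUnigramModel` and `averageLogProb` are hypothetical, and the threshold of `-3.0` was picked by hand for this toy data. With a real corpus the threshold would need to be tuned on labelled examples.

```php
<?php
// Toy training text (an assumption): a real model would be trained on a large corpus.
$toyCorpus = 'I like to drink water. I like to run in the park. '
           . 'We like to run tap water in the kitchen. The water is cold.';

// Lowercase the text and split it on anything that is not a letter, digit or apostrophe.
function tokenize(string $text): array
{
    $tokens = preg_split("/[^a-z0-9']+/", strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    return $tokens === false ? [] : $tokens;
}

// Count how often each word occurs in the training text.
function trainUnigramModel(string $corpus): array
{
    $counts = array_count_values(tokenize($corpus));
    return [
        'counts' => $counts,
        'total'  => array_sum($counts),
        'vocab'  => count($counts),
    ];
}

// Average log-probability per token, with add-one smoothing so that
// unseen words get a small but non-zero probability.
function averageLogProb(string $sentence, array $model): float
{
    $tokens = tokenize($sentence);
    if ($tokens === []) {
        return -INF; // nothing to score
    }
    $denominator = $model['total'] + $model['vocab'] + 1; // +1 for the "unseen word" bucket
    $sum = 0.0;
    foreach ($tokens as $token) {
        $count = $model['counts'][$token] ?? 0;
        $sum += log(($count + 1) / $denominator);
    }
    return $sum / count($tokens);
}

$model = trainUnigramModel($toyCorpus);
$threshold = -3.0; // assumed cut-off, chosen by hand for this toy corpus

$candidates = [
    'I like to run tap water.',
    'Ewing J. R Nat Gen 133;324;pp123-456.',
];
foreach ($candidates as $s) {
    $verdict = averageLogProb($s, $model) >= $threshold ? 'accept' : 'reject';
    echo $verdict, ": $s\n";
}
```

On this toy data the first candidate scores above the threshold and the citation fragment scores below it, because every token in the citation is unseen. A unigram model ignores word order, so a bigram or trigram model trained on real text would likely separate the two clusters more reliably.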
