I am fairly new to spaCy / textacy and I have a complicated task ahead. Any help is much appreciated.
In a nutshell, from a sentence like "Did assault paramedic by kicking and pushing him", I want to establish whether the reported abuse was against a police officer or other worker (ambulance, hospital staff, traffic warden, etc).
The challenges are:
- The officers do not write in standard English, and the sentences contain many punctuation and other errors.
- The subject is often omitted from the reports, so 'textacy.extract.subject_verb_object_triples', for example, does not work because it cannot find a subject. (A subject is not needed here anyway: we already know that the individual has been charged with the abuse, and we only want to know which category of worker they assaulted from the text provided.)
- The text can consist of several sentences that give other context to the crime, or it might list abuse charges against multiple types of workers in one text.
Examples:
1. "Did shout, swear and threaten her neighbours, assault A Police Officer."
2. "Did get ejected from a liecenced premises thereafter act aggressively towards his wife and push her.Did act in an aggressive threatening manner towards door staff and other persons.Did resist arrest.Did assault Police by biting and kicking."
3. "Accused did punch PC Smith then in the execution of his duty by throwing a punch towards his face to his non injury."
4. "Did throw a mobile phone at witness constable Smith"
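Example 2 shows the run-together statements ("her.Did act"). As a rough illustration of how such a report could be split into separate charges before any parsing, here is a minimal sketch that assumes each new statement starts with a capital letter directly after sentence-ending punctuation. The function name `split_charges` and the regular expression are hypothetical and untested on the real data, and the rule would wrongly split dotted abbreviations such as "P.C. Smith".

```python
import re

# Split after sentence-ending punctuation when the next non-space character is
# uppercase, even if the space after the period is missing ("her.Did").
# Assumption: every new statement starts with a capital letter.
_CHARGE_BOUNDARY = re.compile(r"(?<=[.!?])\s*(?=[A-Z])")


def split_charges(text):
    """Return the individual statements in a report, stripped of whitespace."""
    return [part.strip() for part in _CHARGE_BOUNDARY.split(text) if part.strip()]


report = (
    "Did get ejected from a liecenced premises thereafter act aggressively "
    "towards his wife and push her.Did act in an aggressive threatening manner "
    "towards door staff and other persons.Did resist arrest."
    "Did assault Police by biting and kicking."
)

for charge in split_charges(report):
    print(charge)
```

Each charge could then be examined on its own, so that "door staff" and "Police" in the same report end up as separate findings.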
What I am expecting to get is something like a (VERB, OBJECT) pair, e.g. (punch, PC Smith), which would then need to be classified as: yes, this is a police officer. The object could be a compound that includes a title such as PC (Police Constable), Sgt (Sergeant), etc.
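To make the target concrete, here is a minimal sketch of the second step only, assuming the (verb, object) pairs have already been extracted somehow. The keyword lists and the function name `categorise_object` are hypothetical placeholders, not a vetted vocabulary, and a keyword such as "officer" could of course match non-police roles too.

```python
import re

# Hypothetical keyword lists, for illustration only.
WORKER_KEYWORDS = {
    "police": {"police", "pc", "constable", "sgt", "sergeant", "officer"},
    "ambulance": {"paramedic", "ambulance"},
    "hospital": {"nurse", "doctor"},
    "traffic warden": {"warden"},
}


def categorise_object(object_phrase):
    """Map an object phrase to a worker category, or "other" if nothing matches."""
    # Lowercase and keep alphabetic tokens only, so "PC Smith" -> {"pc", "smith"}.
    tokens = set(re.findall(r"[a-z]+", object_phrase.lower()))
    for category, keywords in WORKER_KEYWORDS.items():
        if tokens & keywords:
            return category
    return "other"


pairs = [
    ("punch", "PC Smith"),
    ("assault", "paramedic"),
    ("threaten", "her neighbours"),
]

for verb, obj in pairs:
    print(verb, "|", obj, "|", categorise_object(obj))
```

The lookup itself is the easy part; getting reliable (verb, object) pairs out of the raw statements is where I am stuck.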
I tried this:
import spacy
import textacy
nlp = spacy.load('en')
text = nlp(u'Did assault paramedic by kicking and pushing him')
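# The extractor returns an iterable of triples, so materialise it to inspect the result
text_ext = list(textacy.extract.subject_verb_object_triples(text))
print(text_ext)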
But that only works after adding a subject (which I do not need) as well as 'the' in front of the object (paramedic), so the sentence becomes "Accused did assault the paramedic by kicking and pushing him". I have 55k statements to begin with, so correcting the language is not feasible.
How can I approach this problem? Thanks.