I'm working with a large corpus in RStudio, and the next phase of our research involves detecting certain grammatical elements and measuring their frequency in the texts. We want to count occurrences of things like abstract nouns or deontic modality, which includes the auxiliary verbs 'must', 'have to', 'may', 'can', 'should', 'ought to', etc. I would also like to capture their other forms, i.e., not only 'she has to' but also 'she had to'; not only 'he can' but also 'he could'. I guess it could be done with some simple regexes such as
She ha(ve|s|d) to
He c(an|ould)
...right? My questions are: 1) can this actually be done this way (I assume it can), and 2) which library should I use to do it?
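Regexes like these can be counted with base R alone. The sketch below assumes the corpus is a character vector with one string per text; `texts` and `count_matches()` are hypothetical names used only for illustration. It adds word boundaries (`\\b`) so that "can" is not counted inside "scanned", and makes the match case-insensitive so sentence-initial "She" is still found.

```r
# Hypothetical stand-in for the corpus: one string per text
texts <- c(
  "She had to leave early, but he can stay. She has to call him.",
  "He could not come. They scanned the report, and she may return."
)

# \\b = word boundary, so "can" inside "scanned" is not counted;
# (?i) switches on case-insensitive matching (PCRE syntax, hence perl = TRUE)
pattern <- "(?i)\\b(ha(ve|s|d) to|can|could)\\b"

# gregexpr() returns -1 for a text with no match, so count only positive positions
count_matches <- function(x, pattern) {
  vapply(gregexpr(pattern, x, perl = TRUE),
         function(m) sum(m > 0),
         integer(1))
}

count_matches(texts, pattern)  # one count per text
```

If you prefer the tidyverse, `stringr::str_count()` does the same counting in a single call; base R is used here only to avoid extra dependencies.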
I have also thought about building a dictionary and applying it to the whole corpus, but questions 1) and 2) still stand.
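A dictionary approach can be sketched as a named vector of regexes, one per category, applied to every text to produce a texts-by-category frequency matrix. The entries in `modal_dict` and the sample `texts` are hypothetical placeholders, and splitting on whitespace to get word counts is a rough assumption. Packages such as quanteda also provide a `dictionary()` object for this kind of lookup. This only suits closed word classes like modals; abstract nouns are an open class and would probably need part-of-speech tagging rather than a fixed regex list.

```r
# Hypothetical dictionary: one regex per modal category (extend as needed)
modal_dict <- c(
  have_to  = "\\bha(ve|s|d) to\\b",
  must     = "\\bmust\\b",
  can      = "\\bc(an|ould)\\b",
  may      = "\\bm(ay|ight)\\b",
  should   = "\\bshould\\b",
  ought_to = "\\bought to\\b"
)

# Hypothetical stand-in for the corpus: one string per text
texts <- c(
  "She had to leave early, but he can stay. She has to call him.",
  "He could not come. You must wait, and she may return."
)

# Count matches of one pattern in each text (gregexpr() gives -1 for no match)
count_matches <- function(x, pattern) {
  vapply(gregexpr(pattern, x, ignore.case = TRUE, perl = TRUE),
         function(m) sum(m > 0),
         integer(1))
}

# Rows = texts, columns = dictionary categories
freq <- vapply(modal_dict, function(p) count_matches(texts, p),
               integer(length(texts)))
freq

# Relative frequency per 1,000 words (whitespace tokenisation is an assumption)
n_words <- lengths(strsplit(texts, "\\s+"))
round(1000 * freq / n_words, 1)
```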