How to detect grammatical elements in a corpus

Question

I'm working with a large corpus in RStudio, and the next phase of our research involves detecting certain grammatical elements and their frequency in the texts. We want to measure how often things like abstract nouns or deontic modality occur; the latter includes the auxiliary verbs ‘must’, ‘have to’, ‘may’, ‘can’, ‘should’, ‘ought to’, etc. I would also like to capture their inflected forms, i.e., not only 'have to' but also 'had to'; not only 'can' but also 'could'. I guess it could be done using some simple RegEx such as

She ha(ve|d) to

He c(an|ould)

...right? The problem is that 1) I'm not sure whether this can be done (I guess it can), and 2) I don't know which library I should use to do it.

I have thought about building a dictionary and running it against the whole corpus, but problems 1) and 2) still apply.
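
To make the dictionary idea concrete, here is a minimal sketch in base R, so it does not depend on any particular text-mining package. The sample sentences, the pattern list `modal_patterns`, and the helper `count_matches` are all made up for illustration and are not part of the actual project; the inflected forms listed are only a starting point.

```r
# Hypothetical sample texts standing in for the real corpus.
texts <- c(
  "She had to leave, but he could stay.",
  "You must not go; we have to wait and they ought to know.",
  "He can swim and she should try."
)

# Illustrative dictionary: one regex per modal, covering some inflected forms.
# \\b marks a word boundary, so "can" is not matched inside "cannot" or "scan".
modal_patterns <- c(
  have_to  = "\\b(have|has|had|having) to\\b",
  can      = "\\b(can|could)\\b",
  must     = "\\bmust\\b",
  may      = "\\b(may|might)\\b",
  should   = "\\bshould\\b",
  ought_to = "\\bought to\\b"
)

# Count the non-overlapping matches of one pattern in each text.
# gregexpr() returns -1 when there is no match, hence the h > 0 test.
count_matches <- function(pattern, x) {
  hits <- gregexpr(pattern, x, ignore.case = TRUE, perl = TRUE)
  vapply(hits, function(h) sum(h > 0), integer(1))
}

# Rows are texts, columns are dictionary entries.
counts <- sapply(modal_patterns, count_matches, x = texts)
print(counts)

# Total frequency of each entry across the whole sample.
print(colSums(counts))
```

If the stringr package is already in use, `stringr::str_count()` could replace the helper. Note that a plain regex only sees surface strings: it cannot tell the modal 'may' from the month 'May', and it is unlikely to be enough for categories like abstract nouns, which are not a closed list of words.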

This question doesn't have a whole lot of code-related detail that would make it easier to help. I have a limited experience in text mining, but wouldn't this be covered by the stemming dictionaries in whatever text mining package you're using? — camille, Feb 11 '19 at 18:45
Yeah you're right. I could paste a lot of code I used for the whole project but this is the last phase of it so I don't have any code for this task I want to accomplish, just some ideas about how could I do it (like making a dictionary). But the problems are the same: 1) and 2) :P. I'm going to look at your suggestion. Thanks! — Descartes, Feb 11 '19 at 18:53
You can pull out / make up a small sample of example data with some sense of what you're doing to make this [reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) — camille, Feb 11 '19 at 19:12

0 Answers