I'm having trouble writing a search engine that treats all inflected forms of a word as the same basic word.
- For verbs, all of these count as the same root word, be:
  - number/person (e.g. am; is; are)
  - tense/mood, such as past or future tense (e.g. was; were; will be)
  - past participles (e.g. has been; had been)
  - present participles and gerunds (e.g. is being; wasn't being funny; being early is less important than being correct)
  - subjunctives (e.g. might be; critical that something be finished; I wish it were)
- For nouns, the singular and plural forms should both count as the same basic word. [ᴇᴅɪᴛᴏʀ's ɴᴏᴛᴇ: this shared base form is often called the citation form, or lemma, of the word.]
For example, with “enable”, I don’t want “enables” and “enabled” printed as separate entries. All three of those should count as the same basic word, the verb enable.
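One way to get from "enables" and "enabled" back to "enable" is to strip a few common suffixes and accept the first candidate that appears in a list of known base words. The Perl sketch below assumes such a word list exists; `%is_base` and `base_form` are hypothetical names, and the suffix rules are deliberately naive (they do not handle irregular forms like was/were, or spelling changes like stopped → stop).

```perl
use strict;
use warnings;

# Hypothetical list of known base (citation) forms; a real program
# would load these from a dictionary file.
my %is_base = map { $_ => 1 } qw(enable walk box be);

# Build candidate stems by stripping common suffixes, then return the
# first candidate that is a known base word (or the word itself).
sub base_form {
    my ($word) = @_;
    my $w = lc $word;
    my @candidates = ($w);
    push @candidates, $1        if $w =~ /^(.+)s$/;    # enables -> enable
    push @candidates, $1        if $w =~ /^(.+)es$/;   # boxes -> box
    push @candidates, $1        if $w =~ /^(.+)d$/;    # enabled -> enable
    push @candidates, $1        if $w =~ /^(.+)ed$/;   # walked -> walk
    push @candidates, $1, "$1e" if $w =~ /^(.+)ing$/;  # walking -> walk, enabling -> enable
    for my $c (@candidates) {
        return $c if $is_base{$c};
    }
    return $w;    # no known base form found; keep the word as-is
}

print join(' ', map { base_form($_) } qw(enable enables enabled walking boxes)), "\n";
# prints: enable enable enable walk box
```

Checking each candidate against the word list is what keeps the naive rules from over-stripping: "enables" also yields the non-word "enabl", but that candidate is never returned because it isn't in the list.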
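I can prevent duplicates from being printed using a hash, like this:

    # print each exact string only the first time it is seen
    unless ($seenmatches{ $headmatches[$l] }++) { print "$headmatches[$l]\n" }
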
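(As I understand it, this works because the post-increment returns the old value: undef, which is false, the first time a string is seen, and a true count every time after that.)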
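To see why the hash on its own isn't enough, here is a small, self-contained version of that loop. The sample contents of `@headmatches` are made up for illustration. Because the hash key is the exact string, only literal repeats are skipped.

```perl
use strict;
use warnings;

# Made-up sample data standing in for the real search matches.
my @headmatches = qw(enable enables enable enabled enables);
my %seenmatches;
my @printed;

for my $l (0 .. $#headmatches) {
    # $seenmatches{...}++ returns the old count: undef (false) on the
    # first occurrence of a string, so only that occurrence gets through.
    unless ($seenmatches{ $headmatches[$l] }++) {
        push @printed, $headmatches[$l];
        print "$headmatches[$l]\n";
    }
}
# prints enable, enables, enabled: exact repeats are dropped,
# but each inflected form is still a different string.
```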
However, this only removes exact duplicates: it doesn't stop the plural or past-tense forms from being printed as separate entries. Is there a way to do this, or a wholly different approach, perhaps one using a regex and/or a substitution that is undone later?
I can't simply modify the word with a substitution, because then it would no longer print correctly. Although I'm not at that stage yet, eventually I'd like to handle irregular past tenses [ᴇᴅɪᴛᴏʀ's ɴᴏᴛᴇ: and irregular nouns, too?] as well.
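A common way around the substitution problem is to leave the word itself alone and normalize only the hash key: look up the word's citation form, use that as the key in `%seenmatches`, and print the original string. In the sketch below, `%citation_form` and `key_for` are hypothetical names, and the hard-coded table stands in for a real lexicon. A table like this is also where irregular forms such as was/were or children would have to go, since simple suffix rules cannot derive them.

```perl
use strict;
use warnings;

# Hypothetical inflected-form -> citation-form table. Irregular forms
# must be listed explicitly; regular ones could come from suffix rules.
my %citation_form = (
    enables  => 'enable',
    enabled  => 'enable',
    am       => 'be',
    is       => 'be',
    are      => 'be',
    was      => 'be',
    were     => 'be',
    been     => 'be',
    being    => 'be',
    children => 'child',
);

# Return the citation form used as the dedup key; fall back to the
# lowercased word itself when it has no entry (needs Perl 5.10+ for //).
sub key_for {
    my ($word) = @_;
    my $lc = lc $word;
    return $citation_form{$lc} // $lc;
}

my @headmatches = qw(enabled was Enables child is children enable);
my %seenmatches;
my @printed;

for my $word (@headmatches) {
    # The hash is keyed on the citation form, but the original,
    # unmodified word is what gets printed.
    next if $seenmatches{ key_for($word) }++;
    push @printed, $word;
    print "$word\n";
}
# prints enabled, was, child
```

Whichever inflected form appears first is the one printed; to print the citation form instead, print `key_for($word)` rather than `$word`.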
I'm not sure what else you need to answer my question, so please let me know if I've left anything out and I'll fill in the missing details.