Regex for checking the same word in different cases

Question

I am looking to get some code written for a (hopefully) simple project. What I have is a plain text document. Let's say it contains two sentences:

Coding is fun. I enjoy coding.

What I want is a way to read the file and treat the words Coding and coding as the same. So, basically, read the words and report that there are two instances of the word coding regardless of the case used. Is this possible? All I know is regex from my Python days, but I am learning the MEAN stack, so anything JavaScript/Node.js would be great.

I'm not asking someone to write the code; I just need some guidance on what to look for, or on whether there are better ways to do this in JavaScript.

The return value in the example I give would ideally be 2. I just need it to count the instances.

No need for regex. Just read the whole file and convert it to the same case. Why the Python tag if you expect answers in JS? — DeepSpace, Aug 07 '16 at 19:04
You might find the answer [here](http://stackoverflow.com/questions/3939715/case-insensitive-regex-in-javascript)! — csabinho, Aug 07 '16 at 19:05

score 1 · Accepted Answer · edited May 23 '17 at 11:51

You can do this in plain JavaScript by counting the matches of a regular expression. Note the i and g flags at the end of the pattern: i stands for ignore-case, and g stands for global, which means the search does not stop at the first match but returns every match it finds.

If the sentence does not match the pattern, match returns null, and reading .length from null throws an error. The || [] substitutes an empty array whenever the preceding expression is null, so an unmatched sentence yields a count of 0 instead of an error.

EDIT: As mentioned in the comments, coding can be part of another word like decoding. To prevent a false match, you can also match the word boundaries (\b). I added those to the code.

var sentence = "Coding is fun. I enjoy coding.";
var count = (sentence.match(/\bcoding\b/ig) || []).length;
console.log(count);

Credit goes to: https://stackoverflow.com/a/4009768/3233827
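
The one-liner above can be wrapped in a small reusable function. This is only a sketch: countWord is a hypothetical helper name, not something from the linked answer, and it assumes the search word starts and ends with a word character, since \b only matches next to letters, digits or underscores.

```javascript
// Hypothetical helper (not part of the original answer): counts whole-word,
// case-insensitive occurrences of `word` in `text`.
function countWord(text, word) {
  // Escape regex metacharacters so characters such as "." are matched literally.
  const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // \b on both sides keeps "coding" from matching inside "decoding".
  const pattern = new RegExp('\\b' + escaped + '\\b', 'gi');
  // match() returns null when nothing matches, hence the fallback to [].
  return (text.match(pattern) || []).length;
}

console.log(countWord('Coding is fun. I enjoy coding.', 'coding')); // 2
console.log(countWord('Decoding is not coding.', 'coding'));        // 1
console.log(countWord('Nothing to see here.', 'coding'));           // 0
```

In Node.js the text argument could come from a file, for example via fs.readFileSync(path, 'utf8'), where path is whatever file you want to scan.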

score 0 · Answer 2 · answered Aug 07 '16 at 21:02

Here's a Python solution:

import re

string = """
Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. 
At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. 
Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. 
At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.
"""

words = {}

rx = re.compile(r'\b\w+\b')

for match in rx.finditer(string):
    # Lowercase each match so "Lorem" and "lorem" share one key.
    word = match.group(0).lower()
    # dict.get supplies 0 the first time a word is seen.
    words[word] = words.get(word, 0) + 1

print(words)

A "word" is defined as \b\w+\b, that is, one or more word characters surrounded by word boundaries. The script prints a dict that maps each lowercased word to its count; see a demo on ideone.com.
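
Since the question asks for JavaScript, the same idea can be sketched there roughly as follows. The function name wordFrequencies is made up for illustration. One difference to be aware of: in JavaScript \w only covers ASCII letters, digits and underscore, whereas Python 3's re module treats \w as Unicode-aware for str patterns by default, so results may differ on accented text.

```javascript
// Hypothetical JavaScript counterpart of the Python loop above.
function wordFrequencies(text) {
  const counts = new Map();
  // Lowercase first so "Coding" and "coding" share one key;
  // match() returns null for text without any words, hence the || [].
  const words = text.toLowerCase().match(/\b\w+\b/g) || [];
  for (const word of words) {
    counts.set(word, (counts.get(word) || 0) + 1);
  }
  return counts;
}

const freq = wordFrequencies('Coding is fun. I enjoy coding.');
console.log(freq.get('coding')); // 2
console.log(freq.get('fun'));    // 1
```

A Map is used here rather than a plain object so that words such as "constructor" do not collide with inherited property names.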
