4

What is the regular expression to find words that are repeated on the same line?

I've tried some expressions that I found on Stack Overflow, such as this, but none is working correctly.

The result I want to achieve:

[Image of the desired result, not reproduced here]

Peter Mortensen
PoseLab
  • 7
    Some example input and output would help. What is a "word"? What is not working with your previous attempts? Which programming language/tool/environment (i.e. which regex flavor) are you using? – Martin Ender May 02 '13 at 08:40
  • What is a "word"? Any word. What is not working with your previous attempts? You have an example in the link of my question. Which programming language/tool/environment (i.e. which regex flavor) are you using? Any text editor i.e. Sublimetext, notepad++,... – PoseLab May 02 '13 at 09:33
  • 1
    I don't see what's not working in the linked question. "Any word" isn't really helping. Only letters? Or the regex definition of word? (letters, digits, underscores). Is `don't` a word? Just because you linked a question that provided input/output examples doesn't make your own question more complete. It would really help if we had some of your actual example input. Also "words that are repeated on the same line" - do they have to be consecutive (as in the linked question)? Or do you want to find `foo` in `foo bar foo`? To me that's repeated on the same line. – Martin Ender May 02 '13 at 10:38
  • There are variations in regular expression (engines). The example is in [Perl](https://en.wikipedia.org/wiki/Perl). What is the target environment? Perl? [JavaScript](https://en.wikipedia.org/wiki/JavaScript)? Something else? – Peter Mortensen Feb 08 '22 at 13:10

4 Answers

20

This regex finds the words you want to highlight. (The example is in JavaScript and is easy to test in the browser's JavaScript console.)

s = "It's a foo and a bar and a bar and a foo too.";
a = s.match(/\b(\w+)\b(?=.*\b\1\b)/g);

This returns an array of the words that occur again later in the string. A word can appear in the array more than once.

Next you can do this:

re = new RegExp('\\b(' + a.join('|') + ')\\b', 'g');

And that should suffice to highlight all occurrences:

out = s.replace(re, function(m) { return '<b>' + m + '</b>' });
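As a minimal sketch, the three steps above can be combined into one function. The name `highlightRepeats` is hypothetical, and the null check and the `Set` deduplication are additions that are not part of the original snippet: `match` returns `null` when no word repeats, which would make `a.join` throw.

```javascript
// Hypothetical helper (not from the original answer): wraps every word that
// occurs more than once on the given line in <b>...</b>.
function highlightRepeats(line) {
  // Words that occur again later on the line; may contain duplicates.
  const found = line.match(/\b(\w+)\b(?=.*\b\1\b)/g);
  if (found === null) {
    return line; // nothing is repeated, so nothing to highlight
  }
  // Drop duplicates so the alternation stays short.
  const unique = [...new Set(found)];
  const re = new RegExp('\\b(' + unique.join('|') + ')\\b', 'g');
  return line.replace(re, '<b>$1</b>');
}
```

Because `.` does not match a newline, the lookahead only sees the rest of the current line, but the final replacement applies to the whole string. For multi-line text it is therefore safer to call the function once per line.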
Peter Mortensen
bart
1

If you want to find the same word repeated consecutively, for example,

Sam went went to to to his business

you can use this regex:

s = "Sam went went to to to his business";
a = s.match(/\b(\w+)(\s\1)+\b/g);
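A short sketch of what that match returns, and of how the same pattern could collapse each run to a single word. The variable names are illustrative only.

```javascript
const text = "Sam went went to to to his business";
const runPattern = /\b(\w+)(\s\1)+\b/g;

// Each match is a whole run of the same word.
const runs = text.match(runPattern); // ["went went", "to to to"]

// Replacing each run with its first word ($1) removes the duplicates.
const collapsed = text.replace(runPattern, "$1"); // "Sam went to his business"
```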
Peter Mortensen
  • What programming language? [JavaScript](https://en.wikipedia.org/wiki/JavaScript)? Or something else? Please respond by [editing (changing) your answer](https://stackoverflow.com/posts/61171583/edit), not here in comments (***without*** "Edit:", "Update:", or similar - the answer should appear as if it was written today). – Peter Mortensen Feb 08 '22 at 13:16
0

In the absence of a sample string, let's use a test case and a few examples of how you can achieve this.

String

My name is James and James is my name

Regex

(James)

Searched globally, this matches twice, so group 1 (group 0 is generally the full match string and will likely not have a capture count) is captured twice. This means that the word is repeated. Some logic is required in the tool you are using to run your regex in order to decide whether you are interested in the 'word'.

Using the same string, consider this regex

(?<=James.*)(James)

This will match the word James ONLY if it is preceded by 'James' followed by any number of characters. Depending on your engine, the '.' (period) should match any character that is not a newline by default. This confines the search to a single line. Note that an unbounded lookbehind like this one is not supported by every regex flavor.

Note the limitation of having to specify the word exactly. I am not sure how to get around this.
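A minimal sketch of this fixed-word lookbehind in JavaScript, assuming an engine that implements ES2018 or later. The second test string is an assumption added to show that the match does not cross a line break.

```javascript
const fixedWord = /(?<=James.*)James/g;

const sameLine = "My name is James and James is my name".match(fixedWord);
// sameLine holds one match: the second "James".

const twoLines = "James\nJames".match(fixedWord);
// twoLines is null: "." does not match the newline between the two words.
```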

EDIT: Try this, it's a doozy.

(?<=^|\s+\1\s+.*)\s+(\w+)

Using positive lookbehind (as in example 2) we detect 'whole words' that match our current group. A whole word is defined as:

  • Our current word
  • Preceded by at least 1 space character or at the start of a line
  • Followed by at least 1 space

Further, the match we are on must be a standalone word (preceded by at least one space character).

As far as results are concerned, each match will be a repeated word.
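One way to avoid naming the word, sketched here in JavaScript (ES2018 or later): capture the word first, then look behind for an earlier standalone copy of it on the same line. This variant is an assumption of this sketch and not the pattern above, because a backreference that appears before its group is handled differently from engine to engine.

```javascript
// Match a word only if the same word already appeared earlier on the line.
// The lookbehind comes after the capture, so \1 is already set when it runs.
const repeatedLater = /\b(\w+)\b(?<=\b\1\b.*\b\1)/g;

const sample = "My name is James and James is my name";
const laterCopies = sample.match(repeatedLater); // ["James", "is", "name"]
// Matching is case-sensitive, so "my" is not treated as a repeat of "My".
```

Each match here is the second or later occurrence of a word, which complements the lookahead approach that reports the earlier occurrences.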

Gusdor
-1

You can use this regex to find consecutive words, next to each other.

For example: "My name is Prince Prince, and I love cats." The regex below will find Prince Prince. It is the simplest version.

(\w+)(\s\1)+
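A small sketch of how this pattern behaves without word boundaries. The test sentence is an assumption chosen to expose the difference: because nothing anchors the first word, the `is` at the end of `This` pairs with the following `is`.

```javascript
const sentence = "This is Prince Prince";

// Without \b, the tail of one word can pair with the next word.
const loose = sentence.match(/(\w+)(\s\1)+/g); // ["is is", "Prince Prince"]

// With \b on both sides, only whole repeated words match.
const strict = sentence.match(/\b(\w+)(\s\1)+\b/g); // ["Prince Prince"]
```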

Peter Mortensen
Just Me
  • 2
    Please explain what this will do. It helps make your answer more valuable to the community. – JustCarty Jun 30 '21 at 21:09
  • I appreciate the edit; but this still doesn't really explain what the RegEx is doing. Not to mention it doesn't actually answer the question. This RegEx fails on the following sentence: `My name is Prince Prince, and I love Prince cats.` which should match `Prince` three times - the question asks for same line, not consecutive. – JustCarty Jul 02 '21 at 12:09