3

I have a string of words:

word dark king glow we end hello bye low wing

I need to find consecutive words where the last letter of one word matches the first letter of the following word (example: worD Dark).

I wrote a regular expression:

\b\w*(\w)\W\1\w*\b

Currently it successfully finds two words in a row (Regex.Matches[0].Value = "word dark"; Regex.Matches[1].Value = "king glow"; etc.).

I need a regular expression that matches each whole chain of such words (Regex.Matches[0].Value = "word dark king glow we end"; Regex.Matches[1].Value = "low wing").

How should I approach this?

Wiktor Stribiżew

3 Answers

3

For the record, here is a very expressive non-regex version. It doesn't require a picture ;)

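// Requires: using System; using System.Collections.Generic;
static IEnumerable<(string W1, string W2)> GetPairs1(string input)
{
    var words = input.Split(' ', StringSplitOptions.RemoveEmptyEntries);

    // Compare each word's first letter with the previous word's last letter.
    // With fewer than two words the loop body never runs, so nothing is yielded.
    for (int i = 1; i < words.Length; i++)
        if (words[i][0] == words[i - 1][words[i - 1].Length - 1])
            yield return (words[i - 1], words[i]);
}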

Test

public static void Main()
{
    var input = "word dark king glow we end hello bye low wing";

    foreach (var p in GetPairs1(input)) 
        Console.WriteLine($"{p.W1} {p.W2}");
}

Output

word dark
dark king
king glow
glow we
we end
low wing
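
The version above yields pairs, whereas the question asks for whole chains ("word dark king glow we end" and "low wing"). A minimal sketch of how the same loop could be adapted to emit chains is shown here. The names ChainFinder and GetChains are hypothetical, and the case-insensitive letter comparison is an assumption that is not part of the pair version above.

```csharp
using System;
using System.Collections.Generic;

public static class ChainFinder
{
    // Hypothetical helper: groups consecutive words into chains in which
    // each word starts with the last letter of the word before it.
    public static IEnumerable<string> GetChains(string input)
    {
        var words = input.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        var chain = new List<string>();

        foreach (var word in words)
        {
            if (chain.Count > 0)
            {
                var previous = chain[chain.Count - 1];

                // Assumption: letters are compared case-insensitively.
                bool linked = char.ToLowerInvariant(previous[previous.Length - 1])
                              == char.ToLowerInvariant(word[0]);

                if (!linked)
                {
                    // The chain is broken; report it only if it has at least two words.
                    if (chain.Count > 1) yield return string.Join(" ", chain);
                    chain.Clear();
                }
            }

            chain.Add(word);
        }

        // Report the chain that is still open at the end of the input.
        if (chain.Count > 1) yield return string.Join(" ", chain);
    }

    public static void Main()
    {
        foreach (var c in GetChains("word dark king glow we end hello bye low wing"))
            Console.WriteLine(c);
        // Prints:
        // word dark king glow we end
        // low wing
    }
}
```

Single words that link to nothing (hello, bye) are dropped because a chain is only reported once it holds two or more words.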
tymtam
2

I would also capture the last word character, then check inside a lookahead whether it matches the first character of the next word. Put all of that into a group so it can repeat, and once the condition no longer succeeds, match the final word of the chain.

(?i)(?:\b\w*(\w) +(?=\1))+\w+

See this demo at regex101

The caseless flag (?i) is used so that a captured a also matches an A at the start of the following word.
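
As a rough illustration, the pattern can be run from C# like this. The class and method names (ChainRegexDemo, FindChains) are made up for the example; only the pattern itself comes from this answer.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class ChainRegexDemo
{
    // Pattern copied from the answer: each repetition consumes one word plus the
    // following spaces, and only succeeds if the next word starts with the
    // captured last letter. The trailing \w+ then consumes the final word.
    private const string Pattern = @"(?i)(?:\b\w*(\w) +(?=\1))+\w+";

    // Hypothetical helper that returns every chain as a whole string.
    public static string[] FindChains(string input) =>
        Regex.Matches(input, Pattern)
             .Cast<Match>()
             .Select(m => m.Value)
             .ToArray();

    public static void Main()
    {
        foreach (var chain in FindChains("word dark king glow we end hello bye low wing"))
            Console.WriteLine(chain);
        // Prints:
        // word dark king glow we end
        // low wing
    }
}
```

Each Match.Value is a complete chain, so no post-processing of word pairs is needed.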

bobble bubble
  • Still can't wrap my head around how it works. But hey, on the other hand, at least it works ^^ – Gytis Dokšas Nov 16 '19 at 22:45
  • 2
    @GytisDokšas The lookahead is not consuming anything. Just think of the pattern without the `(?= +\1)` it would basically just be `(?:\b\w+ )+\w+`. But we capture the last letter in the first word and only continue if the condition succeeds that the first letter of the next word matches. – bobble bubble Nov 16 '19 at 22:48
  • Additional question: how could I implement punctuation marks between words (e.g. `word, day` or `word!day` (with/or without whitespaces)) – Gytis Dokšas Nov 16 '19 at 22:51
  • 2
    @GytisDokšas As you did with `\W+` instead of the space ([demo](https://regex101.com/r/kPmCCV/5)) or anything else you consider *non word characters*. – bobble bubble Nov 16 '19 at 22:56
  • Tried with \W+ myself, didn't work at first, retried after your suggestion and now it does work, dang, Regex is some voodoo magic ^^ – Gytis Dokšas Nov 16 '19 at 23:21
1

Good question, and there is already a good answer here.

With Positive Lookahead

I guess an expression such as

(?is)\w*(\w)(?= (\1)\w*)

might be somewhat closer. There may be edge cases, though, for which you'd probably want to adjust the positive lookahead in this block:

(?= (\1)\w*)

RegEx Demo 1


With Positive Lookbehind

You can also use a lookbehind and capture whichever parts you need, maybe with an expression similar to:

(?is)(?<=([a-z])\s)(\1)([a-z]*)

RegEx Demo 2

Test

using System;
using System.Text.RegularExpressions;

public class Example
{
    public static void Main()
    {
        string pattern = @"(?is)\w*(\w)(?= (\1)\w*)";
        string input = @"word dark king glow we end hello bye low wing
word Dark King Glow We End hello bye LoW wing";

        foreach (Match m in Regex.Matches(input, pattern))
        {
            Console.WriteLine("'{0}' found at index {1}.", m.Value, m.Index);
        }
    }
}

If you wish to simplify, modify, or explore the expression, it is explained in the top-right panel of regex101.com. If you'd like, you can also see in the linked demo how it matches against some sample inputs.




Complexity

Lookarounds in general tend to add matching cost, yet I can't think of a better way right now.

Emma
  • 1
    It's closer but still not quite what I'm looking for. Let's say I need to find the longest (as in word count) combination using this pattern in a bigger text. With some more functions it would be possible to make it with your expression but it would be much simpler if there was a way to get it as whole values (currently it creates many `Regex.Matches` and increases complexity) – Gytis Dokšas Nov 16 '19 at 22:34
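
The longest-combination search described in this comment could be sketched as follows, assuming the whole-chain pattern (?i)(?:\b\w*(\w) +(?=\1))+\w+ from the other answer in this thread is used so that each match is already a complete chain. The names LongestChainDemo and LongestChain are hypothetical, and returning the earliest chain on a tie is a choice made for the example.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LongestChainDemo
{
    // Whole-chain pattern taken from the other answer in this thread.
    private const string ChainPattern = @"(?i)(?:\b\w*(\w) +(?=\1))+\w+";

    // Hypothetical helper: returns the chain with the most words,
    // or null when the text contains no chain at all.
    public static string LongestChain(string text)
    {
        return Regex.Matches(text, ChainPattern)
            .Cast<Match>()
            .Select(m => m.Value)
            // OrderByDescending is a stable sort, so the earliest chain wins a tie.
            .OrderByDescending(v => v.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).Length)
            .FirstOrDefault();
    }

    public static void Main()
    {
        Console.WriteLine(LongestChain("word dark king glow we end hello bye low wing"));
        // Prints: word dark king glow we end
    }
}
```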