
I'm trying to write a regex for the following pattern:

[MyLiteralString][0 or more characters without restriction][at least 1 digit]

I thought this should do it:

(theColumnName)[\s\S]*[\d]+

It looks for the literal string theColumnName, followed by any number of characters (whitespace or otherwise), and then at least one digit. But this matches more than I want, as you can see here:

https://www.regex101.com/r/HBsst1/1

(EDIT) Second set of more complex data - https://www.regex101.com/r/h7PCv7/1

Using the sample data in that link, I want the regex to identify the two occurrences of theColumnName] VARCHAR(10) and nothing more.

I have 300+ SQL scripts containing CREATE statements for every type of database object: procedures, tables, triggers, indexes, functions -- everything. Because of that, I can't be too strict with my regex.

A stored procedure's file might include text like LEFT(theColumnName, 10) which I want to identify.

A create table statement would be like theColumnName VARCHAR(12).

So it needs to be very flexible as the number(s) isn't always the same. Sometimes it's 10, sometimes it's 12, sometimes it's 51 -- all kinds of different numbers.

Basically, I'm looking for the regex equivalent of this C# code:

//Get file data
string[] lines = File.ReadAllLines(filePath);

//Let's assume the first line contains 'theColumnName'
int theColumnNameIndex = lines[0].IndexOf("theColumnName");

if (theColumnNameIndex >= 0)
{
    //Get the text following 'theColumnName'
    string temp = lines[0].Remove(0, theColumnNameIndex + "theColumnName".Length);

    //Iterate over our substring
    foreach (char c in temp)
    {
        if (Char.IsDigit(c))
        {
            //do a thing
        }
    }
}
sab669

1 Answer

(theColumnName).*?[\d]+

That'll make it stop capturing after the first number it sees.

The difference between * and *? is about greediness vs. laziness. .*\d for example would match abcd12ad4 in abcd12ad4, whereas .*?\d would have its first match as abcd1. Check out this page for more info.
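
To make that concrete, here is a minimal C# sketch that runs both quantifiers against the example string above and then the suggested pattern against a SQL-like line. The class name `LazyVsGreedyDemo`, the helper `FirstMatch`, and the sample SQL line are made up for illustration; they are not from the question's data.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LazyVsGreedyDemo
{
    // Hypothetical helper: returns the text of the first match, or null if there is none.
    public static string FirstMatch(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Value : null;
    }

    public static void Main()
    {
        string sample = "abcd12ad4";
        // Greedy: .* grabs as much as it can, then backs off just enough for \d.
        Console.WriteLine(FirstMatch(sample, @".*\d"));   // abcd12ad4
        // Lazy: .*? grabs as little as it can, so it stops at the first digit.
        Console.WriteLine(FirstMatch(sample, @".*?\d"));  // abcd1

        // Assumed sample line, similar in shape to the CREATE TABLE text in the question.
        string sql = "[theColumnName] VARCHAR(10) NOT NULL, [other] INT -- 99";
        // The lazy version stops after the first run of digits.
        Console.WriteLine(FirstMatch(sql, @"(theColumnName).*?[\d]+"));
        // The greedy version runs on to the last digit on the line.
        Console.WriteLine(FirstMatch(sql, @"(theColumnName).*[\d]+"));
    }
}
```

Note that the lazy match ends at the last digit of the first number, so the closing parenthesis is not part of the match.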

Btw, if you don't want to match newlines, use a . (period) instead of [\s\S]
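
As a rough illustration of the newline point, the sketch below assumes .NET's default behaviour, where . matches any character except \n unless RegexOptions.Singleline is set. The class name `NewlineDemo`, the helper `Matches`, and the two-line input are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NewlineDemo
{
    // Hypothetical helper: true if the pattern matches anywhere in the input.
    public static bool Matches(string input, string pattern, RegexOptions options = RegexOptions.None)
    {
        return Regex.IsMatch(input, pattern, options);
    }

    public static void Main()
    {
        // Assumed input: the column name and its type sit on different lines.
        string text = "theColumnName\nVARCHAR(10)";

        // . does not cross the newline, so no digit is reachable from the column name.
        Console.WriteLine(Matches(text, @"(theColumnName).*?[\d]+"));       // False
        // [\s\S] matches any character, newlines included.
        Console.WriteLine(Matches(text, @"(theColumnName)[\s\S]*?[\d]+"));  // True
        // With Singleline, . also matches \n.
        Console.WriteLine(Matches(text, @"(theColumnName).*?[\d]+", RegexOptions.Singleline)); // True
    }
}
```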

azizj