I have a long input string that contains certain field names embedded in it. For instance:

SELECT some-name, some-name FROM [some-table] WHERE [some-column] = 'some-value'

The actual field name may change, but it is always in the form of word-word. I need to perform a regex replace on the string so that the output will look like this:

SELECT some - name, some - name FROM [some-table] WHERE [some-column] = 'some - value'

In other words, when the field name is enclosed in square brackets, it should be left untouched, but when it is not, spaces should be inserted on either side of the dash. There are no nested square brackets, and the string may contain one or more bracketed reserved words.

Tunaki
Ammar
  • That is too unclear. Please explain what makes `[some-name]` special, how it can be detected. Also, do you really have to use a regex? – Wiktor Stribiżew Aug 31 '16 at 10:37
  • is that substring at the end of the string? – Wiktor Stribiżew Aug 31 '16 at 10:43
  • The string is basically user input, and [some-name] is a reserved word added by the program. I have to add a space on both sides of - if it is not in square brackets. For example, 3-2 will become 3 - 2, but some-name will remain unchanged because, as I said, it is a reserved word and will be detected later. – Ammar Aug 31 '16 at 10:46
  • So, in all `[...]` substrings the hyphen should stay as is, and outside of `[...]`, you need to add spaces on both sides. Right? Do you have nested brackets to consider? I mean, `[some[text-[here]]and-here]`? – Wiktor Stribiżew Aug 31 '16 at 10:49
  • Well, another question - is the `[some-name]` known (fixed) string (since it is reserved)? Is it the only reserved word? Are there more? **Please edit the question accordingly**. – Wiktor Stribiżew Aug 31 '16 at 10:54
  • There is no nested bracket, and there could be one or more reserved words in the string, like `Select * From [some-table] where [some-column] = 44 - [some-value]` – Ammar Aug 31 '16 at 11:23
  • So, `SELECT 'some-name' FROM [some-name]` should become `SELECT 'some - name' FROM [some-name]`? – Steven Doggart Aug 31 '16 at 11:25
  • Yes exactly. But I am already using regex to add spaces like `Regex.Replace("\-", " - ", Regex.IgnoreCase)`. Now I am thinking to add condition on it to stop this if hyphen `-` is in square brackets `[]` – Ammar Aug 31 '16 at 11:29
  • I will post a solution shortly. You can match the `[...]` strings with `\[[^]]+]` regex and just re-insert them, and replace all other `-`s. – Wiktor Stribiżew Aug 31 '16 at 11:30
  • @Ammar I updated the question for you, now that you've clarified what it is that you are trying to achieve. If I got any of it wrong, feel free to correct it. In the future, if people are confused by your questions, it's helpful if you update the question yourself, if you can. – Steven Doggart Aug 31 '16 at 12:32

5 Answers

3

You can do this:

Regex.Replace(input, "(?<!\[[^-\]]*)(\w+)-(\w+)(?![^-\]]*\])", "$1 - $2")

Here's an explanation of the pattern:

  • (?<!\[[^-\]]*) - This is a negative look-behind. It asserts that matches cannot be immediately preceded by text that matches the sub-pattern \[[^-\]]*. In other words, the matches we are looking for cannot be preceded by a [ character followed by any number of characters that are not a - or a ].
  • (\w+)-(\w+) - Matches one or more word-characters, then a dash, and then one or more word characters following the dash. By enclosing the sub-patterns on either side of the dash in capturing groups, we can then refer to their values as $1 and $2 in the replacement pattern.
  • (?![^-\]]*\]) - This is a negative look-ahead. Similar to the negative look-behind, it asserts that matches cannot be immediately followed by text that matches the sub-pattern [^-\]]*\]. In other words, a match cannot be followed by any number of characters that are not a - or a ] and then a closing ].

See a demo.

At first glance, you might assume that you could simply assert that the match must not be immediately preceded by a [ character and must not be immediately followed by a ] character. In other words, (?<!\[)(\w+)-(\w+)(?!\]). However, that pattern would still match the text ome-nam in the input [some-name], because the text ome-nam is not immediately preceded or followed by the brackets.
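To make the difference concrete, here is a minimal sketch that runs both patterns over the same input. The module name and the variable names are only for illustration and are not part of the original answer.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module LookaroundDemo
    Sub Main()
        Dim input As String = "SELECT some-name FROM [some-name]"

        ' Naive pattern: only inspects the single character on each side of the match.
        Dim naive As String = "(?<!\[)(\w+)-(\w+)(?!\])"
        ' Pattern from this answer: looks back to an opening [ and ahead to a closing ].
        Dim full As String = "(?<!\[[^-\]]*)(\w+)-(\w+)(?![^-\]]*\])"

        ' The naive pattern matches "ome-nam" inside the brackets.
        Console.WriteLine(Regex.Replace(input, naive, "$1 - $2"))
        ' Prints: SELECT some - name FROM [some - name]

        ' The full pattern leaves the bracketed name alone.
        Console.WriteLine(Regex.Replace(input, full, "$1 - $2"))
        ' Prints: SELECT some - name FROM [some-name]
    End Sub
End Module
```

The variable-width look-behind used here is supported by the .NET regex engine, but many other engines reject it, so the pattern is not portable as written.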

Steven Doggart
  • That is a highly inefficient regex. – Wiktor Stribiżew Aug 31 '16 at 11:33
  • @WiktorStribiżew agreed. If this is something which needs to operate on very large inputs or needs to be run many times quickly, this is not the best solution, however, it does keep the solution reduced down to a single replace using a standard pattern and replacement string. If that's important, because the patterns are being stored in a configuration setting or something, then this may be a solution worth looking at. If that were a requirement, can you think of a more efficient way of doing it, Wiktor? I tried to think of a better way but couldn't. – Steven Doggart Aug 31 '16 at 11:41
  • I posted an efficient solution tailored to the current task. Using a Match Evaluator provides much better flexibility and readability to the regex and solution in general. – Wiktor Stribiżew Aug 31 '16 at 11:42
  • Right. I saw that. I was just wondering if you could think of an efficient way of doing it without a custom match evaluator. – Steven Doggart Aug 31 '16 at 11:43
  • Is there any tutorial for it as in my assigned project I will have to use it a lot. Thanks in advance. – Ammar Aug 31 '16 at 11:48
  • @Ammar I added more info. Let me know if you are still confused about any of it. It is, admittedly a bit of a confusing pattern. – Steven Doggart Aug 31 '16 at 12:16
  • I'd really like to assure you that using infinite width lookbehinds will slow down performance, and will most probably make your code harder to maintain. Use simpler regexps and let the code do the hard work. – Wiktor Stribiżew Aug 31 '16 at 13:28
2
Dim regex As Regex = New Regex("\[[^-]*-[^-]*\]")
Dim match As Match = regex.Match("A long string containing square brackets [some-name]")
If match.Success Then
    Console.WriteLine(match.Value)
End If

Or you could use Regex.IsMatch:

Return Regex.IsMatch("A long string containing square brackets [some-name]",
                     "\[[^-]*-[^-]*\]")
Tim Biegeleisen
  • `IsMatch` is better here than `Match`. Besides, `[^-]` matches both `[` and `]`. – Wiktor Stribiżew Aug 31 '16 at 10:35
  • (I guess this also is what @WiktorStribiżew pointed out) That'll match for example `A string with square brackets [some name without a hyphen] [-]`. – SamWhan Aug 31 '16 at 10:41
  • @ClasG The OP did not mention that the string has to be non empty. – Tim Biegeleisen Aug 31 '16 at 10:43
  • That's not what I meant. Check [this](https://regex101.com/r/jV1tI4/2). It'll match the bracketed part without a hyphen as well. – SamWhan Aug 31 '16 at 10:44
  • It seems now that this answer is not at all helpful - the [spaces should be added around the hyphen if they are missing](http://stackoverflow.com/questions/39247169/how-can-i-check-it-with-regular-expression?noredirect=1#comment65830809_39247169). – Wiktor Stribiżew Aug 31 '16 at 10:52
1

Just to check if it exists, you could try

\[[^\]]+-[^\]]+\]

It matches a literal [ and then any characters, except ], up to (including) a hyphen. Then again any characters, except ], up to a literal ].
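As a rough illustration of the check described above, the pattern can be passed to Regex.IsMatch. This is only a sketch: the module name and the sample strings are made up for the example.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module BracketCheckDemo
    Sub Main()
        ' [ then one or more non-] characters, a hyphen, one or more non-] characters, then ].
        Dim pattern As String = "\[[^\]]+-[^\]]+\]"

        ' A bracketed name with a hyphen in the middle is found.
        Console.WriteLine(Regex.IsMatch("A long string [some-name]", pattern))   ' True

        ' Brackets without an inner hyphen, or with nothing around it, are not.
        Console.WriteLine(Regex.IsMatch("[no hyphen here] [-]", pattern))        ' False
    End Sub
End Module
```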

See it here at regex101.

SamWhan
  • OP says now that the [spaces should be added around the hyphen if they are missing](http://stackoverflow.com/questions/39247169/how-can-i-check-it-with-regular-expression?noredirect=1#comment65830809_39247169). – Wiktor Stribiżew Aug 31 '16 at 10:52
  • @WiktorStribiżew Yeah, saw that comment. It wasn't mentioned at all in the question though. – SamWhan Aug 31 '16 at 11:07
  • @WiktorStribiżew Also, if you're working on this Wiktor - "I have to add space on the sides of - if it is **not in square brackets**." – SamWhan Aug 31 '16 at 11:10
  • That is right: I think the best approach is to match and capture what we need to keep, and match only what we need to replace. See [my answer](http://stackoverflow.com/a/39248609/3832970) where I am using a match evaluator to perform appropriate actions when the capture group matches or not. – Wiktor Stribiżew Aug 31 '16 at 14:15
1

You may match and capture the [...] substrings, and then match only the hyphens outside of them (together with any surrounding whitespace) to replace them:

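' Requires: Imports System.Text.RegularExpressions
Dim nStr As String = "SELECT 'some-name' FROM [some-name]"
' Group 1 captures a [...] substring; the second alternative matches a hyphen
' together with any whitespace around it.
Dim nResult = Regex.Replace(nStr, "(\[.+?])|\s*-\s*", New MatchEvaluator(Function(m As Match)
        If m.Groups(1).Success Then
            Return m.Groups(1).Value   ' bracketed text: re-insert unchanged
        Else
            Return " - "               ' hyphen outside brackets: pad with single spaces
        End If
    End Function))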

So, what is happening is:

  • (\[[^]]+]) - matches a [...] substring and stores its value in Group 1. (\[.+?], as in the code above, can be used instead: it matches a [, then one or more of any characters, as few as possible, and then a ]. Add the RegexOptions.Singleline flag if . should match a newline, too.)
  • (?<!\s)-(?!\s) - matches any hyphen that is neither preceded ((?<!\s)) nor followed ((?!\s)) by whitespace (\s). Alternatively, as in the code above, \s*-\s* can be used, where \s* matches zero or more whitespace characters, as many as possible, since * is a greedy quantifier. It consumes any whitespace around the hyphen, which makes sure that exactly one space ends up before and after the -.

If Group 1 matches, then we just re-insert it (Return m.Groups(1).Value), else we insert the space-enclosed hyphen (Return " - ").
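For reference, here is a minimal, self-contained sketch of the same idea applied to the sample strings from the question and its comments. The module name and the SpaceOutHyphens helper are hypothetical names chosen for this example, and the pattern uses the negated character class with the brackets escaped ([^\]]) instead of .+?.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module MatchEvaluatorDemo
    ' Hypothetical helper: keeps [...] substrings as they are and puts exactly
    ' one space on each side of every hyphen found outside of them.
    Function SpaceOutHyphens(input As String) As String
        Return Regex.Replace(input, "(\[[^\]]+\])|\s*-\s*",
            Function(m As Match) If(m.Groups(1).Success, m.Groups(1).Value, " - "))
    End Function

    Sub Main()
        Console.WriteLine(SpaceOutHyphens(
            "SELECT some-name, some-name FROM [some-table] WHERE [some-column] = 'some-value'"))
        ' Prints: SELECT some - name, some - name FROM [some-table] WHERE [some-column] = 'some - value'

        ' A hyphen that already has spaces around it is left with exactly one on each side.
        Console.WriteLine(SpaceOutHyphens(
            "Select * From [some-table] where [some-column] = 44 - [some-value]"))
        ' Prints: Select * From [some-table] where [some-column] = 44 - [some-value]
    End Sub
End Module
```

Because the bracketed alternative is tried first at every position, a hyphen inside [...] is consumed as part of Group 1 and never reaches the second alternative.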

Wiktor Stribiżew
  • Just FYI: the regex that will work with this answer can be `(\[.+?])|\s*-\s*` (perhaps, the `RegexOptions.Singleline` will be necessary for the `.` to match newlines, too). Easy to read and maintain. – Wiktor Stribiżew Aug 31 '16 at 13:30
  • Thanks @Wiktor Stribizew by replacing the regex it worked for me. Please tell me how it could be learnt? – Ammar Sep 01 '16 at 05:50
  • I updated the answer and accepted the change. I do not know your level of regex knowledge :) so that I can only suggest doing all lessons at [regexone.com](http://regexone.com/), reading through [regular-expressions.info](http://www.regular-expressions.info), [regex SO tag description](http://stackoverflow.com/tags/regex/info) (with many other links to great online resources), and the community SO post called [What does the regex mean](http://stackoverflow.com/questions/22937618/reference-what-does-this-regex-mean). Also, [rexegg.com](http://rexegg.com) is worth having a look at. – Wiktor Stribiżew Sep 01 '16 at 06:17
  • What is Group 1. If I want to check {} too then I have to change it as `"(\[.+?])|(\{.+?})|\s*-\s*"` and use Group 0 for check purpose. Right? – Ammar Sep 01 '16 at 07:30
  • `Group(0)` is the whole match. `(...)` are capturing groups, and they come in the order they appear in the pattern - `Group(1)` > `(\[.+?])`, `Group(2)` > `(\{.+?})`. – Wiktor Stribiżew Sep 01 '16 at 07:32
  • I want to use the same technique in formulas. Will you please tell me how can I add special characters in it as well like `(a-b)-2` as in this case spaces are not being added. As I want the result like `(a - b) - 2`. – Ammar Mar 08 '17 at 12:30
  • You mean how to replace `[` and `]` with `(` and `)`? Escape the `(` and `)` symbols. `"(\(.+?\))|\s*-\s*"` – Wiktor Stribiżew Mar 08 '17 at 12:32
  • I just want `(a-b)-2` be replaced with `(a - b) - 2` using same expression. – Ammar Mar 08 '17 at 13:28
  • Then use `Regex.Replace(s, "\s*([-+*/])\s*", " $1 ")` – Wiktor Stribiżew Mar 08 '17 at 13:30
  • Please tell me any link of quick tutorial of Regex. – Ammar Mar 08 '17 at 13:30
1

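I don't know the VB.NET syntax, but you can use a regex such as

/[\s\'](\w+)\-(\w+)/g

It finds (\w+)-(\w+) when it is preceded by a space or ', and you can then replace the match with capture group 1, " - ", and capture group 2. Note that the pattern also consumes the preceding space or ', so that character has to be captured and re-inserted in the replacement as well.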

See the sample here

Shekhar Khairnar