
I am trying to learn Regex, and I have a scenario where I thought I could use it. I have a set of strings in the format shown below, from which I need to extract each substring around the joining operators "and", "or" and "not". For example, "some column name1 = some value1" is one such substring from the first string.

After that, I need to extract the left-hand side and the right-hand side of the operators "like", "=", "<" and ">". In the above example, that would give "some column name1" as one substring and "some value1" as another, with "=" as the operator.

  • some column name1 = some value1 and some column name2 < another value2 or some column name 3 > value3 not column name 4 = value4 and name5 = value5
  • columnA = 324324
  • columnB like a text
  • value text
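For reference, the second step (splitting a single substring around "like", "=", "<" or ">") might be sketched in Swift with Foundation's NSRegularExpression as below. This is a minimal sketch under assumptions: `Condition` and `parseCondition` are hypothetical names, only the first operator in a substring is used, and a substring without any operator, such as "value text", yields nil.

```swift
import Foundation

// Hypothetical type: one "lhs op rhs" condition.
struct Condition: Equatable {
    let lhs: String
    let op: String
    let rhs: String
}

// Hypothetical helper: splits one substring around the first of like, =, <, >.
// Returns nil when no operator is present (e.g. "value text").
func parseCondition(_ text: String) -> Condition? {
    // ^\s*(.+?)          lazy left-hand side, leading spaces skipped
    // (\blike\b|=|<|>)   the operator; \b keeps "like" from matching inside a word like "unlikely"
    // \s*(.+?)\s*$       right-hand side, trailing spaces skipped
    let pattern = #"^\s*(.+?)\s*(\blike\b|=|<|>)\s*(.+?)\s*$"#
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return nil }
    let ns = text as NSString
    guard let match = regex.firstMatch(in: text, options: [],
                                       range: NSRange(location: 0, length: ns.length)) else {
        return nil
    }
    return Condition(lhs: ns.substring(with: match.range(at: 1)),
                     op: ns.substring(with: match.range(at: 2)),
                     rhs: ns.substring(with: match.range(at: 3)))
}

for s in ["columnA = 324324", "columnB like a text", "value text"] {
    print(s, "->", String(describing: parseCondition(s)))
}
```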

Since I am new to Regex, this is what I have tried so far, but it doesn't give me all the substrings around these operators. Once this works, I think I can apply a similar regex with the operators "like", "=", "<" and ">" to the resulting substrings to get the final output.

(.*?)\b(and|or|not)

When I try the above regex on the first example, the last part, "name5 = value5", is missing from the matches.

(.+?)(and|or|not)(.+)

When I try this one, it matches the first substring, but the rest are matched as a single substring instead of being split again.
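One way to see why both attempts fall short is to run them and count the matches. In the first pattern every match has to end with a joiner, so the text after the last "and" is never matched; in the second, the greedy (.+) in group 3 consumes everything after the first joiner, so there is only one match. The Swift sketch below assumes Foundation's NSRegularExpression; `allCaptures` is a hypothetical helper.

```swift
import Foundation

// Hypothetical helper: capture groups of every match; a group that did not take part is "".
func allCaptures(of pattern: String, in text: String) throws -> [[String]] {
    let regex = try NSRegularExpression(pattern: pattern)
    let ns = text as NSString
    let matches = regex.matches(in: text, options: [], range: NSRange(location: 0, length: ns.length))
    return matches.map { match in
        (1..<match.numberOfRanges).map { i -> String in
            let range = match.range(at: i)
            return range.location == NSNotFound ? "" : ns.substring(with: range)
        }
    }
}

let sample = "some column name1 = some value1 and some column name2 < another value2 or some column name 3 > value3 not column name 4 = value4 and name5 = value5"

// First attempt: a match must end with a joiner, and " name5 = value5" has none after it.
let firstAttempt = try! allCaptures(of: #"(.*?)\b(and|or|not)"#, in: sample)
print(firstAttempt.count)  // 4

// Second attempt: the greedy (.+) swallows the rest of the string in a single match.
let secondAttempt = try! allCaptures(of: #"(.+?)(and|or|not)(.+)"#, in: sample)
print(secondAttempt.count)  // 1
```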

Please note that I was able to use a split operation with "and|or|not" as the separator to get an array of substrings. However, just for learning regex, I am trying to see whether I can get these substrings directly as matches from the given string (this answer says it is possible with regex). I have searched Stack Overflow for similar questions, but none of the solutions worked in my case. The language in my case is Objective-C/Swift.
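For comparison, the split-based approach could be sketched in Swift as follows. Foundation's String has no regex-based split, so this sketch assumes NSRegularExpression and cuts the text between the delimiter matches; `split(_:onPattern:)` is a hypothetical helper, and the \b anchors are added so that joiners inside words such as "another" are not treated as separators.

```swift
import Foundation

// Hypothetical helper: splits `text` on every match of `pattern`,
// trimming surrounding spaces from each piece.
func split(_ text: String, onPattern pattern: String) throws -> [String] {
    let regex = try NSRegularExpression(pattern: pattern)
    let ns = text as NSString
    var pieces: [String] = []
    var start = 0  // UTF-16 offset where the current piece begins
    for match in regex.matches(in: text, options: [], range: NSRange(location: 0, length: ns.length)) {
        pieces.append(ns.substring(with: NSRange(location: start, length: match.range.location - start)))
        start = match.range.location + match.range.length
    }
    pieces.append(ns.substring(from: start))  // text after the last delimiter
    return pieces.map { $0.trimmingCharacters(in: .whitespaces) }
}

let sample = "some column name1 = some value1 and some column name2 < another value2 or some column name 3 > value3 not column name 4 = value4 and name5 = value5"
for piece in try! split(sample, onPattern: #"\b(?:and|or|not)\b"#) {
    print(piece)
}
```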

adev
  • Why not add an end of string anchor as an alternative? [`(.*?)(?:\b(and|or|not)\b|$)`](https://regex101.com/r/TIVCAS/1). – Wiktor Stribiżew Jul 22 '17 at 10:45
  • @rmaddy, I included the language tag because on similar questions people ask for the language before answering, so I thought regex might depend on it. – adev Jul 22 '17 at 17:25

1 Answer


You may add an end-of-string anchor, $, as an alternative to the delimiters, so that the last substring, which has no delimiter after it, is matched too.

(.*?)(?:\b(and|or|not)\b|$)
                        ^^     

See the regex demo.

If your string contains line breaks, you must make . match them by adding (?s), a DOTALL modifier, at the pattern start.
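In Swift, the pattern might be used with Foundation's NSRegularExpression roughly as follows. This is a sketch under assumptions: `conditions(in:)` is a hypothetical name, the captures are trimmed of surrounding spaces, and empty captures are dropped because the pattern can also match an empty string at the end of the input. Group 2 holds the joiner when one was found; prepend (?s) if the input may contain line breaks.

```swift
import Foundation

// Hypothetical helper: the substrings around "and", "or", "not" (group 1 of each match), trimmed.
func conditions(in text: String) -> [String] {
    // Lazy group 1, then either a whole-word joiner or the end of the string.
    let pattern = #"(.*?)(?:\b(and|or|not)\b|$)"#
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return [] }
    let ns = text as NSString
    let matches = regex.matches(in: text, options: [], range: NSRange(location: 0, length: ns.length))
    return matches
        .map { ns.substring(with: $0.range(at: 1)).trimmingCharacters(in: .whitespaces) }
        // Drop empty pieces, such as an empty match at the very end of the input.
        .filter { !$0.isEmpty }
}

let sample = "some column name1 = some value1 and some column name2 < another value2 or some column name 3 > value3 not column name 4 = value4 and name5 = value5"
for condition in conditions(in: sample) {
    print(condition)
}
```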

Wiktor Stribiżew
  • Thank you. That works fine. Just for my understanding: if I need to include line breaks as separators, similar to and/or, can I use this regex? https://regex101.com/r/jsufra/1. `(?s)(.*?)(?:\b(and|or|not|\n|\r)\b|$)` . Is there any optimization we can do on that? I am not sure how to include line breaks in the and/or group. – adev Jul 22 '17 at 17:23
  • Just a follow-up question: as mentioned in my question, if I need to further split these substrings into "key", "=", "value" format after splitting on and/or, is there some way to combine both regexes into one? Something like: the regex first splits on "and/or" and then splits again on "=/>/like", all in one single regex? Right now I can do this in two steps using two different regexes, but I was wondering how to achieve it in one step. – adev Jul 22 '17 at 17:36
  • I just tried, and this is what I got that comes close to what I meant: https://regex101.com/r/kTP0Qi/1 . `(?s)(.*?)(?:\b(like| = | < | > |and|or|not|\n)\b|$)`. It doesn't work without spaces before and after =/>, and I am not able to understand why. Sorry for asking so many follow-up questions. – adev Jul 22 '17 at 17:55
  • @adev: Is the sample text supplied in the OP *4 different strings* or *a 4-line text string*? Please provide the exact Obj-C/Swift code you are using and the input strings as string literals, and state the expected output. Do not trust Regex101 too much; you need to be able to use the pattern in the target environment. And BTW, the regex you might seek is [`(?s)(.*?)(?:\b(?:like|and|or|not)\b|[=<>\n\r]|$)`](https://regex101.com/r/kTP0Qi/2), but it must be tested in the real environment. – Wiktor Stribiżew Jul 22 '17 at 18:36
  • Thanks @Wiktor, I will do that. For my project, I took the easier route of matching (and|or|not) and splitting with the regex match function, then applied similar logic to the resulting substrings with the operators. I looked into this route because my colleague said it can be matched without splitting, so I wanted to learn it. Thanks for your help. My actual case has separate strings and no line breaks; this was only for my knowledge. – adev Jul 23 '17 at 03:10
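The reason the spaces mattered in the follow-up pattern is that \b only matches between a word character and a non-word character. An "=" with a space before it has non-word characters on both sides of that position, so \b=\b cannot match there, while \b = \b can. The single pattern suggested in the comments keeps \b only around the word operators and puts =, <, > and line breaks in a character class outside it. The Swift sketch below checks this with NSRegularExpression; `pieces(of:in:)` is a hypothetical helper. Note that the single pattern flattens everything into one list, so it no longer records which piece is a key or a value, or which operator joined them; the two-step approach keeps that structure.

```swift
import Foundation

// Hypothetical helper: trimmed, non-empty group-1 captures of every match of `pattern`.
func pieces(of pattern: String, in text: String) throws -> [String] {
    let regex = try NSRegularExpression(pattern: pattern)
    let ns = text as NSString
    return regex.matches(in: text, options: [], range: NSRange(location: 0, length: ns.length))
        .map { ns.substring(with: $0.range(at: 1)).trimmingCharacters(in: .whitespacesAndNewlines) }
        .filter { !$0.isEmpty }
}

// Single pattern from the comments: \b only around the word operators,
// symbol operators and line breaks in a character class outside it.
let combined = #"(?s)(.*?)(?:\b(?:like|and|or|not)\b|[=<>\n\r]|$)"#
let sample = "some column name1 = some value1 and some column name2 < another value2 or some column name 3 > value3 not column name 4 = value4 and name5 = value5"
print(try! pieces(of: combined, in: sample))

// \b between " " and "=" is not a word boundary, so \b=\b finds nothing here.
let probe = "columnA = 324324"
let probeRange = NSRange(location: 0, length: (probe as NSString).length)
let noSpaces = try! NSRegularExpression(pattern: #"\b=\b"#)
    .numberOfMatches(in: probe, options: [], range: probeRange)
let withSpaces = try! NSRegularExpression(pattern: #"\b = \b"#)
    .numberOfMatches(in: probe, options: [], range: probeRange)
print(noSpaces, withSpaces)  // 0 1
```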