I tried this on Regex101 but I couldn't figure this out. I have a definition file that contains many definitions (sample output below). I'm trying to find only definitions where the datatype is equal to 4 and the maxlength is 0.

I'm close, but my regex matches too much.

Here is what I have:

/(datatype\s+: 4[\s\S]*?(maxlength\s+:\s0))/g

This matches the cases I want, but it also matches a field where datatype is 4 and maxlength is not 0: the match keeps running until it finds the next occurrence of maxlength = 0.
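For anyone who wants to reproduce the overmatch outside Regex101, here is a minimal Python sketch. It assumes Python's `re` module behaves like the Regex101 flavor for this pattern, and it uses a shortened copy of the sample data (only the `id`, `datatype` and `maxlength` lines are kept); the names `SAMPLE` and `LAZY` are just for illustration.

```python
import re

# Abbreviated version of the sample records (only the relevant keys kept).
SAMPLE = """\
field {
   id             : 536870914
   datatype       : 4
   maxlength      : 0
}
field {
   id             : 536870915
   datatype       : 4
   maxlength      : 30
}
field {
   id             : 536870917
   datatype       : 4
   maxlength      : 0
}
"""

# The pattern from the question: the lazy [\s\S]*? is free to cross
# from one field block into the next while looking for "maxlength : 0".
LAZY = re.compile(r"(datatype\s+: 4[\s\S]*?(maxlength\s+:\s0))")

matches = [m.group(1) for m in LAZY.finditer(SAMPLE)]
for text in matches:
    print(repr(text))
```

The second match starts in the field whose maxlength is 30 and only ends at the maxlength of the following field, which is the unwanted behavior.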

Sample data (sorry, it's long):

field {
   id             : 536870914
   name           : Set Field (Submit Mode)
   datatype       : 4
   fieldtype      : 1
   create-mode    : 2
   option         : 2
   timestamp      : 1489159658
   owner          : John Smith
   last-changed   : John Smith
   length-units   : 0
   maxlength      : 0
   clob-store-opt : 0
   menu-style     : 1
   qbe-match-op   : 1
   fulltext-optns : 0
   permission     : 12\1
}
field {
   id             : 536870915
   name           : Schema Name
   datatype       : 4
   fieldtype      : 1
   create-mode    : 2
   option         : 1
   timestamp      : 1165057260
   owner          : John Smith
   last-changed   : John Smith
   length-units   : 0
   maxlength      : 30
   clob-store-opt : 0
   menu-style     : 1
   qbe-match-op   : 1
   fulltext-optns : 0
   permission     : 12\1
}
field {
   id             : 536870916
   name           : Type
   datatype       : 4
   fieldtype      : 1
   create-mode    : 2
   option         : 1
   timestamp      : 1165057260
   owner          : John Smith
   last-changed   : John Smith
   length-units   : 0
   maxlength      : 30
   clob-store-opt : 0
   menu-style     : 2
   qbe-match-op   : 1
   fulltext-optns : 0
   permission     : 12\1
}
field {
   id             : 536870917
   name           : Set Field (Query Mode)
   datatype       : 4
   fieldtype      : 1
   create-mode    : 2
   option         : 2
   timestamp      : 1489159658
   owner          : John Smith
   last-changed   : John Smith
   length-units   : 0
   maxlength      : 0
   clob-store-opt : 0
   menu-style     : 1
   qbe-match-op   : 1
   fulltext-optns : 0
   permission     : 12\1
}

Also note that this is a very limited sample and there could be hundreds of these fields with different values.

John the Ripper

1 Answer

You need to temper the [\s\S]*? with a negative lookahead so as to build a tempered greedy token:

/(datatype\s+: 4\b(?:(?!datatype\s+: \d)[\s\S])*?(maxlength\s+:\s0\b))/g
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

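The (?:(?!datatype\s+: \d)[\s\S])*? part matches any character ([\s\S]), zero or more times, as few as possible, provided that character does not start a datatype\s+: \d sequence (the literal datatype, one or more whitespace characters (\s+), a colon, a space, and a digit (\d)). The match therefore cannot run past the datatype line of the next field. The added \b word boundaries stop 4 and 0 from matching just the first digit of a longer number such as 40 or 05.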
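As a rough check of the tempered pattern, here is a small Python sketch. It assumes Python's `re` module handles this pattern the same way as the regex flavor used above, and it runs against a shortened copy of the question's sample (only the `id`, `datatype` and `maxlength` lines are kept); `SAMPLE` and `TEMPERED` are illustrative names, not part of the original data.

```python
import re

# Abbreviated version of the sample records (only the relevant keys kept).
SAMPLE = """\
field {
   id             : 536870914
   datatype       : 4
   maxlength      : 0
}
field {
   id             : 536870915
   datatype       : 4
   maxlength      : 30
}
field {
   id             : 536870917
   datatype       : 4
   maxlength      : 0
}
"""

# Tempered version: each consumed character is first checked with a
# negative lookahead, so the match can never step onto the start of
# another "datatype : <digit>" line.
TEMPERED = re.compile(
    r"(datatype\s+: 4\b(?:(?!datatype\s+: \d)[\s\S])*?(maxlength\s+:\s0\b))"
)

matches = [m.group(1) for m in TEMPERED.finditer(SAMPLE)]
for text in matches:
    print(repr(text))
```

With this input, the attempt that starts in the field with maxlength 30 fails when it reaches the next datatype line, so only the two fields with maxlength 0 are reported.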

Wiktor Stribiżew