
I'm working in FrameMaker and trying to extract definitions from a document glossary with a script. I've run into a problem with my lookbehind assertion that I can't seem to sort out. Glossary entries look like:

ADC........... air data computer

The problem is that each entry may have one or two tabs separating the acronym from the definition. The first tab is rendered as "......". Some glossaries have a second tab that appears as a blank space after the periods and before the definition. The following works fine for glossaries with a single tab.

(?<=\bADC\x08).*

However, if a glossary uses two tabs, the regexp picks up the second tab along with the definition. If I change my lookbehind to:

(?<=\bADC\x08\x08).*

It works with two tabs, but not with one. If I change it to:

(?<=\bADC\x08+).*

...which should find one or more occurrences of the tab character, I get a "Not Found" error. Apparently quantifiers do not work the same way inside assertions as they do in the rest of a regexp.

Gator
  • If your regex engine does not support variable width lookbehind patterns (like `(?<=\bADC\t+)\S.*`), try `(?<=\bADC\t|\bADC\t\t)\S.*` or `(?:(?<=\bADC\t)|(?<=\bADC\t\t))\S.*` – Wiktor Stribiżew Aug 23 '23 at 20:53
  • Wiktor...thanks for your suggestions. The first two didn't work with the FrameMaker regex engine. It's a somewhat dated engine. The last one did, but in the case of two tabs, it selects the second tab along with the definition. I got a solution in the Adobe FrameMaker online forum that works after some additional scripting. Using a regex capture group, `regex = /\w+\x08+(.+)/ig`, instead of an assertion, I get just the definition regardless of the number of tabs. – Gator Aug 28 '23 at 21:22
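The capture-group approach from the comment above can be sketched in plain JavaScript as follows. This is only an illustration under stated assumptions: the glossary is taken to be a single string with one entry per line, tabs are represented as `\x08` as in the question, and `parseGlossary` is a hypothetical helper name. The pattern adds a second capture group for the acronym and drops the `i` flag, so it is an adaptation of the regex in the comment rather than the exact script used in FrameMaker.

```javascript
// Hypothetical helper: collects every "ACRONYM<tab(s)>definition" entry.
// Assumes one entry per line and \x08 as the tab character (per the question).
function parseGlossary(text) {
  // Group 1: the acronym; group 2: the definition after one or more tabs.
  var re = /(\w+)\x08+(.+)/g;
  var definitions = {};
  var match;
  // With the g flag, exec() resumes after the previous match on each call.
  while ((match = re.exec(text)) !== null) {
    definitions[match[1]] = match[2];
  }
  return definitions;
}

// Sample data (made up for illustration): one entry with one tab, one with two.
var sample = "ADC\x08air data computer\nAGL\x08\x08above ground level";
var glossary = parseGlossary(sample);
```

Because `\x08+` consumes every tab before the group starts, the captured definition never begins with a tab, whether the entry uses one tab or two.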

1 Answer


Since you can use a capturing group in the regex to grab just a part of the match, you can use

\bADC\x08+(.*)

Details:

  • \b - a word boundary
  • ADC - ADC string
  • \x08+ - one or more characters with hex code 08 (the tab character in the question)
  • (.*) - Group 1: zero or more characters other than line break characters, as many as possible.

See the regex demo.
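As a rough sketch of how Group 1 might be read from a script, the snippet below wraps the pattern in a small function. The names `definitionFor`, `oneTab`, and `twoTabs` are hypothetical and the sample strings are made up; the code is plain JavaScript and has not been run inside FrameMaker itself.

```javascript
// Hypothetical helper: returns the definition for one acronym, or null.
// Tabs are assumed to be \x08, as in the question.
function definitionFor(acronym, line) {
  // Escape regex metacharacters so the acronym is matched literally.
  var escaped = acronym.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Builds the pattern \b<acronym>\x08+(.*)
  var re = new RegExp("\\b" + escaped + "\\x08+(.*)");
  var match = re.exec(line);
  // match[1] is Group 1: everything after the run of tabs.
  return match ? match[1] : null;
}

var oneTab = definitionFor("ADC", "ADC\x08air data computer");
var twoTabs = definitionFor("ADC", "ADC\x08\x08air data computer");
```

Both calls yield the same definition, since the tabs are matched outside the group and only Group 1 is returned.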

Wiktor Stribiżew