
I need to extract all indices from my LaTeX files, but some of them look like this:

\index{*3*sqrt*uppersqrt{\hspace{-2.5pt}$\uppersqrt{\;\;\;}$(upper square root)}}

So I need to somehow keep track, within the regex, of how many curly brackets are currently open. I don't know how to handle such a case.

Also, if an index contains /, I don't need that index.

Example:

Anything before. \index{{}{}}\index{Hi}\anothertag{something}
\index{}{}
\index{/}

The expected result is:

\index{{}{}}
\index{Hi}
\index{}
Yola
  • For the first case doesn't a greedy dot work for you? `\\index{.*}` – revo Sep 20 '16 at 15:18
  • @revo I think it can match several indices together – Yola Sep 20 '16 at 15:40
  • What part do you call an index? – revo Sep 20 '16 at 15:47
  • @revo The whole expression, but there may be any text after it. – Yola Sep 20 '16 at 15:49
  • This is *fundamentally* something `grep` is badly suited to do: [regular expressions famously cannot count](http://stackoverflow.com/a/133684/1968). Now, `grep`’s query language isn’t entirely regular but it’s a good approximation of when (not) to use `grep` and similar tools. – Konrad Rudolph Sep 20 '16 at 15:55
  • Are you able to use `grep -P`? – revo Sep 20 '16 at 15:56
  • @revo Yes, you probably mean recursion :) At least it is listed when I run `grep --help` – Yola Sep 20 '16 at 16:00
  • We need more information about the leading and trailing data. What could appear before `\index` and after its `{..}` argument? Is `\index` a literal string, or does it vary? – revo Sep 20 '16 at 16:16
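
The counting discussed in the comments can also be done outside a regex with an explicit depth counter. The following is only a minimal Python sketch, not something proposed in the thread: the function name `extract_indices` is hypothetical, it assumes the macro is the literal string `\index` followed directly by `{`, and it does not treat escaped braces (`\{`, `\}`) specially. Unlike line-based `grep`, it also follows an index across line breaks.

```python
def extract_indices(text, macro="\\index"):
    """Return each macro{...} whose braces balance and that contains no '/'.

    Hypothetical helper: escaped braces such as \\{ are counted like
    ordinary braces, which is an assumption of this sketch.
    """
    results = []
    opener = macro + "{"
    pos = 0
    while True:
        start = text.find(opener, pos)
        if start == -1:
            break
        depth = 0
        end = -1
        # Begin at the opening brace and track the nesting depth.
        for i in range(start + len(macro), len(text)):
            ch = text[i]
            if ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    end = i + 1  # one past the matching closing brace
                    break
        if end == -1:
            # Braces never balanced; skip this opener and keep scanning.
            pos = start + len(opener)
            continue
        candidate = text[start:end]
        if "/" not in candidate:
            results.append(candidate)
        pos = end
    return results


sample = (
    "Anything before. \\index{{}{}}\\index{Hi}\\anothertag{something}\n"
    "\\index{}{}\n"
    "\\index{/}\n"
)
for entry in extract_indices(sample):
    print(entry)
```

On the example from the question this prints `\index{{}{}}`, `\index{Hi}` and `\index{}`, and the entry containing `/` is dropped.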

2 Answers


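Is there a limit on how deeply the brackets can be nested? If so, you can spell out each level explicitly. The regex

\\index{(?:[^{}]|(?:{(?:[^{}]|(?:{[^{}]*}))*}))*}

will match brackets nested at most 3 levels deep, like: \index{{{}}{{}}}. Each character class excludes both { and }, so a match stops at its own closing bracket.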
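
To see where the fixed-depth approach stops working, here is a small Python sketch using the standard `re` module. The names `DEPTH3` and `find_indices_depth3` are made up for illustration, the braces are escaped for readability, and the `/` filter is applied after matching rather than inside the pattern. Note that the pattern here excludes `}` as well as `{` in every character class, which is an assumption of this sketch.

```python
import re

# Depth-limited pattern: three explicit nesting levels, no recursion.
# Every character class excludes both braces (assumption of this sketch),
# so a match cannot run past its own closing brace.
DEPTH3 = re.compile(r"\\index\{(?:[^{}]|\{(?:[^{}]|\{[^{}]*\})*\})*\}")


def find_indices_depth3(text):
    """Return \\index{...} matches up to 3 braces deep that contain no '/'."""
    # The pattern has no capturing groups, so findall yields whole matches.
    return [m for m in DEPTH3.findall(text) if "/" not in m]


three_deep = r"\index{{{}}{{}}}"
four_deep = r"\index{{{{}}}}"
print(find_indices_depth3(three_deep))  # matched as a whole
print(find_indices_depth3(four_deep))   # nothing matched: too deep
```

The long example from the question nests exactly three levels deep, so it is still within reach of this pattern. Anything deeper needs another hand-written level or a recursive pattern.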

Leonardo Xavier

Regex:

\\index({(?(?!{|})[^\/{}]*|(?1))*})

Explanation:

\\index             # Match `\index` literally
(                   # Start of capturing group (1)
    {                   # Match opening brace `{`
    (?                  # Start of conditional statement
        (?!{|})             # If the next character is neither `{` nor `}`
        [^\/{}]*            # Match any run of characters except `/`, `{` and `}`
        |                   # Else
        (?1)                # Recurse into capturing group (1)
    )*                  # End of conditional - repeat conditional zero or more times - greedily.
    }                   # Match closing brace `}`
)                   # End of capturing group (1)

Usage:

grep -Po '\\index({(?(?!{|})[^\/{}]*|(?1))*})' input_file.txt

Output based on input provided by OP:

\index{{}{}}
\index{Hi}
\index{}
revo