
I need help extracting hard-coded strings from HTML.

This is example markup from the template engine I use:

[[if:"x";"y"]]
    <p>true part</p>
[[:else]]
    <p>false part</p>
[[:endif]]

[[each:ARRAY;KEY;VALUE]]
    Index :[[KEY]] is :[[VALUE]]

    or if VALUE is an array
    Index :[[KEY]], FOO is :[[VALUE:FOO]]
[[:endeach]]

{$_TEMPLATE['VARS']}

<p><b>I want this</b> and this, {%'AND **THIS NOT**, THIS IS ALREADY TRANSLATED
SINGLE QUOTE MARK IS ESCAPED BY A BACKSLASH \' '}
LINES</p>

Currently I use the pattern />([^\<\>\n\{\}]+\S*?)+</is but it does not work reliably.

:[[VAR]], {$_TEMPLATE['VAR']} and control blocks ([[if:"x";"y"]] etc.) should not be extracted. In mixed text such as "Foo :[[has]] bar", Foo and bar should be extracted separately.

For the attributes, I am using the pattern /(placeholder|title|alt|value)\=\"([^\"\'=\{\}\[\]]*?)\"/ which works without problems.
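To make the attribute case concrete, here is a minimal sketch of that pattern in Python's `re` syntax. The function name `extract_attribute_strings` and the sample tag are only illustrative assumptions, not part of my real code.

```python
import re

# The attribute pattern from above, rewritten for Python's re module.
# The character class rejects values containing quotes, '=', braces or brackets,
# so values that are template placeholders are not matched.
ATTR_PATTERN = re.compile(r'(placeholder|title|alt|value)="([^"\'={}\[\]]*?)"')

def extract_attribute_strings(html):
    """Return (attribute, text) pairs for non-empty attribute values."""
    return [(name, text) for name, text in ATTR_PATTERN.findall(html) if text.strip()]

# Hypothetical sample tag; value="[[NAME]]" is skipped because of the brackets.
sample_tag = '<input placeholder="Your name" value="[[NAME]]" title="Enter your name">'
attribute_strings = extract_attribute_strings(sample_tag)
print(attribute_strings)
```

Note that the pattern has no word boundary in front of the attribute name, so something like `data-title="..."` would also match.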

I hope you can help me.

EDIT: Required output from this example:

true part
false part
Index 
is
or if VALUE is an array
Index
, FOO is
I want this
and this
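One possible way to get close to this output, as a rough sketch and not a proven solution: replace everything that must be ignored with a line break, then collect the text that remains. The names `SKIP` and `extract_strings` are made up for illustration, and the sub-patterns are my assumptions about the template syntax shown above (for example, that a `{$...}` variable never contains nested braces).

```python
import re

# Constructs that must never be extracted. Order matters: the already
# translated {%'...'} block is tried first so its content is consumed whole.
SKIP = re.compile(
    r"""
    \{%'(?:\\.|[^'\\])*'\}   # already translated text, backslash escapes allowed
  | \{\$[^{}]*\}             # variables such as {$_TEMPLATE['VAR']}
  | :?\[\[.*?\]\]            # placeholders and control blocks, with optional colon
  | <[^<>]*>                 # HTML tags
    """,
    re.VERBOSE | re.DOTALL,
)

def extract_strings(template):
    # A line break replaces each skipped construct so the text on both
    # sides of a placeholder ends up as separate pieces.
    text = SKIP.sub("\n", template)
    return [piece.strip() for piece in text.split("\n") if piece.strip()]

sample = r"""[[if:"x";"y"]]
    <p>true part</p>
[[:else]]
    <p>false part</p>
[[:endif]]
Index :[[KEY]], FOO is :[[VALUE:FOO]]
<p><b>I want this</b> and this, {%'NOT THIS \' '}</p>
"""
result = extract_strings(sample)
print(result)
```

This keeps surrounding punctuation, so "and this," still carries its trailing comma and would need an extra trimming step to match the list above exactly.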
Devtronic
  • What would you like to extract from this file? -- Supply a sample output please. – ShellFish May 09 '15 at 23:15
  • `this text {%'THIS NOT,. SINGLE QUOTE MARK IS ESCAPED BY A BACKSLASH \' '} this text also [[if:"x";"y"]] true text [[:else]] false text [[:endif]]` I want this as the output: `this text`, `this text also`, `true text`, `false text` – Devtronic May 10 '15 at 05:41
  • First, what you are asking is very unclear. Second, if I understand what you want, this is not a RegEx task. You might be better off traversing each line of the file/string and running several `tests` with simpler patterns, ruling out the more complex cases first and the simpler ones later. RegEx isn't magic and shouldn't be abused for broader logic; that's when you use code blocks (like fallback functions). –  May 10 '15 at 07:00
  • Please check if this solves your problem: `(?si)\s*(?:\{\$.*?\}|:?\[\[.*?\]\]\p{P}?|\{\%.*?\}|<.*?>)\s*(*SKIP)(*F)|\b([^:{}<>\[\]\n\r]+)\b`: https://regex101.com/r/tA4kK4/1 – Wiktor Stribiżew May 10 '15 at 07:45
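
The `(*SKIP)(*F)` verbs in the last comment are PCRE backtracking controls, and Python's built-in `re` module does not support them. A similar effect can be approximated by letting the unwanted alternatives match without a capture group and keeping only what lands in group 1. The sketch below is a simplified adaptation under that assumption, not an exact port of the commented pattern (it drops the `\p{P}?` and `\b` parts), and the name `extract_with_single_pattern` is hypothetical.

```python
import re

# Left-hand alternatives consume constructs to ignore; only the final
# alternative captures, so ignored constructs leave group 1 empty.
TOKEN = re.compile(
    r"\{\$.*?\}|:?\[\[.*?\]\]|\{%.*?\}|<.*?>|([^:{}<>\[\]\n\r]+)",
    re.DOTALL,
)

def extract_with_single_pattern(template):
    found = []
    for match in TOKEN.finditer(template):
        text = match.group(1)
        # group(1) is None whenever one of the ignored constructs matched.
        if text and text.strip():
            found.append(text.strip())
    return found

mixed = "Foo :[[has]] bar <b>baz</b>{$_X['y']}"
single_pass = extract_with_single_pattern(mixed)
print(single_pass)
```

Because `\{%.*?\}` stops at the first closing brace, an already translated string that itself contains `}` would leak text into the result.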

0 Answers