It's so late, I can't quite get it.

My text looks like this:

 This is a [;;Text] and I want to match [center]everything without ;;[/center]

I use this to transform to HTML:

 return preg_replace('/\[(.+)\]/U', '<$1>', $text);

And I thought the pattern /\[[^;{2}](.+)\]/U should do the trick but it does not work.

phpcrack

4 Answers

1

For example, you can match [ followed by an optional / and then word characters up to the closing ]

Note that you don't need the /U flag in this case to make the quantifiers lazy.

\[/?\w+]

$text = ' This is a [;;Text] and I want to match [center]everything without ;;[/center]';
$result = preg_replace('/\[(\/?\w+)]/', "<$1>", $text);
echo $result;

Output

This is a [;;Text] and I want to match <center>everything without ;;</center>


For a more specific match, you can exclude matching ;; after the [ and match all characters except [ and ] using a negated character class.

\[(?!;;)([^][]*)]
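
As a rough sketch of how this second pattern could be dropped into the replacement from the question (the function name `convertTags` is made up here for illustration and is not part of the original code):

```php
<?php
// Sketch: convert [tag] to <tag>, skipping any bracket that starts with ";;".
// convertTags() is a hypothetical helper name, not taken from the question.
function convertTags(string $text): string
{
    // \[        literal opening bracket
    // (?!;;)    negative lookahead: the next two characters must not be ";;"
    // ([^][]*)  capture any run of characters other than "[" and "]"
    // ]         literal closing bracket
    return preg_replace('/\[(?!;;)([^][]*)]/', '<$1>', $text);
}

$text = ' This is a [;;Text] and I want to match [center]everything without ;;[/center]';
echo convertTags($text), PHP_EOL;
// prints: " This is a [;;Text] and I want to match <center>everything without ;;</center>"
```

Because the lookahead only rules out `;;` directly after the `[`, an entry such as `[center;]` is still converted.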

The fourth bird
  • 1
    Very nice usage of negative lookahead. Was trying to get that one worked out with a negative lookbehind but couldn't get it to work. Thanks. This one is nice because it also captures entries like `[center;]` – Jamin Mar 14 '21 at 12:32
  • 1
    \[(?!;;)([^][]*)] this one works so perfectly! Thanks! – phpcrack Mar 14 '21 at 15:53
  • I see you reopened the question, and I wonder if https://stackoverflow.com/questions/66573222 is not a duplicate either? – Ryszard Czech Mar 17 '21 at 22:41
  • @RyszardCzech I reopened it because, IMHO, SO is a place to help programmers with code they tried that did not work. A lot of regex questions might have a duplicate answer that relates to the technique used, but in this case I replaced the negated character class with a lookahead and replaced the quantified dot with a negated character class, which I think gives a better pattern. If you really think that it is not a dupe, you can reopen https://stackoverflow.com/questions/66573222/avoid-matches-with-given-text-in-fixed-position – The fourth bird Mar 17 '21 at 22:48
0

I found the following pattern: \[(?<!\;\;)([\w\/]+?)\]

A link to the demonstration: https://regex101.com/r/Htd32j/1

Please be aware that it only matches in that specific case. If your tags contain spaces or other special characters, it won't work and will need some modification.
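
To illustrate that limitation, here is a minimal sketch that runs the pattern through `preg_replace`. The `[font size]` tag is invented for this example and does not come from the question.

```php
<?php
// Sketch of the limitation: the character class [\w\/] only allows letters,
// digits, underscores and "/", so a tag containing anything else is skipped.
$pattern = '/\[(?<!\;\;)([\w\/]+?)\]/';

// "[font size]" is a made-up tag, used here only to show the space problem.
$text = '[center]ok[/center] [;;Text] [font size]';
echo preg_replace($pattern, '<$1>', $text), PHP_EOL;
// prints: <center>ok</center> [;;Text] [font size]
```

To accept such tags, the character class would have to be widened, for instance by adding a space to it.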

Yaron
  • 1
    This assertion will always be true `\[(?<!\;\;)` as you first match `[` and then assert that directly to the left is not `;;` – The fourth bird Mar 14 '21 at 12:14
  • I'm only matching the tags in brackets and that's what appears in regex101, I assumed we're talking about tags only. – Yaron Mar 15 '21 at 13:21
  • 1
    Yes, but you can omit the `(?<!\;\;)` as it has no effect. You can also omit the question mark in `+?` that makes the quantifier non-greedy: the character class `[\w\/]` cannot match `]`, so the greedy `+` stops at the same place. – The fourth bird Mar 15 '21 at 13:25
  • 1
    I wanted to be as specific as possible but you're absolutely right, does that solve your issue? – Yaron Mar 15 '21 at 15:55
  • There is no issue for me :-) I only added it as a minor note that the pattern can be simplified. – The fourth bird Mar 15 '21 at 16:04
0

If you want to extract only the text inside [] while skipping occurrences of ;, you can try the pattern below. I just modified @Mehdi's regex.

(?<=\[;;)?\w+(?=\])

You can check the expression on https://regex101.com/ with the PHP flavor selected:

https://regex101.com/r/bFyW8S/1

Sambit
0

The regex I came up with to match this pattern is pretty straightforward:

[[][A-Za-z0-9]*[]]

To break it down into smaller parts for understanding:

[[] begins with "["

[A-Za-z0-9]* contains zero or more alphabetic or numeric characters

[]] ends with "]"

This will match [center] but not [/center] or [;;Text], because those contain special characters and the regex only allows letters and digits between the brackets.


Edit: if you want to match every character except ";" then you can use this:

[[][^;]*[]]

Which follows very similar logic:

[[] begins with "["

[^;]* zero or more characters that are not ";"

[]] ends with "]"

Which will match [center] and [/center] but not [;;Text]
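
A quick way to check this against other inputs is sketched below. The helper `findTags` and the `$strict` variant (which also excludes "[" and "]" between the brackets) are assumptions added for illustration. The sketch suggests the pattern behaves as described on the sample text, but matches too much when no ";" sits between two tags.

```php
<?php
// $loose is the second pattern from this answer. $strict is an assumed
// variant that also excludes "[" and "]" between the brackets.
$loose  = '/[[][^;]*[]]/';
$strict = '/[[][^;\][]*[]]/';

// findTags() is a hypothetical helper that returns all full matches.
function findTags(string $pattern, string $text): array
{
    preg_match_all($pattern, $text, $matches);
    return $matches[0];
}

$sample = ' This is a [;;Text] and I want to match [center]everything without ;;[/center]';
print_r(findTags($loose, $sample));         // [center] and [/center]

// With no ";" between the tags, the greedy [^;]* runs on to the last "]".
print_r(findTags($loose, '[b]bold[/b]'));   // one match: [b]bold[/b]
print_r(findTags($strict, '[b]bold[/b]'));  // two matches: [b] and [/b]
```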

Jamin
  • 1
    Did you test this construct `[Aa-Zz]`? Also, `[A-z]` is not the same as `[A-Za-z]`. Using a construct like `([Aa-Zz]|[0-9])` is also not very efficient because of the alternation. You can write it as `[A-Za-z0-9]*` – The fourth bird Mar 14 '21 at 12:08
  • The test I did on regex101 for php seemed to not like the [Aa-Zz], but was ok with [A-z]. I do not doubt you are right about that one. – Jamin Mar 14 '21 at 12:18
  • `[A-z]` is a valid range, but see this answer for which characters it will match: https://stackoverflow.com/questions/29771901/why-is-this-regex-allowing-a-caret – The fourth bird Mar 14 '21 at 12:20
  • Thank you for clarifying this, making changes to the answer with the suggestion you provided. – Jamin Mar 14 '21 at 12:24
  • 1
    For the first pattern, you can add an optional forward slash to match both tags, see https://regex101.com/r/qSXluT/1. For the second pattern, you are on the right track, but see what happens if the string in between does not contain a `;` https://regex101.com/r/PaZWml/1 as it would overmatch. If you don't want to allow a semicolon inside the square brackets, you can also exclude the square brackets themselves, see https://regex101.com/r/MQsLBy/1 – The fourth bird Mar 14 '21 at 12:37