My text "can contain" both single 'and double"' quotes. The quotes "can also be 'nested" as you can see.

Expected results

(array with 3 items)

can contain
and double"
can also be 'nested

How far I've come

I'm not a regex expert, far from it. Still, I managed to get the text between double quotes, as in: I can "grab this" text.

preg_match_all("~\"(.*?)\"~", $text, $between);
print_r($between);

Valid / invalid

  • Valid: This is "A text" (A text)
  • Valid: This is 'A text' (A text)
  • Valid: This is "A 'text" (A 'text)
  • Valid: This is 'A "text' (A "text)
  • Invalid: This is "A text (Uneven quotes 1)
  • Invalid: This is 'A text (Uneven quotes 1)
  • Invalid: This is "A "text" (Uneven quotes 3)
  • Invalid: This is 'A 'text' (Uneven quotes 3)
  • Invalid: This "is ' A " text' (Intersecting)

Additional notes

  • If there is an error, like an unclosed quote, it's fine if it breaks (This "has "one wrong" quote)
  • I would prefer a regex solution, but if there are better non regex solutions, it's fine.

My guesses

My guess is that each character needs to be looped over and checked. If a character is a ", the loop needs to step through the characters up to the next " in order to wrap that part. Then I guess it needs to restart from that position to see what the next type of quote is, and do it again until the string has ended.
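The loop described above can be sketched in PHP roughly as follows. This is only a minimal sketch: `extractQuoted` is a hypothetical helper name, not something from the question, and it assumes that stopping at the first unclosed quote is acceptable, as the additional notes allow.

```php
<?php
// Hypothetical helper (not part of the original post): scans the text once
// and collects the content between matching quote characters.
function extractQuoted(string $text): array
{
    $results = [];
    $length = strlen($text);
    $i = 0;

    while ($i < $length) {
        $char = $text[$i];

        if ($char === '"' || $char === "'") {
            // Find the next occurrence of the same quote character.
            $end = strpos($text, $char, $i + 1);
            if ($end === false) {
                // Unclosed quote: the input is invalid, so stop here.
                break;
            }
            $results[] = substr($text, $i + 1, $end - $i - 1);
            // Resume scanning right after the closing quote.
            $i = $end + 1;
        } else {
            $i++;
        }
    }

    return $results;
}

$text = 'My text "can contain" both single \'and double"\' quotes. '
      . 'The quotes "can also be \'nested" as you can see.';

print_r(extractQuoted($text));
```

For the sample text this yields the three expected items. Note that an apostrophe in ordinary prose (as in "I've") is treated as an opening single quote, so this sketch only suits input where every quote character is a delimiter.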

Answers on Stack Overflow that do not work

This answer does not work for my problem: regex match text in either single or double quote

A proof can be seen here: https://regex101.com/r/OVdomu/65/

Jens Törnell
  • @NigelRen No, it did not. I updated his demo and it wraps almost all of the string so that works differently than I'm hoping for. https://regex101.com/r/OVdomu/65/ – Jens Törnell Nov 20 '19 at 08:40
  • @WiktorStribiżew I tested your regex here: https://regex101.com/r/WSaYeh/1/ I was surprised how well it works. I tried to make it break, but could not. Do you see any pitfalls with it? – Jens Törnell Nov 20 '19 at 08:45
  • The `preg_match_all('~(?|"([^"]*)"|\'([^\']*)\')~', $txt, $matches); print_r($matches[1]);` breaks with your invalid (Uneven quotes) cases. – Wiktor Stribiżew Nov 20 '19 at 08:46
  • @WiktorStribiżew If it's invalid, it's broken anyway so that's fine for me. You can add it as an answer if you like. – Jens Törnell Nov 20 '19 at 08:47
  • How should it handle *intersecting* sections? E.g.: `...A " B ' C " D ' E...` ? – Yoshi Nov 20 '19 at 08:51
  • @Yoshi Very good point as I did not cover that case in my question. In my case, it should be seen as invalid and work like the answer by Wiktor (`B ' C` as first part then break or skip). – Jens Törnell Nov 20 '19 at 08:57
  • @JensTörnell Maybe [this answer](https://stackoverflow.com/a/58339041/5527985) is of help as well ([demo](https://regex101.com/r/fMkRzQ/3)). It also deals with escaped quotes inside. – bobble bubble Nov 20 '19 at 09:25

1 Answer


You may use

if (preg_match_all('~(?|"([^"]*)"|\'([^\']*)\')~', $txt, $matches)) { 
    print_r($matches[1]);
}

See the regex demo and the PHP demo.

A variation that supports escaped quotes, too:

'~(?|"([^"\\\\]*(?:\\\\.[^"\\\\]*)*)"|\'([^\'\\\\]*(?:\\\\.[^\'\\\\]*)*)\')~s'

See this regex demo.
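As a rough illustration of the escaped-quote variation, the sketch below runs it on a made-up input (the `$txt` string is an assumption for demonstration, not taken from the question). The captured text keeps its backslashes; removing them with `stripslashes` is one optional way to clean up the result.

```php
<?php
// Made-up sample input with backslash-escaped quotes (nowdoc keeps it literal).
$txt = <<<'TXT'
He said "a \"quoted\" word" and 'it\'s fine'
TXT;

// The variation from the answer: a backslash followed by any character is
// consumed as a unit, so an escaped quote does not end the match.
$pattern = '~(?|"([^"\\\\]*(?:\\\\.[^"\\\\]*)*)"|\'([^\'\\\\]*(?:\\\\.[^\'\\\\]*)*)\')~s';

if (preg_match_all($pattern, $txt, $matches)) {
    // Raw captures still contain the escaping backslashes.
    print_r($matches[1]);

    // Optional post-processing: drop the backslashes.
    $unescaped = array_map('stripslashes', $matches[1]);
    print_r($unescaped);
}
```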

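The (?|"([^"]*)"|\'([^\']*)\') pattern is a branch reset group. It matches either a ", then any 0+ chars other than ", and then a "; or a ', then any 0+ chars other than ', and then a '. Because group numbering restarts in each alternative of a branch reset group, the contents between the matching quotation marks are always captured into Group 1, whichever quote type matched.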
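To see what the branch reset group changes, here is a small comparison sketch using the sample text from the question. The variable names `$plain` and `$reset` are chosen only for this illustration.

```php
<?php
$txt = 'My text "can contain" both single \'and double"\' quotes. '
     . 'The quotes "can also be \'nested" as you can see.';

// Plain alternation: double-quoted content lands in group 1 and
// single-quoted content in group 2, so the results are split across
// two arrays padded with empty strings.
preg_match_all('~"([^"]*)"|\'([^\']*)\'~', $txt, $plain);

// Branch reset: both alternatives share group number 1, so one array
// holds every result in order of appearance.
preg_match_all('~(?|"([^"]*)"|\'([^\']*)\')~', $txt, $reset);

print_r($plain[1]);
print_r($plain[2]);
print_r($reset[1]); // can contain / and double" / can also be 'nested
```

With the plain alternation, the results would have to be merged from two groups afterwards; the branch reset variant avoids that extra step.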

Wiktor Stribiżew
  • I use the first one. As far as I know, it works for all my cases and it's much shorter. The variation looks more like a hack than a regex. But good to have alternatives, especially for others visiting this question in the future. – Jens Törnell Nov 20 '19 at 11:20