I'm trying to capture dialogue from a novel -- any text that appears within quotation marks.
My problem is that when a quotation spans paragraphs, it's traditional to have a new quotation mark begin each paragraph, even though the previous set wasn't closed. For example:
The letter was to this effect:
"My dear Lizzy,
"I wish you joy. If you love Mr. Darcy half as well as I do my dear Wickham, you must be very happy. It is a great comfort to have you so rich, and when you have nothing else to do, I hope you will think of us. I am sure Wickham would like a place at court very much, and I do not think we shall have quite money enough to live upon without some help. Any place would do, of about three or four hundred a year; but however, do not speak to Mr. Darcy about it, if you had rather not.
"Yours, etc."
The regex I've been using (JS style) is
(?=["'])(?:"[^"\\]*(?:\\[\s\S][^"\\]*)*"|'[^'\\]*(?:\\[\s\S][^'\\]*)*')
and it doesn't account for this. I'm not sure what I can do to handle this problem and would love a tip. (And it's not important that a single quotation be a single group, just that all quotations get captured -- the letter example above could be three groups.)
It may help that in my text, each line is a paragraph, and a paragraph never contains newlines. So perhaps the rule could be: if a line ends with a quotation still open and the next line begins with a quotation mark, treat the open quotation as ending at the end of its line. But that's beyond my ability to express in regex; I'm very new to it.
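To make the line-based idea concrete, here is a rough sketch rather than a tested solution. It assumes only double quotation marks delimit dialogue (single quotes are left out because they collide with apostrophes) and that there are no backslash escapes to handle. The function name extractQuotes and the shortened sample text are made up for illustration.

```javascript
// Hypothetical helper: collect every double-quoted span in text
// where each line is one paragraph.
function extractQuotes(text) {
  // "            an opening quotation mark
  // [^"\r\n]*    everything up to the next quotation mark or the end of the line
  // (?:"|$)      either the closing mark, or the end of the line if the
  //              quotation was left open (the m flag makes $ match per line)
  const quoteRe = /"[^"\r\n]*(?:"|$)/gm;
  return text.match(quoteRe) || [];
}

// Shortened sample: an unclosed multi-paragraph letter plus ordinary dialogue.
const sample = [
  'The letter was to this effect:',
  '"My dear Lizzy,',
  '"I wish you joy. I hope you will think of us.',
  '"Yours, etc."',
  'He said "yes" and then "no" to her.',
].join('\n');

console.log(extractQuotes(sample));
```

With this pattern each paragraph of the letter comes back as its own match, and the two short quotations on the last line are matched separately. One known weakness of the sketch: a stray or unbalanced quotation mark in the middle of a line would capture the rest of that line.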