I have this regex for C#:
(".*?"|“.*?”|“.*'|'*.")
I also tried this pattern:
("|'|“).*?("|'|”)
but neither one gives the result I want.
Here's the sample paragraph:
"Lorem" Ipsum is simply dummy text of the printing and typesetting industry. “Lorem Ipsum” has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only “five centuries', but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with 'desktop publishing" software like "Aldus' "PageMaker" including versions of Lorem Ipsum.
My goal is to get all the text enclosed by any of these quote pairs:
"", “”, “', '", ''
The reason for the mismatched pairs is typos: someone writing an article may open with a double quote and then close it with a single quote instead of a double quote.
Right now, what I'm getting is this
My expected output is:
Lorem, Lorem Ipsum, five centuries, desktop publishing, Aldus, PageMaker
but it's not limited to those, because this regex will run over entire articles, and over hundreds of them.
This part of the sample paragraph above is probably the trickiest:
industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only “five centuries', but also
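One possible direction, as a rough sketch rather than a definitive answer: treat a quote character as an opening quote only when it is not preceded by a word character, and as a closing quote only when it is not followed by one. That would skip the apostrophe in "industry's" while still allowing mismatched pairs. The class name QuotedText and the method Extract are hypothetical names used only for illustration, and the pattern assumes quoted text never spans a line break.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class QuotedText
{
    // Opening quote: ", “ or ' that is NOT preceded by a word character,
    // so the apostrophe inside "industry's" is not treated as an opener.
    // Closing quote: ", ” or ' that is NOT followed by a word character.
    // Group 1 lazily captures the text between the two.
    private static readonly Regex Pattern =
        new Regex(@"(?<!\w)[""“'](.+?)[""”'](?!\w)");

    // Returns the text found between each opening/closing quote pair.
    public static List<string> Extract(string text)
    {
        return Pattern.Matches(text)
                      .Cast<Match>()
                      .Select(m => m.Groups[1].Value)
                      .ToList();
    }
}
```

Calling QuotedText.Extract on the sample paragraph should yield the six expected items. A known limitation of this sketch: an apostrophe at the end of a word, such as a plural possessive, can still be mistaken for a closing quote.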