
I have this regex for C#:

(".*?"|“.*?”|“.*'|'*.")

I also tried this pattern:

("|'|“).*?("|'|”)

but it isn't giving the result I want.

Here's the sample paragraph:
"Lorem" Ipsum is simply dummy text of the printing and typesetting industry. “Lorem Ipsum” has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only “five centuries', but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with 'desktop publishing" software like "Aldus' "PageMaker" including versions of Lorem Ipsum.

My goal is to extract all the text enclosed in any of these quote pairs:

"", “”, “', '", ''

The mismatched pairs are there to handle typos: say the writer of an article opens with a double quote but, instead of closing it with a double quote, closes it with a single quote.

Right now, this is what I'm getting: [image]

My expected output is:

Lorem, Lorem Ipsum, five centuries, desktop publishing, Aldus, PageMaker

but it isn't limited to those, because this regex will run over entire articles, hundreds of them.

This part of the sample paragraph above is probably the trickiest:
industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only “five centuries', but also

Alan Moore
Jayson Ragasa
  • Your post does not explicitly exclude nested quotes - consider editing... Or actually consider nested quotes too "like someone said 'when you use regular expression you now have two problems' which is commonly mentioned when one “asks about regular expression”"... Also check out top of the charts question about nested structure matching with regex: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – Alexei Levenkov Nov 24 '14 at 02:10

1 Answer

(?:"|'(?!s\b|\s)|“)[^"'“”]+(?:"|'(?!s\b)|”)

Try this. See the demo:

http://regex101.com/r/yP3iB0/13
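
For a quick check outside regex101, here is a minimal sketch that runs the pattern above over the sample paragraph from the question. It uses Python's `re` module purely for illustration; the question is about C#, and the constructs involved (non-capturing groups, negative lookahead, `\b`) should behave the same way in .NET's `Regex`, but that is an assumption and not something verified here. The names `QUOTED`, `QUOTE_CHARS`, `SAMPLE`, and `extract_quoted` are made up for this example.

```python
import re

# The pattern from this answer, unchanged.
QUOTED = re.compile(r"""(?:"|'(?!s\b|\s)|“)[^"'“”]+(?:"|'(?!s\b)|”)""")

# Every quote character that may open or close a match.
QUOTE_CHARS = "\"'“”"

# The sample paragraph from the question.
SAMPLE = (
    '"Lorem" Ipsum is simply dummy text of the printing and typesetting industry. '
    "“Lorem Ipsum” has been the industry's standard dummy text ever since the 1500s, "
    "when an unknown printer took a galley of type and scrambled it to make a type "
    "specimen book. It has survived not only “five centuries', but also the leap into "
    "electronic typesetting, remaining essentially unchanged. It was popularised in "
    "the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, "
    "and more recently with 'desktop publishing\" software like \"Aldus' "
    '"PageMaker" including versions of Lorem Ipsum.'
)


def extract_quoted(text):
    """Return the text of each match with the surrounding quotes stripped."""
    return [m.group(0).strip(QUOTE_CHARS) for m in QUOTED.finditer(text)]


print(extract_quoted(SAMPLE))
# ['Lorem', 'Lorem Ipsum', 'five centuries', 'desktop publishing', 'Aldus', 'PageMaker']
```

The apostrophe in "industry's" is skipped because `'(?!s\b|\s)` refuses to treat a `'` as an opening quote when it is followed by a standalone `s` or by whitespace.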

vks
  • It was working, but I got an error. Add this as a new paragraph: (Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of “type and scrambled” it to make a type "specimen book") – Jayson Ragasa Nov 24 '14 at 02:37
  • Almost there. Add this as another paragraph: (It has survived' not only "five centuries", but also the leap into electronic typesetting, remaining essentially unchanged.) – Jayson Ragasa Nov 24 '14 at 02:54
  • minor correction - (?:"|'|“)(?!s\b|\s)[^"'“”]+(?:"|'|”)(?!s\b) – Jayson Ragasa Nov 24 '14 at 02:59
  • I wish I could reward you with points but you deserve the "Answer" – Jayson Ragasa Nov 24 '14 at 03:06
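
The corrected pattern from the comments moves both lookaheads outside the alternations, so they apply to every opening and closing quote and not only to `'`. Below is a minimal sketch of what that changes, using Python's `re` for illustration and assuming .NET's `Regex` treats these constructs the same way. `ANSWER`, `CORRECTED`, `extract_quoted`, `para_a`, `para_b`, and `stray` are hypothetical names; `para_a` and `para_b` are the two paragraphs from the comments, and `stray` is a made-up variant of the second one with the stray `'` swapped for a stray `"`.

```python
import re

# Pattern as posted in the answer.
ANSWER = re.compile(r"""(?:"|'(?!s\b|\s)|“)[^"'“”]+(?:"|'(?!s\b)|”)""")
# Pattern from the "minor correction" comment.
CORRECTED = re.compile(r"""(?:"|'|“)(?!s\b|\s)[^"'“”]+(?:"|'|”)(?!s\b)""")


def extract_quoted(pattern, text):
    """Return each match of `pattern` with the surrounding quotes stripped."""
    return [m.group(0).strip("\"'“”") for m in pattern.finditer(text)]


para_a = ("Lorem Ipsum has been the industry's standard dummy text ever since "
          "the 1500s, when an unknown printer took a galley of “type and scrambled” "
          'it to make a type "specimen book"')
para_b = ("It has survived' not only \"five centuries\", but also the leap into "
          "electronic typesetting, remaining essentially unchanged.")
# Hypothetical input: a stray double quote followed by a space.
stray = 'It has survived" not only "five centuries", but also'

print(extract_quoted(CORRECTED, para_a))  # ['type and scrambled', 'specimen book']
print(extract_quoted(CORRECTED, para_b))  # ['five centuries']
print(extract_quoted(ANSWER, stray))      # [' not only ']
print(extract_quoted(CORRECTED, stray))   # ['five centuries']
```

In the `stray` case, the posted pattern accepts the orphaned `"` as an opening quote and pairs it with the next `"`, while the corrected pattern rejects any opening quote that is followed by whitespace.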