I'm a regex beginner and need help finding the right regex for my project in Notepad++. My aim is a regex that finds and extracts the strings in single quotes from text that was extracted from an HTML document. I need a single regex that does it all, and I am bound to use Notepad++.
Here's the structure of my text document (cannot use the original since it contains confidential material):
{ group: '1', code: '1111', ignored: true, shortDescription: 'This is a short "description", containing commas or quotes', description: '', document: 'documentname.txt', row: '1', original: 'this is the original text', translated: 'this is the translated text', matchRate: {label: "label", value: "value"} } _LF_
{ group: '2', code: '2222', ignored: true, shortDescription: 'This is another short "description", containing commas or quotes', description: '', document: 'documentname.txt', row: '1', original: 'this is the original text', translated: 'this is the translated text', matchRate: {label: "label", value: "value"} } _LF_
{ group: '3', code: '3333', ignored: true, shortDescription: 'This is yet another short "description", containing commas or quotes', description: '', document: 'documentname.txt', row: '1', original: 'this is the original text', translated: 'this is the translated text', matchRate: {label: "label", value: "value"} }
My document contains 33 rows, all looking like this ("LF" at the end is a line break). The keys ("group", "code" and so on) are always the same; the strings in single quotes differ and may also be empty.
I need to extract all values in single quotes (or delete everything else), separated by a comma (or similar), so that I can put them into an Excel document. I also need to keep the line breaks.
Here's what I already did: I am able to find all strings in single quotes with
([^']*+'[^\r\n']*+)
However, this also matches the text between a closing single quote and the next opening single quote, so that text shows up in the output as well.
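To make the problem reproducible outside Notepad++, here is a small Python sketch of what my pattern matches on a shortened row. Python is only used for illustration (I still need a Notepad++ solution), and I dropped the possessive `+` because older Python versions don't support it; as far as I can tell this doesn't change the matches here.

```python
import re

# Shortened sample row with the same structure as my data (assumption: fewer fields).
row = "{ group: '1', code: '1111' }"

# My pattern without the possessive quantifiers.
pattern = re.compile(r"[^']*'[^\r\n']*")

matches = pattern.findall(row)
for m in matches:
    print(repr(m))
```

Every match starts at or before a single quote and then runs up to the next one, so the text between two quoted values (`', code: `) is matched just like the values themselves.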
What I still need is a way to erase all other text, including the single quotes around these strings. I wasn't able to manage that. Here is what the result should look like:
'1', '1111', 'This is a short "description", containing commas or quotes', '', 'documentname.txt', '1', 'this is the original text', 'this is the translated text'
'2', '2222', 'This is another short "description", containing commas or quotes', '', 'documentname.txt', '1', 'this is the original text', 'this is the translated text'
'3', '3333', 'This is yet another short "description", containing commas or quotes', '', 'documentname.txt', '1', 'this is the original text', 'this is the translated text'
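To pin down the transformation I'm after, here is a rough Python sketch of it. It is not a Notepad++ solution, just a reference for the expected result. It assumes that a value never contains a single quote itself; the rows are shortened stand-ins for my data and `quoted_values` is a made-up helper name.

```python
import re

# Shortened stand-in rows (assumption: same structure as my data, fewer fields).
rows = [
    "{ group: '1', code: '1111', ignored: true, description: '', "
    "matchRate: {label: \"label\", value: \"value\"} }",
    "{ group: '2', code: '2222', ignored: true, description: '', "
    "matchRate: {label: \"label\", value: \"value\"} }",
]

def quoted_values(row):
    # Keep only the single-quoted strings (empty ones included),
    # joined by a comma and a space.
    return ", ".join(re.findall(r"'[^']*'", row))

# One output line per input row, so the line breaks are preserved.
result = "\n".join(quoted_values(r) for r in rows)
print(result)
```

The unquoted `true` and the double-quoted strings inside `matchRate` are dropped, and each input row still produces exactly one output line.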
I have also read several other threads on regex and learned a lot (as I said, beginner speaking here...), but I didn't manage to find a solution that extracts exactly the strings I need.
I would be very happy if someone could help me. Thanks a lot!