
I have this C code, and I want a regular expression that extracts only the strings that are not inside comments:

1. /*  * "path_build()" function in "home.c" for more information.  
2. * this is an example basic"
3. */
4.
5. /*** Free ***/ 
6. VALOR = string_make(format("%sxtra", libpath)); 
7. event_signal_string(EVENT_INITSTATUS, "Inicializando...");

It should return only:

"%sxtra" 
"Inicializando..."

I tried:

".*"

but it doesn't work: it shows me all the text inside "", including the strings that are inside /*...*/

I use the regex panel in EditPad Pro. This is a game translation project: I take the strings from every C file and translate them into Spanish. I can't remove the comments from the original files.

The only thing I am sure of is that this regex finds comments in C; maybe it will help with the solution:

(/\*([^*]|[\r\n]|(\*+([^*/]|[\r\n])))*\*+/)|(//.*)

Any help?

Edit: I added line numbers.

Hernaldo Gonzalez
  • Are you sure you want a regex to do all of this? Instead, consider: 1. remove the comments; 2. use a regex. – herohuyongtao Apr 29 '14 at 15:31
  • 1. Remove the comments. 2. Extract the `"blablabla"`. – gongzhitaao Apr 29 '14 at 15:31
  • It should also be noted that `".*"` is not a good way to get all text inside strings, even after you remove comments. If you have something like `"blah", variable_name, "more blah"`, it would return everything from the first opening `"` to the last closing `"`. – Khaelex Apr 29 '14 at 16:01
  • For starters, a much better regex to match a C multi-line comment block is: `/\*[^*]*\*+(?:[^*/][^*]*\*+)*/` (taken from [Mastering Regular Expressions (3rd Edition)](http://www.amazon.com/Mastering-Regular-Expressions-Jeffrey-Friedl/dp/0596528124 "By Jeffrey Friedl. Best book on Regex - ever!")). See: [Improving/Fixing a Regex for C style block comments](http://stackoverflow.com/a/3945705/433790) – ridgerunner Apr 29 '14 at 16:41
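
Khaelex's point about `".*"` can be checked with a short sketch. It is written in Python only for illustration (the question itself is about EditPad Pro), and the variable names are made up for this example. The negated character class `[^"]*` stops at the next quote, while the greedy `.*` runs to the last quote on the line. Neither pattern handles escaped quotes such as `\"`.

```python
import re

# The example line from the comment above.
line = '"blah", variable_name, "more blah"'

# Greedy: one match spanning from the first quote to the last quote.
greedy = re.findall(r'".*"', line)

# Negated class: one match per string literal.
per_string = re.findall(r'"[^"]*"', line)

print(greedy)      # ['"blah", variable_name, "more blah"']
print(per_string)  # ['"blah"', '"more blah"']
```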

2 Answers


Hernaldo, this is an interesting question.

Here are two versions, because I am not sure whether you want to capture the "inside of the string" or "the whole string".

The regexes below capture the strings into capture Group 1. You completely ignore the overall match (Group 0) and focus only on Group 1: the comment alternative matches first and consumes any quotes inside a comment, so only strings outside comments reach Group 1. To retrieve the strings, just iterate over the Group 1 matches in your language (discarding empty ones, if any).

Version 1: "The inside of the string"

(?s)/\*.*?\*/|"([^"]+)"

This will capture %sxtra and Inicializando... to Group 1.

Version 2: "The whole string"

(?s)/\*.*?\*/|("[^"]+")

This will capture "%sxtra" and "Inicializando..." to Group 1.
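
As a minimal sketch of "iterate over Group 1 matches in your language", here is Version 1 run from Python. Python is just one possible choice, and `strings_outside_comments` and `SAMPLE` are names invented for this example. The sample is the code from the question without the line numbers. Like the regex itself, the sketch ignores `//` comments and escaped quotes.

```python
import re

# Version 1: a comment matches the first alternative and leaves Group 1 unset;
# a string outside a comment matches the second alternative and fills Group 1.
PATTERN = re.compile(r'(?s)/\*.*?\*/|"([^"]+)"')

SAMPLE = '''/*
 * "path_build()" function in "home.c" for more information.
 * this is an example basic"
 */

/*** Free ***/
VALOR = string_make(format("%sxtra", libpath));
event_signal_string(EVENT_INITSTATUS, "Inicializando...");
'''

def strings_outside_comments(source):
    """Return the Group 1 captures, skipping matches that were comments."""
    return [m.group(1) for m in PATTERN.finditer(source) if m.group(1)]

print(strings_outside_comments(SAMPLE))  # ['%sxtra', 'Inicializando...']
```

Swapping in the Version 2 pattern would return the same strings with their surrounding quotes.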

Please let me know if you have any questions!

Note: I did not handle /* nested /* comments */ */ as that was not specified in the question. That would require a bit of tweaking and probably a regex engine supporting recursion.

zx81
  • Hello zx81, I tested both of your options on lines 1, 2, 3, 5, 6 and 7. What I want is a single regex that gives me only lines 6 and 7, or just the strings on lines 6 and 7; it should not match anything on lines 1, 2, 3 or 5. – Hernaldo Gonzalez Apr 29 '14 at 22:03
  • @HernaldoGonzalez That's right: the regex I gave you returns in Group 1 only the strings I listed (the ones you want). Some online testers may not handle this well, but you can test it in, for instance, [RegexBuddy](http://yu8.us/rbdemo). Is there a problem? – zx81 Apr 29 '14 at 22:24
  • @HernaldoGonzalez I am confused, are you saying that my answer is not working for you? If so, can you explain the problem? In my tests, it works perfectly. – zx81 Apr 30 '14 at 16:59
  • No, it is not working for me. I tested both regexes in the Search panel of EditPad Pro, which has a regex feature and, as far as I know, uses the same patterns as RegexBuddy. It highlights every string in the code, including strings inside comments and the comments themselves, so it is very slow to search and decide whether each line really needs translating. I expected it to highlight only the lines with strings in code (strings not inside comments). I can't install RegexBuddy and copy and paste every C file I have; the game has 60 or more files. My idea is to use the same Search panel with its regex feature. – Hernaldo Gonzalez May 03 '14 at 00:21
  • @HernaldoGonzalez The reason it doesn't seem to work in EditPadPro (which I also use and love, by the way) is that as mentioned, the solution I gave you captures the string to Group 1. What you are looking for is not the match but the capture, which you can reference as `$1` in the EPP replace field. But I thought you were using a programming language such as C and could test if the Group 1 capture was not empty. – zx81 May 03 '14 at 06:04

The final solution for EditPad 6/7 is:

(?<!^[ \t]*/?[*#][^"\n]*")(?<=^[^"\n]*")[^"]+

Link: Regular expression for a string that does not start with a /*
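
For anyone who wants to try the same idea outside EditPad: the pattern above relies on variable-length lookbehind, which Python's `re` module does not accept. The sketch below is an assumed equivalent, not the original pattern. It replaces the lookbehinds with a negative lookahead at the start of the line plus a capture group, and `first_string_per_line` is a name made up for this example. As far as I can tell, it shares the original's limits: it returns only the first string on each line, and it skips any line that starts with `/*`, `*` or `#`.

```python
import re

# At each line start: reject lines beginning with optional blanks, an optional
# "/", then "*" or "#"; otherwise skip to the first quote and capture after it.
FIRST_STRING = re.compile(r'^(?![ \t]*/?[*#])[^"\n]*"([^"]+)', re.MULTILINE)

SAMPLE = '''/*
 * "path_build()" function in "home.c" for more information.
 * this is an example basic"
 */

/*** Free ***/
VALOR = string_make(format("%sxtra", libpath));
event_signal_string(EVENT_INITSTATUS, "Inicializando...");
'''

def first_string_per_line(source):
    """Return the first quoted string of every line that is not comment-like."""
    return [m.group(1) for m in FIRST_STRING.finditer(source)]

print(first_string_per_line(SAMPLE))  # ['%sxtra', 'Inicializando...']
```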

Hernaldo Gonzalez