I need to extract specific tags from a text file (source code in this case). I've tried many ways, but none of them has been successful.

For example, this is the file:

//messagebox("ñ",string(asc("ñ")))
//messagebox("Ñ",string(asc("Ñ")))
//messagebox("ñ",string(char(241)))

messagebox("Hi")

IF Trim(sle_user_id.text) = "" AND Trim(sle_password.text) = "" THEN
    MessageBox(Titulo_Msg,&
              "Sr Usuario :~r~nDebe ingresar los datos solicitados.",StopSign!,Ok!)
    sle_user_id.SetFocus()
    Return
End If

I need to extract (either to the screen or to a file) the text inside the parentheses of the tag "messagebox(THIS IS WHAT I NEED TO EXTRACT)".

The problems are:

  • For [**messagebox("ñ",string(asc("ñ")))**] the result is truncated; it ends at the first closing parenthesis: **("ñ",string(asc("ñ"**

  • For [ **MessageBox(Titulo_Msg,& "Sr Usuario :~r~nDebe ingresar los datos solicitados.",StopSign!,Ok!)** ] only the matching line is shown, not the complete text between the parentheses: **MessageBox(Titulo_Msg,&**

I have tried using awk, grep, sed and bash without success.

  • Simple parsers would shine more than regexes here... is that an option? – Wrikken Aug 14 '14 at 23:10
  • Can messagebox() appear multiple times on a line, e.g. `messagebox("Hello"); messagebox("World")`? Can it be followed by something else with parens, e.g. `messagebox("Hello"); // nice :-)` – Ed Morton Aug 15 '14 at 12:50
  • A simple parser is an option, but I really don't have a clue how to write one. And yes, a call such as `messagebox("Hello"); // nice :-)` could appear twice on the same line, but a match of this kind would be treated as an abnormal case and ignored. – Carlos Wistuba G. Aug 18 '14 at 12:21

3 Answers

Thanks for all your answers, they helped me a lot. This command does exactly what I wanted:

awk '/[mM]essage[bB]ox\(/,/\)$/ {gsub(/.*[mM]essage[Bb]ox\(|\)$/,""); print}' file

Best regards !! Carlos


You can use awk:

awk '/[mM]essage[bB]ox\(/,/\)$/ {gsub(/.*[mM]essage[Bb]ox\(|\)$/,""); print}' file

Output:

"ñ",string(asc("ñ"))
"Ñ",string(asc("Ñ"))
"ñ",string(char(241))
"Hi"
Titulo_Msg,&
              "Sr Usuario :~r~nDebe ingresar los datos solicitados.",StopSign!,Ok!

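This extracts everything between the parentheses of messagebox(...) or MessageBox(...). The range pattern selects lines from one containing [mM]essage[bB]ox( through the next line that ends in ), and gsub then strips everything up to the opening parenthesis as well as the final ) on the line.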

Note

This breaks if there is any text on the line after the final closing ), or, when the content of MessageBox(...) spans multiple lines, if an earlier line ends with ) before the final closing ).
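
Both failure modes come from relying on where a line ends. One way around them is to count parentheses instead. The following is only a minimal sketch, not part of the original answer: `extract_messagebox` is a hypothetical helper name, and it assumes strings are delimited by plain double quotes (escaped quotes and single-quoted strings are not handled). Like the one-liner above, it also reports calls inside `//` comments.

```shell
# Hypothetical helper: print the argument text of each messagebox(...) call,
# tracking parenthesis depth so nested and multi-line calls stay intact.
extract_messagebox() {
  awk '
  {
    line = $0
    while (line != "") {
      if (depth == 0) {
        # Find the start of the next call, in either capitalisation.
        if (!match(line, /[mM]essage[bB]ox\(/)) break
        line = substr(line, RSTART + RLENGTH)
        depth = 1
        buf = ""
      }
      # Consume characters until the matching close parenthesis.
      n = length(line)
      for (i = 1; i <= n; i++) {
        c = substr(line, i, 1)
        if (c == "\"") inq = !inq            # entering or leaving a string
        else if (!inq && c == "(") depth++
        else if (!inq && c == ")") depth--
        if (depth == 0) break
        buf = buf c
      }
      if (depth == 0) {
        print buf                            # call is complete
        line = substr(line, i + 1)           # keep scanning the rest
      } else {
        buf = buf "\n"                       # call continues on next line
        line = ""
      }
    }
  }' "$@"
}
```

It can be called as `extract_messagebox file`, or fed from a pipe. Text after the closing parenthesis, such as `// nice :-)`, is ignored, and a second call on the same line is reported on its own output line.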

John B
  • To the person who downvoted, please leave a comment next time and/or see [this page](http://stackoverflow.com/help/privileges/vote-down). – John B Aug 16 '14 at 01:55
  • Thank you for your answer, this did exactly what I wanted: awk '/[mM]essage[bB]ox\(/,/\)$/ {gsub(/.*[mM]essage[Bb]ox\(|\)$/,""); print}' file – Carlos Wistuba G. Aug 18 '14 at 11:38
  • @CarlosWistubaG. You're welcome, I'm glad it worked out for you. – John B Aug 18 '14 at 12:41

No. What you are trying to do requires matching nested parentheses properly, and that cannot be done with a regex.

Unfortunately, a regex is a finite state machine, and it cannot match nested parentheses because balanced nesting is a context-free language rather than a regular one. A more detailed explanation can be found here: Can regular expressions be used to match nested patterns?
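
To make the limitation concrete, here is a small sketch using sed. The sample line and the variable names are made up for illustration. A pattern that cannot track depth has to pick either the first or the last closing parenthesis: the first choice truncates nested calls, and the second is only right when the line holds a single call with nothing parenthesised after it.

```shell
line='messagebox("a",string(asc("b")))'

# Stop at the first ")": the nested calls are cut short.
first_close=$(printf '%s\n' "$line" | sed -n 's/.*messagebox(\([^)]*\)).*/\1/p')

# Greedy match up to the last ")" on the line: correct here, but only
# because the line contains one call and no later parentheses.
last_close=$(printf '%s\n' "$line" | sed -n 's/.*messagebox(\(.*\))[^)]*$/\1/p')

printf '%s\n' "$first_close" "$last_close"
# prints:
# "a",string(asc("b"
# "a",string(asc("b"))
```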

caskey
  • That link is for a different use case. Regex **can** be used to solve the OP's problem given the example text, and it's just wrong to say as a general rule that you can't ever successfully match nested patterns. A link answer should be a comment. – John B Aug 16 '14 at 02:06