I want to get one or more ->func(xxx,xxx) at the end of a piece of code.

They could look like this:

any code any code ->func(xxx)

or

any code any code 
->func()

or

any code any code 
->funcA()->funcB(xxx)

or

any code any code 
->funcA()
->funcB(xxx)

or mix them:

o.start_time = obj.s;
o.repair_type -> obj.r;
o.limit -> obj.l;->god("('\"\"')") ->fox(,'->')
->egg()->dog(,'c')
->cat(,'b')-> banana(,'a"\'\(\)\'->"()')  ->  apple(,'a')

In this code, I want to:

  • plan A

    1. get substring apple(,'a')
    2. remove -> apple(,'a')
    3. get substring banana(,'a"\'\(\)\'->"()')
    4. remove -> banana(,'a"\'\(\)\'->"()')
    5. get substring cat(,'b')
    6. remove ->cat(,'b')
    7. get substring dog(,'c')
    8. remove ->dog(,'c')
    9. get egg()
    10. remove ->egg()
    11. get fox(,'->')
    12. remove ->fox(,'->')
    13. get god("('\"\"')")
    14. remove ->god("('\"\"')")
    15. OVER
  • plan B:

    1. get and remove ->cat(,'b')-> banana(,'a"\'\(\)\'->"()') -> apple(,'a')
      1. get substring apple(,'a')
      2. remove -> apple(,'a')
      3. get substring banana(,'a"\'\(\)\'->"()')
      4. remove -> banana(,'a"\'\(\)\'->"()')
      5. get substring cat(,'b')
      6. remove ->cat(,'b')
    2. get and remove ->egg()->dog(,'c')
      1. get substring dog(,'c')
      2. remove ->dog(,'c')
      3. get egg()
      4. remove ->egg()
    3. get and remove ->god("('\"\"')") ->fox(,'->')
      1. get fox(,'->')
      2. remove ->fox(,'->')
      3. get god("('\"\"')")
      4. remove ->god("('\"\"')")
    4. OVER

I am currently trying plan B with these two regexes, but the result is not good enough:

loop
    if match "\R\s*->\s*(.+)$"
        get substring and remove
        loop substring
        if match "(?:(?<=\)).)*\s*->\s*(((?!->).)*)$"
            push substring2 to arr
            remove substring2
        else
            break
    else
        break
Nakilon
Junkai
  • why is this tagged javascript? – Robert Apr 08 '16 at 06:41
  • you might have forgotten to tag `c++`, `c` and several other programming languages... – SomeJavaGuy Apr 08 '16 at 06:42
  • @KevinEsche Stackoverflow does not allow more than 5 tags otherwise we would have seen them all – rock321987 Apr 08 '16 at 06:43
  • Oh, I thought people could solve this problem no matter which language they are familiar with. Am I wrong? I have removed the tags now. Sorry. – Junkai Apr 08 '16 at 06:47
  • tagging programming language is important. it lets us know which regex parser we are dealing with. now to the problem: if the functions can be "nested" like `func('func(\'\')')` it may not necessarily be a regular grammar and in that case LL or LR parser will be required. – cdm Apr 08 '16 at 07:10
  • So, it is PCRE? I see `\R` construct in the regex... – Wiktor Stribiżew Apr 08 '16 at 07:13
  • @Wiktor Stribiżew yes, veteran. http://www.pcre.org/pcre.txt – Junkai Apr 08 '16 at 07:20
  • What a mess of a regex I have... Try [`(?:\s*->\s*[\w.]+(\((?>'[^'\\]*(?:\\.[^'\\]*)*'|"[^"\\]*(?:\\.[^"\\]*)*"|[^"'()]++|(?1))*\)))+\s*\z`](https://regex101.com/r/xI4bZ5/1). – Wiktor Stribiżew Apr 08 '16 at 07:27
  • @Wiktor Stribiżew Thank you so much, I modify some place, and now it is in line with my request: (?:(\s*->\s*([\w.]+\((?>'[^'\\]*(?:\\.[^'\\]*)*'|"[^"\\]*(?:\\.[^"\\]*)*"|[^"'()]++|(?1))*\))))+\s*\z – Junkai Apr 08 '16 at 08:34
  • I cannot understand what you did without proper formatting. I suspect you made an error there by adding another capturing group and keeping the former subroutine call. Shall I post my regex? – Wiktor Stribiżew Apr 08 '16 at 08:46
  • But it seems to work fine? [`(?:(\s*->\s*([\w.]+\((?>'[^'\\]*(?:\\.[^'\\]*)*'|"[^"\\]*(?:\\.[^"\\]*)*"|[^"'()]++|(?1))*\))))+\s*\z`](https://regex101.com/r/xI4bZ5/2) – Junkai Apr 08 '16 at 08:59
  • Have you considered using a PHP parser? A regex can't do the job reliably, regardless of how complicated it gets. – hek2mgl Apr 08 '16 at 12:07

2 Answers

1

I do not think a regex is the final means to match what you need, but it can be used for a one-off task.

PCRE supports recursion, so we can match a call from its opening parenthesis to the matching closing one. If the code does not have any comments, you can match these nested (...) together with single- and double-quoted string literals at the end of the string with

(?:((?(3)\s*|\R*)->\s*([\w.]*(\((?>'[^'\\]*(?:\\.[^'\\]*)*'|"[^"\\]*(?:\\.[^"\\]*)*"|[^"'()]++|(?3))*\)))))+\s*\z

See the regex demo

Explanation:

  • (?:((?(3)\s*|\R*)->\s*([\w.]*(\((?>'[^'\\]*(?:\\.[^'\\]*)*'|"[^"\\]*(?:\\.[^"\\]*)*"|[^"'()]++|(?3))*\)))))+ - 1 or more occurrences of the following sequence (each occurrence is captured in Group 1, so after the match Group 1 holds the last ->func(...) together with its leading whitespace):
    • (?(3)\s*|\R*) - a conditional that checks whether Group 3 has matched: if it has, 0+ whitespace characters are matched; if it has not matched yet, only 0+ linebreak sequences are matched (with \R*). Thus, before the first -> we consume linebreaks only
    • ->\s* - -> followed by 0+ whitespace
    • [\w.]* - the function name: 0+ alphanumeric/underscore/dot characters. The name plus the argument list that follows form Group 2
    • (\((?>'[^'\\]*(?:\\.[^'\\]*)*'|"[^"\\]*(?:\\.[^"\\]*)*"|[^"'()]++|(?3))*\)) - Group 3, matching the parenthesized argument list:
      • \( - literal opening (
      • (?>'[^'\\]*(?:\\.[^'\\]*)*'|"[^"\\]*(?:\\.[^"\\]*)*"|[^"'()]++|(?3))* - 0+ occurrences of single-quoted literals ('[^'\\]*(?:\\.[^'\\]*)*'), double-quoted literals ("[^"\\]*(?:\\.[^"\\]*)*"), runs of characters other than quotes and parentheses ([^"'()]++), or a nested (...) matched with (?3), which recurses the whole Group 3 subpattern
      • \) - literal closing )
  • \s*\z - 0+ whitespace \s* right before the very end of the string \z.
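The quoted-literal alternatives are the only parts of the pattern that need no recursion, so they can be tried in isolation. Below is a minimal sketch using Python's standard `re` module (an assumption made for illustration; the answer itself targets PCRE). `SINGLE`, `DOUBLE`, `QUOTED` and `quoted_literals` are names invented for this example.

```python
import re

# The two quoted-literal alternatives from the pattern above, copied unchanged.
# Each one is: opening quote, a run of ordinary characters, then any number of
# (backslash + any character, run of ordinary characters), closing quote.
SINGLE = r"'[^'\\]*(?:\\.[^'\\]*)*'"
DOUBLE = r'"[^"\\]*(?:\\.[^"\\]*)*"'
QUOTED = re.compile(SINGLE + "|" + DOUBLE)


def quoted_literals(text):
    """Return every single- or double-quoted literal found in text."""
    return [m.group(0) for m in QUOTED.finditer(text)]


# Escaped quotes and parentheses inside the literal do not end it early.
for literal in quoted_literals(r"""banana(,'a"\'\(\)\'->"()')"""):
    print(literal)  # prints 'a"\'\(\)\'->"()'
```

Because a whole literal is consumed in one step, a `)` or `->` inside quotes is never seen by the part of the pattern that balances parentheses.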
Wiktor Stribiżew
  • But this expression only matches the capturing group like "(xxx)" and loses the function name. This one works well: [`(?:(\s*->\s*([\w.]+\((?>'[^'\\]*(?:\\.[^'\\]*)*'|"[^"\\]*(?:\\.[^"\\]*)*"|[^"'()]++|(?1))*\))))+\s*\z`](https://regex101.com/r/xI4bZ5/2) – Junkai Apr 08 '16 at 11:49
  • So, you need that `apple`? Then, you need to use [`(?:\s*->\s*([\w.]+)(\((?>'[^'\\]*(?:\\.[^'\\]*)*'|"[^"\\]*(?:\\.[^"\\]*)*"|[^"'()]++|(?2))*\)))+\s*\z`](https://regex101.com/r/xI4bZ5/3) – Wiktor Stribiżew Apr 08 '16 at 11:54
  • Uh! A regex is obviously not the right tool for the job. I suggest use a PHP parser instead. – hek2mgl Apr 08 '16 at 12:06
  • Not only `apple(,'a')`, but also need `-> apple(,'a')` ,because I should remove this string from the whole piece of code, so I can get `banana()` next. – Junkai Apr 08 '16 at 12:06
  • @hek2mgl But I am not writing PHP code. The text before `->func(xx)` could be anything. – Junkai Apr 08 '16 at 12:09
  • AFAIK only PHP has such a syntax. If using PHP parser does not work, write your own: http://web.iitd.ac.in/~sumeet/flex__bison.pdf – hek2mgl Apr 08 '16 at 12:14
  • @WiktorStribiżew Yes, but `->(bla, bla)` is PHP specific syntax – hek2mgl Apr 08 '16 at 12:17
  • @cjk: Ok, if you need all those capturing groups, here is your fixed pattern: [`(?:(\s*->\s*([\w.]+(\((?>'[^'\\]*(?:\\.[^'\\]*)*'|"[^"\\]*(?:\\.[^"\\]*)*"|[^"'()]++|(?3))*\)))))+\s*\z`](https://regex101.com/r/xI4bZ5/4). – Wiktor Stribiżew Apr 08 '16 at 12:18
  • @WiktorStribiżew What if the input is: `->("->('foo', 'bar')")` ?? I hope it is clear that a single regex (or two) **can't** do the job. – hek2mgl Apr 08 '16 at 12:19
  • @hek2mgl: "what-if" is not for a one-off job. The sample strings in the question is the scope I can suggest a regex for. I also added a disclaimer for code comments. And here is a regex for that case: [`(?:(\s*->\s*([\w.]*(\((?>'[^'\\]*(?:\\.[^'\\]*)*'|"[^"\\]*(?:\\.[^"\\]*)*"|[^"'()]++|(?3))*\)))))+\s*\z`](https://regex101.com/r/xI4bZ5/5) – Wiktor Stribiżew Apr 08 '16 at 12:20
  • @hek2mgl I said I am not writing PHP. AFAIK, only fish has the tail, so the dolphin is fish. – Junkai Apr 08 '16 at 13:00
  • @WiktorStribiżew Thank you, can I make a request more? I don't want to get the `white space` before `->god()` unless it is a new line char. – Junkai Apr 08 '16 at 13:09
  • Just [replace the first `\s` with `\n`](https://regex101.com/r/xI4bZ5/6). Or `[\r\n]*`. – Wiktor Stribiżew Apr 08 '16 at 13:12
  • @WiktorStribiżew This is not what I want, I just don't need the `\s` before `->god()`, but need other `\s` before other `->func()`(ignore headmost `\s`, but not the other). replace the first `\s`, all space before `->func()` are miss. – Junkai Apr 08 '16 at 13:34
  • Try a conditional: [`(?:((?(3)\s*|\R*)->\s*([\w.]*(\((?>'[^'\\]*(?:\\.[^'\\]*)*'|"[^"\\]*(?:\\.[^"\\]*)*"|[^"'()]++|(?3))*\)))))+\s*\z`](https://regex101.com/r/xI4bZ5/7). – Wiktor Stribiżew Apr 08 '16 at 13:47
  • Perfect, thank you so much. I'll take time to understand it. – Junkai Apr 08 '16 at 13:54
  • I have updated the answer with the final pattern/demo and explanations. – Wiktor Stribiżew Apr 08 '16 at 13:59
-1

The proper answer is that you can't parse a non-regular language with a regular expression. You need to either use an existing parser for that language, if one exists, or write your own.

The O'Reilly book "Flex & Bison", now publicly available, is a good read.

hek2mgl
  • When you say there is no solution, this is a comment, not an answer. – Wiktor Stribiżew Apr 08 '16 at 12:49
  • I don't get you. Sure there is a solution and I suggested that: write a parser. Writing the parser for the OP would exceed an answer and also can be done only if I exactly know the rules of the input language. A single example document is not enough for that. – hek2mgl Apr 08 '16 at 13:02
  • Suggestion to write a parser cannot be an answer. That is a comment. If you write a parser, and post it, that would be an answer. If I follow your approach, I should answer every HTML regex parsing question with "Use DOM/(X)HTML parser". What good is such an answer? – Wiktor Stribiżew Apr 08 '16 at 13:05
  • @WiktorStribiżew Why do you discuss with me? Your solution is plain wrong. The fact that the OP is asking how can I do this **with regex** shows that he never really considered the *right* solution. I pointed him towards it. – hek2mgl Apr 08 '16 at 13:08
  • [I told that already](https://stackoverflow.com/questions/36493132/how-can-i-get-functionxxx-xxx-in-the-end-of-some-text/36499566?noredirect=1#comment60606288_36495446) .. If you need more: [here](https://stackoverflow.com/questions/6751105/why-its-not-possible-to-use-regex-to-parse-html-xml-a-formal-explanation-in-la) – hek2mgl Apr 08 '16 at 13:13
  • @hek2mgl I support WS's view. And I am not coding PHP. This is my own syntax. I am coding software in which you select some text that has `->func()` at the end, and the program changes the text, like `apple ->append('banana')` => `apple banana`. So the text can be anything. – Junkai Apr 08 '16 at 13:19
  • Yeah, that's fine. If you invent a language, you need to write a parser for that. That's what I'm saying and the book I've linked is a great resource on that. I just said, if that's PHP you might use the *existing* PHP parser. As it now turns out that it is not PHP but something custom, you need to write a parser. – hek2mgl Apr 08 '16 at 13:20
  • I don't need to do this job so complicated, put the text into function, job is done. – Junkai Apr 08 '16 at 13:27
  • Sorry, I don't get you. Your English is hard to understand. – hek2mgl Apr 08 '16 at 13:51
  • I see, my english is a shit ha. Thank you for your answer first. And what I want to do is get those text before `->func()`, and put them in the middle of the brackets. And then `eval('func(xxx)')`. – Junkai Apr 08 '16 at 14:07
  • You want to use `eval()` on the results of that fragile regex thingy?! That's super-fragile and super-dangerous. You are doing it wrong! – hek2mgl Apr 08 '16 at 14:26