As per the title, I need to parse a string of the form string_1\string_2, that is, a string followed by a backslash and then another string, with the following requirements:

  • if string_1 and string_2 are present, break them into two tokens: string_1 and \string_2
  • if only string_1 is present, return it
  • if \string_2 is present but there is nothing behind the backslash, don't match anything.

So far I've come up with this:

^([\w\s]*)((?!\\\).*)

but the last character of string_1 keeps 'leaking' into string_2, where it ends up right before the backslash.

Is there a way to fix that, or is there an alternative regex? The following regex does help with the leaking, but it breaks the third requirement.

^([\w\s]*).((?!\\\).*)

To make sure this question is not too localized, note that this could help parse a subset of LaTeX when a string comes right before, say, \section{section title comes here {*}}.

nt.bas
  • @LiamSorsby Because it's exactly LaTeX I'm trying to parse, and lots of explode calls would quickly turn into a nightmare. – nt.bas Jul 20 '13 at 19:41
  • How is the data being constructed so that it contains a backslash? – Liam Sorsby Jul 20 '13 at 19:44
  • @LiamSorsby I'm not sure I understand your question, but assuming I do: I have a small LaTeX file. It contains a limited number of LaTeX commands, so it happens that I get user text directly followed by a command, just like I explained in the question. Let me know if I missed your question. – nt.bas Jul 20 '13 at 19:48
  • Yes, sorry, I mistyped that question! I still don't understand why using explode would be a nightmare, as you should only need one explode and then check the resulting arrays. – Liam Sorsby Jul 20 '13 at 20:15
  • @LiamSorsby I'm sorry as well, I made an overstatement! You're right, explode can work as well and will probably perform better, but since it strips the backslash, it will complicate the job of distinguishing user text from LaTeX commands. Also, the parser does the replacements I need as it matches each token, which can get hard if I explode the full text. But I am open to suggestions before I accept Chip Camden's answer, because it works. – nt.bas Jul 20 '13 at 20:27
  • You are attempting to write a LaTeX [parser using regex](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags)... I would use an existing parser instead, as described in [this question](http://stackoverflow.com/questions/2421768/php-based-latex-parser-where-to-begin). – Buggabill Jul 20 '13 at 20:28
  • @Buggabill Thanks for the link! But I am not going to support even 50% of LaTeX, so I'm not going to bother making a full LaTeX parser. For the math display, I'll use MathJax. Yes, the answer by Bobince! I won't waste my time trying to implement a full parser, because LaTeX can only be parsed by a Turing-complete machine. – nt.bas Jul 20 '13 at 20:40

1 Answer

I think this is the regex you're looking for:

/^([^\\]+)(\\.+)?/

The first group matches a run of at least one non-backslash character. It is followed by an optional second group that matches a backslash plus at least one more character.
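
For illustration, here is a minimal sketch of how the pattern behaves on the cases from the question. Python's re module is only an assumption for the demo (the question doesn't show any host-language code), and the helper name split_tokens is hypothetical. The pattern itself should carry over unchanged to PCRE-style engines.

```python
import re

# Pattern from the answer: one or more non-backslash characters,
# then an optional group made of a backslash plus at least one character.
PATTERN = re.compile(r'^([^\\]+)(\\.+)?')


def split_tokens(text):
    """Return (string_1, backslash_part_or_None), or None if nothing matches.

    split_tokens is a hypothetical helper name used only for this demo.
    """
    m = PATTERN.match(text)
    if m is None:
        return None
    return m.group(1), m.group(2)


# Both parts present: two tokens, the second keeps its backslash.
print(split_tokens(r'some text\section{title}'))  # ('some text', '\\section{title}')

# Only string_1 present: the optional group does not participate.
print(split_tokens('some text'))                  # ('some text', None)

# Nothing before the backslash: no match at all.
print(split_tokens(r'\section{title}'))           # None

# Trailing backslash with nothing after it: only string_1 is returned.
print(split_tokens('some text\\'))                # ('some text', None)
```

Because the first group excludes the backslash outright, it cannot swallow or leak characters across it, which is why no lookahead is needed.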

Chip Camden
  • I just noticed your requirement that if there's nothing after the \ don't return it. So change the * to a + and you're good. – Chip Camden Jul 20 '13 at 21:26