I have some text and want to find certain strings in it with regular expressions. My problem is that I don't know how to express a logical AND ("&&") in a pattern.

I have text like this:

AB " something is here;
  and here...
";
...
AB "new line 
 continues ... ";

I want to find every AB entry that ends with ";

The code I use when the terminator is just ; is:

var matches = Regex.Matches(tmp, "(AB) ([^;]*);", RegexOptions.Singleline);

But how can I write something like "(AB) ([^(\";)]*)\";" or just "(AB) ([^(\"&&;)]*)(\"&&;") so that the match only ends at the two-character sequence "; ?

I would like to have:

AB

" something is here;
 and here ...
"

AB
"new line
 continues ..."
Uni Le
  • And what exactly do you want to match? Could you perhaps show an example block of text, and then the part you want matched? – Mr47 Nov 05 '12 at 10:25

3 Answers

^ can only negate character classes, and a character class always matches exactly one character, so it cannot exclude a string of characters such as ";. However, there is a similar concept for strings of characters (or in fact full-fledged regular expressions) called a negative lookahead:

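@"(AB) (""(?:(?!"";).)*"");"

This consumes arbitrary characters (.) one at a time, as long as they do not mark the start of a ";. Note that inside a verbatim string a literal double quote is written as "" rather than \". Keep using the Singleline option, of course, so that . also matches newlines. It is worth reading up on lookaround assertions.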
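
As a minimal sketch of how this pattern could be used, the following C# program runs it over the sample text from the question and prints both captured groups. The class name `AbBlockDemo`, the helper `FindBlocks`, and the hard-coded sample string are made up for illustration; only `Regex.Matches` and `RegexOptions.Singleline` come from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AbBlockDemo
{
    // Verbatim string: a literal double quote is written as "".
    // Group 1 is the keyword, group 2 is the quoted block including both quotes.
    private const string Pattern = @"(AB) (""(?:(?!"";).)*"");";

    // Hypothetical helper: returns (keyword, quoted text) pairs for every match.
    public static List<(string Keyword, string Quoted)> FindBlocks(string text)
    {
        var result = new List<(string Keyword, string Quoted)>();
        // Singleline lets '.' match newlines, so a block may span several lines.
        foreach (Match m in Regex.Matches(text, Pattern, RegexOptions.Singleline))
        {
            result.Add((m.Groups[1].Value, m.Groups[2].Value));
        }
        return result;
    }

    public static void Main()
    {
        // Sample text modelled on the question (assumed line breaks).
        string tmp = "AB \" something is here;\n  and here...\n\";\n...\nAB \"new line \n continues ... \";";
        foreach (var (keyword, quoted) in FindBlocks(tmp))
        {
            Console.WriteLine(keyword);
            Console.WriteLine(quoted);
        }
    }
}
```

The semicolon inside the first block does not end the match, because the lookahead only stops at a quote that is immediately followed by a semicolon.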

Martin Ender
  • Please see my edit for what I want to achieve. I tried your code, but it gives me some errors with @"AB ? – Uni Le Nov 05 '12 at 11:21
  • @UniLe Note that my snippet is not the bare regex but the full regex string (as a verbatim string, which should always be used for regular expressions). If you want to capture the two parts separately, you can just add parentheses: `"(AB) (\"(?:(?!\";).)*\");"`. However, this is a bit pointless in your case, because the first two characters of your match will always be `AB`, and the characters from index `3` to the second-to-last index will be the rest. – Martin Ender Nov 05 '12 at 13:38
  • @m.buettned: Thank you, your code works great. AB was just an example; there could be more of them, like AB CD..., but your code is what I wanted. Thanks once more. – Uni Le Nov 05 '12 at 15:01
  • When I have a string `AB 134 aBd " ... ";`, how would I change your code to match it? Something like `"(AB) ([0-9]) ([A-Za-z0-9]) (\"(?:(?!\";).)*\");"`? – Uni Le Nov 08 '12 at 09:09
  • @UniLe Close, except that you need to repeat the character classes (otherwise each will match just a single one of the allowed characters): `@"(AB) ([0-9]+) ([A-Za-z0-9]+) (""(?:(?!"";).)*"");"`. Of course you can make all of them optional, so it still works with your initial examples: `@"(AB)\s*([0-9]*)\s*([A-Za-z0-9]*)\s*(""(?:(?!"";).)*"");"` – Martin Ender Nov 08 '12 at 09:12
  • Thanks, I forgot to add the `+`. Regex is not as difficult as I thought :) – Uni Le Nov 08 '12 at 09:26
  • `([A-Za-z0-9]+)` covers letters and digits, but what about `ab_Cd9`? It won't be matched. – Uni Le Nov 08 '12 at 09:48
  • @UniLe Well, then add the underscore: `[A-Za-z0-9_]`. This character class can be shortened to `\w` though. – Martin Ender Nov 08 '12 at 09:50
  • Thanks. Is there any way to match all characters instead of `[A-Za-z0-9_]`? I mean that it could be anything there, like `?\=...` – Uni Le Nov 08 '12 at 09:53
  • @UniLe `\S` will match any non-whitespace character. You should really [check out a tutorial](http://www.regular-expressions.info/tutorial.html) – Martin Ender Nov 08 '12 at 09:54
  • So I tried `(\S+)`, but it said unrecognized escape sequence. – Uni Le Nov 08 '12 at 09:58
  • Do you use a verbatim string (with the `@` before the string)? If that doesn't do it, please ask a new question. These comments are not meant for lengthy discussions. – Martin Ender Nov 08 '12 at 10:02
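
The pattern this comment thread arrives at can be sketched roughly as follows. This is an illustration rather than code from the thread: the class name `ExtendedAbDemo`, the helper `ParseLine`, and the sample input are assumptions. It uses a verbatim string so that `\S` reaches the regex engine unchanged; in a regular string literal the C# compiler rejects `"\S"` as an unrecognized escape sequence.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ExtendedAbDemo
{
    // Verbatim string: backslashes are passed through as-is, quotes are doubled.
    private static readonly Regex Pattern = new Regex(
        @"(AB) ([0-9]+) (\S+) (""(?:(?!"";).)*"");",
        RegexOptions.Singleline);

    // Hypothetical helper: returns the four captured parts,
    // or an empty array if the text does not match.
    public static string[] ParseLine(string text)
    {
        Match m = Pattern.Match(text);
        if (!m.Success)
        {
            return new string[0];
        }
        return new[]
        {
            m.Groups[1].Value, // keyword
            m.Groups[2].Value, // one or more digits
            m.Groups[3].Value, // any run of non-whitespace characters
            m.Groups[4].Value  // quoted block, quotes included
        };
    }

    public static void Main()
    {
        string[] parts = ParseLine("AB 134 ab_Cd9 \" ... \";");
        Console.WriteLine(string.Join(" | ", parts));
    }
}
```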

I don’t think an AND operator exists in regular expressions, or that one is even required here. Read this question.

However, to complete your string-finding solution, this regex reference can help you a lot. Hope it helps.

http://www.regular-expressions.info/reference.html

SajjadHashmi

Try this Regex:

AB\s+\"((?:.+\s*)+?(?=\";))\";

Explanation:

(?= subexpression) Zero-width positive lookahead assertion.

(?: subexpression) Defines a noncapturing group.

. Wildcard: Matches any single character except \n.

\s Matches any white-space character.
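
As a rough sketch of how this pattern behaves, the program below applies it to the sample text from the question. No `RegexOptions.Singleline` is passed, so `.` stops at line breaks and the `\s*` inside the group is what carries the match onto the next line. The names `LookaheadDemo` and `ExtractBodies` and the hard-coded sample string are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LookaheadDemo
{
    // Same regex as above, written as a C# verbatim string (quotes doubled).
    private const string Pattern = @"AB\s+""((?:.+\s*)+?(?="";))"";";

    // Hypothetical helper: returns the text between the quotes for each match.
    public static List<string> ExtractBodies(string text)
    {
        var bodies = new List<string>();
        foreach (Match m in Regex.Matches(text, Pattern))
        {
            bodies.Add(m.Groups[1].Value);
        }
        return bodies;
    }

    public static void Main()
    {
        // Sample text modelled on the question (assumed line breaks).
        string tmp = "AB \" something is here;\n  and here...\n\";\n...\nAB \"new line \n continues ... \";";
        foreach (string body in ExtractBodies(tmp))
        {
            Console.WriteLine("[" + body + "]");
        }
    }
}
```

One caveat: because `.+` is greedy within each line, this pattern can run past a closing `";` that ends a line with other content before it when further blocks follow, so it is worth testing against realistic input before relying on it.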

Ria