-2

I'm trying to extract MTQ0ODQ3NjcyNDoxNDQ4NDc2NzI0OjE6LTM4OTc1OTc2MjM4MDc1OTM2NjY6MTQ0ODQ3NjAwMzowOjA6NTQw from the string below.

I am having issues with the \\ (backslash) characters. How do I escape these in C#? Is there any documentation that lists the characters that need escaping in regex patterns, and how to escape them?

first_cursor\\&quot;:\\&quot;MTQ0ODQ3NjcyNDoxNDQ4NDc2NzI0OjE6LTM4OTc1OTc2MjM4MDc1OTM2NjY6MTQ0ODQ3NjAwMzowOjA6NTQw\\&quot;

I've tried the following to no avail. I tried to avoid having to escape the backslashes altogether:

MatchCollection matches = Regex.Matches(content, "first_cursor*.quot;([-0-9A-Za-z]+)");

Any help would be much appreciated.

BugHunterUK
  • In your example RegEx, `*.` should be `.*`. Not sure if that's a typo, so I'm not going to submit an edit. – Dan Bechard Nov 25 '15 at 19:44
  • I just wanted to point out that if you're ever not sure what actually needs to be escaped in a literal string inside of a regex you can ask the system to figure it out for you using [System.Text.RegularExpressions.Regex.Escape()](https://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.escape(v=vs.110).aspx) – Chris Haas Nov 25 '15 at 19:53
  • @Dan yeah typo, sorry. – BugHunterUK Nov 25 '15 at 20:59
  • @ChrisHaas excellent thank you. I will check that out. – BugHunterUK Nov 25 '15 at 21:00
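As a rough illustration of the `Regex.Escape()` suggestion in the comments, the sketch below lets the framework escape the literal prefix and then appends the capture group. It assumes the raw input contains two backslashes followed by the literal text `&quot;` (which the attempted pattern suggests), and it uses top-level statements, so it needs C# 9 or later. The variable names are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

// Sample input, assumed to contain two backslashes and the literal text &quot;.
string content = @"first_cursor\\&quot;:\\&quot;MTQ0ODQ3NjcyNDoxNDQ4NDc2NzI0OjE6LTM4OTc1OTc2MjM4MDc1OTM2NjY6MTQ0ODQ3NjAwMzowOjA6NTQw\\&quot;";

// Regex.Escape puts a backslash in front of every regex metacharacter,
// so each literal backslash in the prefix becomes \\ in the pattern.
string prefix = Regex.Escape(@"first_cursor\\&quot;:\\&quot;");
Console.WriteLine(prefix);

// Append the capture group for the cursor value and read group 1.
Match m = Regex.Match(content, prefix + "([-0-9A-Za-z]+)");
string cursor = m.Groups[1].Value;
Console.WriteLine(cursor);
```

Only the fixed text goes through `Regex.Escape`; the capture group is added afterwards so that its brackets keep their regex meaning.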

3 Answers

2

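In a regular (non-verbatim) C# string literal, each literal backslash that the regex should match is written as \\\\: the compiler turns it into \\, which the regex engine reads as one escaped backslash.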

You can use the following:

MatchCollection matches = Regex.Matches(content, "first_cursor\\\\{2}&quot;:\\\\{2}&quot;([-0-9A-Za-z]+)");
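A minimal sketch of how the captured value might be read back out, assuming the input contains two backslashes followed by the literal text `&quot;` on each side of the colon. It uses top-level statements (C# 9 or later), and the sample `content` value is an assumption based on the question.

```csharp
using System;
using System.Text.RegularExpressions;

// Sample input, assumed to contain two backslashes and the literal text &quot;.
string content = @"first_cursor\\&quot;:\\&quot;MTQ0ODQ3NjcyNDoxNDQ4NDc2NzI0OjE6LTM4OTc1OTc2MjM4MDc1OTM2NjY6MTQ0ODQ3NjAwMzowOjA6NTQw\\&quot;";

// In a regular string literal, "\\\\" compiles to \\ which the regex engine
// reads as one literal backslash; {2} then requires exactly two of them.
string pattern = "first_cursor\\\\{2}&quot;:\\\\{2}&quot;([-0-9A-Za-z]+)";

MatchCollection matches = Regex.Matches(content, pattern);
foreach (Match match in matches)
{
    // Group 1 holds only the cursor value, without the surrounding text.
    Console.WriteLine(match.Groups[1].Value);
}
```

Reading `Groups[1]` rather than the whole match avoids having to strip the `first_cursor` prefix afterwards.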
karthik manchala
  • I am accepting this because you posted first and the solution worked, even though others posted more verbose responses. But I honestly appreciate the answers from others, and thank you all for your time. – BugHunterUK Nov 25 '15 at 20:08
  • @BugHunterUK you can [up-vote](http://stackoverflow.com/help/privileges/vote-up) all helpful answers. – Mariano Nov 26 '15 at 00:23
1

I prefer to use verbatim string literals when writing RegEx strings in C#:

string pattern = @"first_cursor\\\\&quot;:\\\\&quot;([-0-9A-Za-z]+)\\\\&quot;";

This saves you from escaping each backslash twice: once for the C# compiler and again for the RegEx engine.
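To make the "escape twice" point concrete, here is a small sketch comparing a regular and a verbatim literal for the same regex, then applying a verbatim pattern to the sample input. It assumes the input contains two backslashes followed by the literal text `&quot;`, and uses top-level statements (C# 9 or later). Note that a double quote inside a verbatim string would be written as `""`.

```csharp
using System;
using System.Text.RegularExpressions;

// Both literals produce the same two-character text: \\
string regular = "\\\\";   // C# unescapes four backslashes to two
string verbatim = @"\\";   // taken exactly as written
bool sameText = regular == verbatim;

// As a regex, \\ matches a single literal backslash.
bool matchesOne = Regex.IsMatch(@"\", verbatim);

// Sample input, assumed to contain two backslashes and the literal text &quot;.
string content = @"first_cursor\\&quot;:\\&quot;MTQ0ODQ3NjcyNDoxNDQ4NDc2NzI0OjE6LTM4OTc1OTc2MjM4MDc1OTM2NjY6MTQ0ODQ3NjAwMzowOjA6NTQw\\&quot;";

// Four backslashes in the verbatim pattern match two in the input.
string pattern = @"first_cursor\\\\&quot;:\\\\&quot;([-0-9A-Za-z]+)\\\\&quot;";
string cursor = Regex.Match(content, pattern).Groups[1].Value;

Console.WriteLine(sameText);
Console.WriteLine(matchesOne);
Console.WriteLine(cursor);
```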

As an aside, this syntax is also useful when storing paths in strings:

string logFile = @"C:\Temp\mylog.txt";

And even supports multi-line for SQL commands and such:

string query = @"
    SELECT *
      FROM tblStudents
     WHERE FirstName = 'Bobby'
       AND LastName = 'Tables'
";
Dan Bechard
0

You can use negative lookbehind to eliminate some of the contenders:

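var example = @"first_cursor\\&quot;:\\&quot;MTQ0ODQ3NjcyNDoxNDQ4NDc2NzI0OjE6LTM4OTc1OTc2MjM4MDc1OTM2NjY6MTQ0ODQ3NjAwMzowOjA6NTQw\\&quot;";
// Skip any alphanumeric run that directly follows '&' (the entity name "quot") or '_' ("cursor").
var regex = new Regex("(?<!&[-0-9A-Za-z]*)(?<!_[-0-9A-Za-z]*)[-0-9A-Za-z]+");
var matches = regex.Matches(example);
foreach (Match match in matches)
{
  // "first" is the only other run that survives the lookbehinds.
  if (match.Value != "first")
  {
    Console.WriteLine(match.Value);
  }
}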

This yields two matches: "first" and the string you are looking for. Iterating over the matches and skipping "first" leaves the value you want.

Cat_Clan