int firstTag = source.IndexOf("data-token=");
int lastTag = source.IndexOf("\"href", firstTag);
int startIndex = firstTag + 12;
int endIndex = lastTag + 5;
string authenticityToken = source.Substring(startIndex, endIndex - startIndex);

This is the part of the HTML I want to parse:

<a class="bizLink" data-token="-iUzEhgdscgbpj5VMi5zoh54FTeFt8M4mj5nsiodxR5VzZOhniodpj6nFQg0nce3MhUxFSgdxjM4JjUVzZuNu8o0sREnFSUzISUXzZWh4iodGQfdxR5VzZWh4iodGQfhli6fnce_=" 
                           href="

I want to get only the string between the two quotes, i.e. only this:

-iUzEhgdscgbpj5VMi5zoh54FTeFt8M4mj5nsiodxR5VzZOhniodpj6nFQg0nce3MhUxFSgdxjM4JjUVzZuNu8o0sREnFSUzISUXzZWh4iodGQfdxR5VzZWh4iodGQfhli6fnce_=

But my code returns the string I want plus all the rest of the file's text.
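A likely cause, judging from the HTML above: the closing quote of `data-token` is followed by whitespace and a line break, not directly by `href`, so `IndexOf("\"href", firstTag)` either returns -1 (and `Substring` then throws) or matches some later, unrelated `"href` further down the file, which would explain getting most of the file back. On top of that, `lastTag + 5` moves past the quote instead of stopping at it. Below is a minimal sketch of the same `IndexOf`/`Substring` idea that simply stops at the next double quote. It assumes the value is double-quoted and contains no `"` itself; `TokenExtractor` and `ExtractDataToken` are hypothetical names.

```csharp
using System;

public static class TokenExtractor
{
    // Hypothetical helper: returns the value of the first data-token="..." attribute,
    // or null if the attribute or its closing quote cannot be found.
    public static string ExtractDataToken(string source)
    {
        const string marker = "data-token=\"";

        int markerIndex = source.IndexOf(marker, StringComparison.Ordinal);
        if (markerIndex < 0)
            return null;

        // First character of the value, just after the opening quote.
        int startIndex = markerIndex + marker.Length;

        // The value ends at the next double quote, regardless of what follows it
        // (a space, a line break, href=..., etc.).
        int endIndex = source.IndexOf('"', startIndex);
        if (endIndex < 0)
            return null;

        return source.Substring(startIndex, endIndex - startIndex);
    }
}
```

Called as `TokenExtractor.ExtractDataToken(source)`, it returns only the text between the quotes, and it returns null instead of throwing when the attribute is missing.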

CodeCaster

2 Answers


The sane way would be to use an HTML parser and query library. I can suggest CsQuery, which is a jQuery-like library for .NET. You could use a selector like a[data-token] to match your anchor, then extract the attribute value.

This is the correct way of doing things.


But if you only ever need this one attribute and won't do anything else with the HTML source, it might be easier to just use a regex. Beware, though: parsing HTML with regex is evil.

So, as an exceptional measure, if all you need is this one piece of information, you could use this:

// needs: using System.Text.RegularExpressions;
var m = Regex.Match(source, @"data-token\s*=\s*""(?<token>.+?)""");
var authenticityToken = m.Success ? m.Groups["token"].Value : null;

But try CsQuery first. It's a much better approach.

Lucas Trzesniewski
  • I agree, just think that the regular expression pattern can be simpler: "data-token=\"([^\"]+)\"" – Zohar Peled Jun 29 '14 at 13:32
  • While this is true, I opted for a more correct pattern (html allows spaces around the `=` sign). Besides, I like to use named captures, it makes the regex easier to read IMO. – Lucas Trzesniewski Jun 29 '14 at 13:36
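To make the trade-off in these comments concrete, here is a small sketch that runs both patterns against the same inputs (the `PatternComparison` class and its method names are hypothetical). The simpler `[^"]+` pattern only matches when `=` sits directly between `data-token` and the opening quote, while the `\s*`-tolerant pattern with the named `token` group also matches the spaced form. Neither handles single-quoted values, which is one more reason to prefer a real HTML parser.

```csharp
using System.Text.RegularExpressions;

public static class PatternComparison
{
    // Pattern suggested in the first comment: no whitespace allowed around '='.
    private static readonly Regex Simple = new Regex("data-token=\"([^\"]+)\"");

    // Pattern from the answer: optional whitespace around '=', named group "token".
    private static readonly Regex Tolerant = new Regex(@"data-token\s*=\s*""(?<token>.+?)""");

    // Returns the captured value, or null when the simple pattern does not match.
    public static string WithSimple(string html)
    {
        Match m = Simple.Match(html);
        return m.Success ? m.Groups[1].Value : null;
    }

    // Returns the named "token" group, or null when the tolerant pattern does not match.
    public static string WithTolerant(string html)
    {
        Match m = Tolerant.Match(html);
        return m.Success ? m.Groups["token"].Value : null;
    }
}
```

For `<a data-token = "abc">`, `WithSimple` returns null while `WithTolerant` returns `abc`; for `<a data-token="abc">` both return `abc`.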

Working example http://ideone.com/U224iZ

using System;

public class Program
{
    public static void Main()
    {
        string start = "data-token=\"";
        string end = "\" href";

        string source = "<a class=\"bizLink\" data-token=\"-iUzEhgdscgbpj5VMi5zoh54FTeFt8M4mj5nsiodxR5VzZOhniodpj6nFQg0nce3MhUxFSgdxjM4JjUVzZuNu8o0sREnFSUzISUXzZWh4iodGQfdxR5VzZWh4iodGQfhli6fnce_=\" href=\"";

        int firstTag = source.IndexOf(start);
        int lastTag = source.IndexOf(end, firstTag);   // position of the closing quote
        int startIndex = firstTag + start.Length;      // first character after the opening quote
        string authenticityToken = source.Substring(startIndex, lastTag - startIndex);
        Console.Write(authenticityToken);
        Console.ReadLine();
    }
}
ale