1

Twitter returns a webpage that contains, among others, these lines:

<link rel="dns-prefetch" href="//video.twimg.com" />
<link rel="preload" as="script" crossorigin="anonymous" href="https://ma-0.twimg.com/twitter-assets/responsive-web/web/ltr/vendor.69f9ac19fa493004.js" />
<link rel="preload" as="script" crossorigin="anonymous" href="https://ma-0.twimg.com/twitter-assets/responsive-web/web/ltr/i18n/en.312d3f56908013c9.js" />
<link rel="preload" as="script" crossorigin="anonymous" href="https://ma-0.twimg.com/twitter-assets/responsive-web/web/ltr/main.da8c0a0fbf03fdac.js" />
<meta property="fb:app_id" content="2231777543" />

I need the URL of the main.*.js file. How can I get it?

I tried this:

var mainIndex = content.IndexOf("main.");
var startIndex = content.LastIndexOf("href=\"", mainIndex) + 6;
var endIndex = content.IndexOf(".js", startIndex) + 3;
var url = content.Substring(startIndex, endIndex - startIndex);

but it's a fragile, unsafe implementation. Thanks.

Wiktor Stribiżew
Blendester

2 Answers

3

You can do it with a dedicated HTML parser, such as Html Agility Pack:

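// requires: using System.Linq; using HtmlAgilityPack;
var text = "<link rel=\"dns-prefetch\" href=\"//vide.... />";
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(text);

var links = doc
    .DocumentNode
    // note: SelectNodes returns null when nothing matches
    .SelectNodes("//link")
    // GetAttributeValue avoids a NullReferenceException on <link> tags without href
    .Select(e => e.GetAttributeValue("href", ""));

links
    // here, you could parse and match the URL robustly
    .Where(href => href.Contains("main"))
    // try it in LINQPad (Dump() is a LINQPad extension method)
    .Dump();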

result: https://ma-0.twimg.com/twitter-assets/responsive-web/web/ltr/main.da8c0a0fbf03fdac.js
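The "parse and match the URL robustly" step could look something like the sketch below. It uses only the base class library, so it works on whatever list of href strings the parser hands back. The names MainScriptFilter and FindMainScript are made up for illustration, and the rule "file name starts with main. and ends with .js" is an assumption based on the sample lines in the question, not something Twitter documents.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper, not part of Html Agility Pack.
public static class MainScriptFilter
{
    // Returns the first href whose file name starts with "main." and ends
    // with ".js", or null when there is no such href.
    public static string FindMainScript(IEnumerable<string> hrefs)
    {
        return hrefs.FirstOrDefault(IsMainScript);
    }

    private static bool IsMainScript(string href)
    {
        // Values that are not absolute URLs are skipped.
        if (!Uri.TryCreate(href, UriKind.Absolute, out var uri))
            return false;

        // Protocol-relative values such as "//video.twimg.com" may fail to
        // parse or may parse as file URIs depending on the platform; the
        // scheme check rejects them either way.
        if (uri.Scheme != Uri.UriSchemeHttp && uri.Scheme != Uri.UriSchemeHttps)
            return false;

        // Look at the file name only, so that "main" appearing elsewhere in
        // the host or directory part does not produce a false match.
        var fileName = Path.GetFileName(uri.AbsolutePath);
        return fileName.StartsWith("main.", StringComparison.Ordinal)
            && fileName.EndsWith(".js", StringComparison.Ordinal);
    }
}
```

With the links sequence from above, MainScriptFilter.FindMainScript(links) would replace the Where(...).Dump() part. Compared with href.Contains("main"), this will not pick up a URL that merely has "main" somewhere in its domain or path.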

Dmitry Ledentsov
-3

It is definitely a good idea to use a regex for this. Use one regex to match the prefix part and replace it with an empty string, then do the same for the suffix.
You will need to escape regex metacharacters (for example, . as \.; writing < as \< is harmless in .NET but not required) and use normal regex syntax to define exactly what is required.
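For what it's worth, here is a rough sketch of the regex route. Note that it swaps the two replace-with-empty-string passes for a single pattern with a named capture group, which seemed simpler. MainScriptRegex and Extract are made-up names, and the pattern assumes the part between "main." and ".js" is a hexadecimal hash, as in the sample from the question. It will break if the markup changes (single quotes, different attribute layout), which is the usual caveat with regex on HTML.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class MainScriptRegex
{
    // Matches href="...": a double-quoted value ending in /main.<hex>.js.
    // [^"]* cannot cross a quote, so the match stays inside one attribute.
    private static readonly Regex Pattern = new Regex(
        "href=\"(?<url>[^\"]*/main\\.[0-9a-f]+\\.js)\"",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    // Returns the captured URL, or null when the page has no such link.
    public static string Extract(string html)
    {
        var match = Pattern.Match(html);
        return match.Success ? match.Groups["url"].Value : null;
    }
}
```

Calling MainScriptRegex.Extract(content) on the page source returns the URL directly, so no Substring arithmetic is needed.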