I have a string containing HTML:

<div class="ExternalClass6FC23FEAF7454B3A8006CF7E1D2257B8">
<audio src="/sites/audioblogs/Group2Doc/0.021950338035821915.wav"   controls="controls"></audio><br/><img   src="/sites/audioblogs/Group2Doc/20140103_152938.jpg" alt=""/></div>

I need only the value of the source (src) attribute. I'm trying to use Regex.Match.

Is there a better alternative?

Thanks, Sachin

Tim Schmelter

2 Answers

I'd use HtmlAgilityPack to parse HTML, not regex:

var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);  // html is your string
var audio = doc.DocumentNode.Descendants("audio")
    .FirstOrDefault(n => n.Attributes["src"] != null);
string src = null;
if (audio != null)
    src = audio.Attributes["src"].Value;  

Result: /sites/audioblogs/Group2Doc/0.021950338035821915.wav

Tim Schmelter
  • Thanks Tim for your reply. Can I find multiple src tags with this? My HTML contains multiple src tags. – Sachin Kothari Feb 16 '15 at 12:45
  • @SachinKothari: you mean multiple `audio` tags with a `src`? Then use `Where` instead of `FirstOrDefault`, together with a `foreach` loop. – Tim Schmelter Feb 16 '15 at 12:55
  • @SachinKothari: if you need the `List` with all src attributes (as suggested by the first version of your title): `doc.DocumentNode.Descendants("audio").Where(n => n.Attributes["src"] != null).Select(n => n.Attributes["src"].Value).ToList()` – Tim Schmelter Feb 16 '15 at 13:24
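string yourFullHtmlstring = ".....";
// Normalize double quotes to single quotes so both attribute styles match.
// This works on a copy, so the original string is left untouched.
string normalized = yourFullHtmlstring.Replace("\"", "'");

// Split on the attribute prefix; every entry after the first starts with a src value.
string[] arr = normalized.Split(new string[] { "src='" }, StringSplitOptions.None);

// Collect only the src values (List<string> needs System.Collections.Generic).
// Start from 1 because we skip the text before the first src.
var sources = new List<string>();
for (int i = 1; i < arr.Length; i++)
{
    int end = arr[i].IndexOf('\'');
    if (end >= 0) // skip malformed entries that have no closing quote
        sources.Add(arr[i].Substring(0, end));
}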
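
Since the question mentions Regex.Match, here is a rough sketch of the same idea using `Regex.Matches`, which returns every match instead of only the first. The class name `SrcExtractor` and the method `Extract` are made up for illustration, and the pattern assumes every src value is wrapped in single or double quotes. Like the split approach, it is not a real HTML parser, so it can also match `src=` text that appears inside comments or scripts.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper (not from the thread): collects quoted src values.
public static class SrcExtractor
{
    // (?<![\w-]) rejects names that merely end in "src", such as data-src.
    // (?<q>["']) captures the opening quote and \k<q> requires the same one to close.
    private static readonly Regex SrcPattern = new Regex(
        @"(?<![\w-])src\s*=\s*(?<q>[""'])(?<url>.*?)\k<q>",
        RegexOptions.IgnoreCase);

    public static List<string> Extract(string html)
    {
        var result = new List<string>();
        foreach (Match m in SrcPattern.Matches(html))
        {
            // The named group holds the text between the quotes.
            result.Add(m.Groups["url"].Value);
        }
        return result;
    }
}
```

Calling `SrcExtractor.Extract(html)` on the HTML from the question should give a list with the .wav path followed by the .jpg path. Unlike the split version, it does not need to rewrite the quotes first.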
Ziv Weissman
  • Yes, maybe in one case, where the string "src='" appears somewhere other than in an HTML src attribute... but it will take all sources from the whole HTML, not just from "audio". (I think that's what he wants.) – Ziv Weissman Feb 16 '15 at 13:55