I'm trying to match tag values (music genres) in an XML document.

Here are some examples:

One music genre: (Pop) (rel="tag">Pop)

<span class="genres"><a href="http://www.mp3crank.com/genre/shoegaze" rel="tag">Pop</a></span>

Two music genres: (Reggae) (Ska) (rel="tag">Reggae) (rel="tag">Ska)

<span class="genres"><a href="http://www.mp3crank.com/genre/reggae" rel="tag">Reggae</a> / <a href="http://www.mp3crank.com/genre/ska" rel="tag">Ska</a></span>

More than two music genres: (Alternative) (Indie) (Rock) (rel="tag">Alternative) (rel="tag">Indie) (rel="tag">Rock)

<span class="genres"><a href="http://www.mp3crank.com/genre/alternative" rel="tag">Alternative</a> / <a href="http://www.mp3crank.com/genre/indie" rel="tag">Indie</a> / <a href="http://www.mp3crank.com/genre/rock" rel="tag">Rock</a></span>

What I need is to extract the "Genre" values so that I can append them to a variable:

rel="tag">Genre</a>

...or, better still, just "Genre" without the rel="tag"> part, but either is fine.

This is the regex I wrote. It isn't working well: it only matches the first tag, even when there are two or more genre tags.

Dim RegEx_AlbumStyle As New Regex(<a><![CDATA[rel=.+</a>\s?[^><]|rel=.+</a>]]></a>.Value)

This is the code:

Dim AlbumStyle as string

Dim RegEx_AlbumStyle As New Regex(<a><![CDATA[rel=.+</a>\s?[^><]|rel=.+</a>]]></a>.Value)

If Line.Contains(<a><![CDATA[<span class="genres">]]></a>.Value) Then

For Each Style In RegEx_AlbumStyle.Match(Line).Groups
    MsgBox("match:" & Style.ToString)

    ' I need to append all found matches to a string variable
    ' AlbumStyle +=  ", " & Style.ToString
    ' But I only find one match even when the string contains more than one genre value
Next

End If
abatishchev
ElektroStudios
  • 1
    Your answer is [here](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) and [here](http://htmlagilitypack.codeplex.com/) – I4V May 22 '13 at 16:57
  • @I4V My question is about helping me create a regex to get the desired values, not about how to parse XML using HtmlAgilityPack. Also, the question at the first URL is not the same as this one; it is a different kind of regex problem. But thanks anyway for commenting. – ElektroStudios May 22 '13 at 17:01
  • 4
    My intention with the first link was to show *You can't parse [X]HTML with regex* – I4V May 22 '13 at 17:10
  • You are trying to trisect an angle with a compass and straightedge. – Dour High Arch May 22 '13 at 17:39

1 Answer

1

I agree that this could break in the future and isn't the best approach, but it may help if you want to go this route. It shows three message boxes for me, assuming the span tag has been loaded into a string:

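Imports System.Text.RegularExpressions

Private Sub Input()
    Dim genreString As String = "<span class=""genres""><a href=""http://www.mp3crank.com/genre/alternative"" rel=""tag"">Alternative</a> / <a href=""http://www.mp3crank.com/genre/indie"" rel=""tag"">Indie</a> / <a href=""http://www.mp3crank.com/genre/rock"" rel=""tag"">Rock</a></span>"
    ShowGenres(genreString)
End Sub

' Shows one message box per genre found in the markup.
Private Sub ShowGenres(ByVal s As String)
    ' Capture the text between tag"> and the next "<".
    ' Note: \w+ only matches single-word genres; ([^<]+) would also allow spaces and hyphens.
    Dim m As Match = Regex.Match(s, "tag"">(\w+)<")
    ' NextMatch() advances through the string until no further match is found.
    Do While m.Success
        MessageBox.Show(m.Groups(1).Value)
        m = m.NextMatch()
    Loop
End Sub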
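
If the goal is to append every genre to one string variable, as the question describes, a minimal sketch along the same lines could use Regex.Matches with String.Join. The names GenreDemo and ExtractGenres are hypothetical, and the pattern assumes each genre is the text that directly follows rel="tag"> and contains no "<". Like any regex over markup, it may break if the page layout changes.

```vbnet
Imports System
Imports System.Linq
Imports System.Text.RegularExpressions

Module GenreDemo
    ' Hypothetical helper: returns every genre in the markup as "A, B, C".
    Function ExtractGenres(ByVal html As String) As String
        ' Assumption: a genre is the text between rel="tag"> and the next "<".
        Dim matches As MatchCollection = Regex.Matches(html, "rel=""tag"">([^<]+)<")
        ' Group 1 of each match holds the genre text without the rel="tag"> prefix.
        Dim genres = matches.Cast(Of Match)().Select(Function(m) m.Groups(1).Value.Trim())
        Return String.Join(", ", genres)
    End Function

    Sub Main()
        Dim line As String = "<span class=""genres""><a href=""http://www.mp3crank.com/genre/reggae"" rel=""tag"">Reggae</a> / <a href=""http://www.mp3crank.com/genre/ska"" rel=""tag"">Ska</a></span>"
        Console.WriteLine(ExtractGenres(line)) ' prints "Reggae, Ska"
    End Sub
End Module
```

Regex.Matches returns all matches at once, whereas Regex.Match(...).Groups only exposes the capture groups of the first match, which is why the loop in the question saw a single value.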
maxedev
  • Your code is great, thank you. For future reference, can you tell me which approach you think is best? Maybe HtmlAgilityPack? I'd like to experiment with it later; for now I prefer this route. Thanks again – ElektroStudios May 22 '13 at 21:19
  • 1
    No problem - HTML Agility Pack is terrific, or you could check out LINQ to XML to deal with XML files - examples: http://www.dotnetcurry.com/ShowArticle.aspx?ID=564 – maxedev May 23 '13 at 13:49
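
As a rough illustration of the LINQ to XML route mentioned in the comment above, the sketch below parses the span with XElement.Parse and reads the text of each a element whose rel attribute is "tag". It assumes the span is well-formed XML, which holds for the examples in the question but often not for real-world HTML. The names LinqToXmlDemo and ExtractGenresXml are hypothetical.

```vbnet
Imports System
Imports System.Linq
Imports System.Xml.Linq

Module LinqToXmlDemo
    ' Hypothetical helper: returns every genre in the span as "A, B, C".
    Function ExtractGenresXml(ByVal spanMarkup As String) As String
        ' Assumption: spanMarkup is a single well-formed <span> element.
        Dim span As XElement = XElement.Parse(spanMarkup)
        ' a.@rel is VB's XML attribute axis; it yields Nothing when the attribute is absent.
        Dim genres = From a In span.Elements("a")
                     Where a.@rel = "tag"
                     Select a.Value
        Return String.Join(", ", genres)
    End Function

    Sub Main()
        Dim line As String = "<span class=""genres""><a href=""http://www.mp3crank.com/genre/alternative"" rel=""tag"">Alternative</a> / <a href=""http://www.mp3crank.com/genre/indie"" rel=""tag"">Indie</a> / <a href=""http://www.mp3crank.com/genre/rock"" rel=""tag"">Rock</a></span>"
        Console.WriteLine(ExtractGenresXml(line)) ' prints "Alternative, Indie, Rock"
    End Sub
End Module
```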