I want to extract data from HTML using Java. I tried Jsoup, but so far I have been unable to extract the correct data. Here is the HTML snippet I'm trying to extract the data from.

<a href="javascript:;" id="listen_880966" onclick="MP3PREVIEWPLAYER.showHiddePlayer(880966, 'http://mksh.free.fr/' + 'lol/mp3/Paint_It_Black/18_the_black_dahlia_murder_-_paint_it_black_(rolling_stones)-bfhmp3.mp3')" title="Listen Paint it Black    The Black Dahlia Murder   Great Metal Covers 36" class="button button-s button-1 listen "   >

I want the link ('http://mksh.free.fr/' + 'lol/mp3/Paint_It_Black/18_the_black_dahlia_murder_-_paint_it_black_(rolling_stones)-bfhmp3.mp3') and the title extracted into separate variables. Sample code along with the answer would be really helpful.

Akas Antony

1 Answer

You can use regular expressions to match the sections you want, and then use something like String.split(delimiter) to pull the specific values out of each match. See the documentation of the String.split() method for details.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Main
{
    public static void main (String[] args)
    {
        String mydata = "<a href=\"javascript:;\" id=\"listen_880966\" onclick=\"MP3PREVIEWPLAYER.showHiddePlayer(880966, 'http://mksh.free.fr/' + 'lol/mp3/Paint_It_Black/18_the_black_dahlia_murder_-_paint_it_black_(rolling_stones)-bfhmp3.mp3')\" title=\"Listen Paint it Black    The Black Dahlia Murder   Great Metal Covers 36\" class=\"button button-s button-1 listen \"   >";
        // Matches the two quoted URL pieces joined by " + " in the onclick attribute.
        // The dots and the plus sign are escaped so they are matched literally.
        Pattern pattern = Pattern.compile("'http://mksh\\.free\\.fr/'\\s\\+\\s'[().A-Za-z0-9/_-]+'");
        // Matches the whole title attribute, including the title="..." wrapper.
        Pattern title = Pattern.compile("title=\"[A-Za-z0-9\\s]+\"");
        Matcher matcher = pattern.matcher(mydata);
        if (matcher.find())
        {
            System.out.println(matcher.group(0));
        }
        matcher = title.matcher(mydata);
        if (matcher.find())
        {
            System.out.println(matcher.group(0));
        }
    }
}
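
Since the question asks for the link and the title in separate variables, here is a minimal sketch of one way to get there with capturing groups instead of splitting the matched text afterwards. The class name AnchorExtractor and the helper methods extractUrl and extractTitle are hypothetical names chosen for illustration, and the patterns assume the onclick attribute always builds the URL from exactly two single-quoted pieces joined by a plus sign, as in the snippet from the question.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class AnchorExtractor {
    // Assumption: the URL is written as 'http://...' + '...' inside onclick.
    // Group 1 captures the first quoted piece, group 2 the second.
    private static final Pattern LINK =
            Pattern.compile("'(http://[^']*)'\\s*\\+\\s*'([^']*)'");

    // Group 1 captures everything between the quotes of the title attribute.
    private static final Pattern TITLE = Pattern.compile("title=\"([^\"]*)\"");

    // Returns the two URL pieces concatenated, or empty if nothing matched.
    public static Optional<String> extractUrl(String html) {
        Matcher m = LINK.matcher(html);
        return m.find() ? Optional.of(m.group(1) + m.group(2)) : Optional.empty();
    }

    // Returns the title attribute value, or empty if nothing matched.
    public static Optional<String> extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String html = "<a href=\"javascript:;\" id=\"listen_880966\" onclick=\"MP3PREVIEWPLAYER.showHiddePlayer(880966, 'http://mksh.free.fr/' + 'lol/mp3/Paint_It_Black/18_the_black_dahlia_murder_-_paint_it_black_(rolling_stones)-bfhmp3.mp3')\" title=\"Listen Paint it Black    The Black Dahlia Murder   Great Metal Covers 36\" class=\"button button-s button-1 listen \"   >";

        // The two values end up in separate variables, as the question asks.
        String url = extractUrl(html).orElse("");
        String title = extractTitle(html).orElse("");

        System.out.println(url);
        System.out.println(title);
    }
}
```

Regular expressions are brittle against real-world HTML, so this is only reasonable when the markup is as regular as the snippet above. If Jsoup is already in use, it may be more robust to select the element and read its title and onclick attributes with the parser, and apply a pattern like LINK only to the onclick value.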

D. Gibbs