
I have a StringBuffer containing an HTML page, and I want to extract some specific information from it.

For example, one line is:

<img class="a" data-src="http://test.com" src="" />

and I want a String containing "http://test.com".

Is there a function or parser that can help me?

user3688653

3 Answers


Jsoup will do the trick: write a CSS-style selector and you can get whatever element you need.

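import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class ImageUrls {
    public static void main(String[] args) throws IOException {
        Document doc = Jsoup.connect("http://test.com").get();

        // DOM selector using CSS syntax, see the jsoup docs:
        // http://jsoup.org/cookbook/extracting-data/selector-syntax
        // "img.a" selects all image elements with class "a", as in CSS.
        Elements images = doc.select("img.a");

        for (Element image : images) {
            // Get the URL of the image from its data-src attribute
            String url = image.attr("data-src");
            System.out.println(url);
        }
    }
}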

  • I didn't get it. I load the HTML page into doc, but what does doc.select("img[data-src]"); do? I don't know the data-src value, and I don't want the src of all images, only the images with class a. – user3688653 Jul 24 '14 at 06:58
  • @user3688653 I updated this for clarity; let me know if you have any other questions. – jonny_bouta Jul 24 '14 at 19:21
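Since the question says the HTML is already in a StringBuffer, `Jsoup.connect` may not be needed at all. A minimal sketch, assuming jsoup is on the classpath: `Jsoup.parse(String)` parses markup you already hold in memory. The class `DataSrcExtractor` and method `dataSrcOfClassA` are hypothetical names for illustration.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class DataSrcExtractor {

    // Hypothetical helper: returns the data-src value of every <img class="a">
    // found in HTML that is already in memory (no network request is made).
    static List<String> dataSrcOfClassA(StringBuffer html) {
        // Jsoup.parse takes a String, so convert the StringBuffer first
        Document doc = Jsoup.parse(html.toString());
        List<String> urls = new ArrayList<>();
        for (Element image : doc.select("img.a")) {
            urls.add(image.attr("data-src"));
        }
        return urls;
    }

    public static void main(String[] args) {
        StringBuffer page = new StringBuffer(
            "<html><body>"
            + "<img class=\"a\" data-src=\"http://test.com\" src=\"\" />"
            + "<img class=\"b\" data-src=\"http://other.com\" src=\"\" />"
            + "</body></html>");
        System.out.println(dataSrcOfClassA(page)); // prints [http://test.com]
    }
}
```

On the selector asked about in the comments: `img[data-src]` matches every `img` that has a `data-src` attribute, whatever its value or class, while `img.a` restricts the match to images with class `a`.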

This is a common question, and you could have found the answer with a quick Google search.

Look into regular expressions (regex); you'll probably need them more than once.
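A minimal regex sketch for the line in the question, using only `java.util.regex`. It captures whatever sits between `data-src="` and the next `"` using the negated character class `[^"]*`. This assumes the attribute is always written with double quotes and does not check the `class`; regex is fragile on real-world HTML, which is why the parser answers exist. The class `DataSrcRegex` and method `findDataSrc` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DataSrcRegex {

    // data-src="..." : group 1 is everything up to the next double quote
    private static final Pattern DATA_SRC = Pattern.compile("data-src=\"([^\"]*)\"");

    // Hypothetical helper: collects every data-src value in the given text.
    // StringBuffer implements CharSequence, so it can be passed directly.
    static List<String> findDataSrc(CharSequence html) {
        List<String> urls = new ArrayList<>();
        Matcher m = DATA_SRC.matcher(html);
        while (m.find()) {
            urls.add(m.group(1));
        }
        return urls;
    }

    public static void main(String[] args) {
        StringBuffer page = new StringBuffer(
            "<img class=\"a\" data-src=\"http://test.com\" src=\"\" />");
        System.out.println(findDataSrc(page)); // prints [http://test.com]
    }
}
```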

Kyron
  • Yeah, but I didn't find what I need. I need a regex like this: xxxINEEDTHATyyy, where I can say: I want "INEEDTHAT" between xxx and yyy, where INEEDTHAT is unknown. – user3688653 Jul 24 '14 at 06:56
  • I don't want to take credit for someone else's answer, so here's a link to the same question - http://stackoverflow.com/questions/11255353/java-best-way-to-grab-all-strings-between-two-strings-regex – Kyron Jul 24 '14 at 07:05
  • Thanks, that is perfect. One last question: now I have 2 patterns (p, l), and I want them in the same matcher (m) because of the order: Pattern p = Pattern.compile(Pattern.quote(pattern1) + "(.*?)" + Pattern.quote(pattern2)); Pattern l = Pattern.compile(Pattern.quote(pattern3) + "(.*?)" + Pattern.quote(pattern4)); Matcher m = p.matcher(res.toString()); while (m.find()) { System.out.println(m.group(1)); } How do I get both patterns into m? – user3688653 Jul 24 '14 at 07:41
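One way to get both patterns into a single matcher, so that matches come back in document order, is to join them with alternation (`|`) into one pattern and check which capture group took part in each match. A minimal sketch: `pattern1` to `pattern4` stand in for the delimiters from the comment above, and the sample text and delimiters in `main` are made up. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TwoPatternsOneMatcher {

    // Hypothetical helper: finds the text between pattern1/pattern2 OR between
    // pattern3/pattern4, in the order the matches appear in the input.
    static List<String> between(String text, String pattern1, String pattern2,
                                String pattern3, String pattern4) {
        Pattern both = Pattern.compile(
            Pattern.quote(pattern1) + "(.*?)" + Pattern.quote(pattern2)
            + "|"
            + Pattern.quote(pattern3) + "(.*?)" + Pattern.quote(pattern4));
        List<String> found = new ArrayList<>();
        Matcher m = both.matcher(text);
        while (m.find()) {
            // Only one alternative matched; the other group is null
            found.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "<b>one</b> [two] <b>three</b>";
        System.out.println(between(text, "<b>", "</b>", "[", "]"));
        // prints [one, two, three]
    }
}
```

By default `.` does not match line terminators, so if the text between delimiters can span lines, compile with `Pattern.DOTALL`.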

Consider the jsoup library.

It has a "selector" mechanism for finding and manipulating HTML elements.
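Besides reading attributes, selected elements can be modified in place. A minimal sketch, assuming jsoup is on the classpath: select every `img` that has a `data-src` attribute and copy that value into `src`, a common step when dealing with lazy-loaded images. The class `LazyImageFixer` and method `fillSrc` are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class LazyImageFixer {

    // Hypothetical helper: for every <img> that has a data-src attribute,
    // copy its value into src and return the modified document.
    static Document fillSrc(String html) {
        Document doc = Jsoup.parse(html);
        // [data-src] matches elements that have the attribute, whatever its value
        for (Element image : doc.select("img[data-src]")) {
            image.attr("src", image.attr("data-src"));
        }
        return doc;
    }

    public static void main(String[] args) {
        Document doc = fillSrc("<img class=\"a\" data-src=\"http://test.com\" src=\"\" />");
        // Print the modified markup inside <body>
        System.out.println(doc.body().html());
    }
}
```

Selectors can also be combined, for example `img.a[data-src]` for images of class `a` that carry the attribute.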