0

I don't have any experience with Java at all, so I'm a bit lost. With Selenium I've downloaded the HTML of a page and stored it in a string. Now I'd like to grab all the data between <div align="center"> and </div> and put it in an array. In jQuery style, I'd say it would be something like this:

var array = [];
$('div[align="center"]').each(function () {
    array.push($(this).text());
});

The difference is that now I have to get it out of a string and do it in Java. Since I don't have experience with Java, I'm looking for the easiest method.

Thanks for your help!

Petr Janeček
Jab

3 Answers

1

Instead of fetching the whole HTML with Selenium (there are lighter tools for that; see "Get html file Java"), you can pick the right element directly with Selenium.

If you're using Selenium RC:

// assuming 'selenium' is a healthy Selenium instance
String divText = selenium.getText("css=div[align='center']");

or if you're using Selenium 2 (WebDriver):

// assuming 'driver' is a healthy WebDriver instance
String divText = driver.findElement(By.cssSelector("div[align='center']")).getText();

If there are actually more <div align="center"> elements, you can get them all:

List<WebElement> divList = driver.findElements(By.cssSelector("div[align='center']"));
// and use every single one
for (WebElement elem : divList) {
    System.out.print(elem.getText());
}

See the Selenium JavaDocs, in particular the WebDriver and WebElement classes.

Also read the Selenium documentation, which works through examples.

Petr Janeček
  • Got this almost working, except I need to use findElements instead of findElement. I'm using Selenium because I first need to post some forms. – Jab May 09 '12 at 19:13
  • Yep, that was the other option. That way, you can get multiple elements. I'll edit it into the answer. I'm glad it helped! – Petr Janeček May 09 '12 at 19:18
  • Thank you very much for your help! It works now! Just one thing: in your last example, <WebElements> has to be <WebElement>. Thanks again! – Jab May 09 '12 at 19:32
  • Hah, bad typo :). If you have more questions, feel free to start a SO chat room and post a link to it here. – Petr Janeček May 09 '12 at 19:37
0

I suggest you read this question:

Using Java to find substring of a bigger string using Regular Expression

The only difficulty here is the regex you'll have to build, but that is not a Java issue.

Do read the comments there on line breaks and the use of the Pattern.DOTALL flag.

EDIT: as Luciano mentioned, I would look for a better way of reading the HTML. Your String might contain more than one <div align="center">, and you might not get only what you wanted in the first place.

EDIT:

This code seems to work:

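import java.util.regex.Matcher;
import java.util.regex.Pattern;

String html = "<div align=\"center\">text</div>";

// Lazily capture everything between the opening and closing tags.
Pattern MY_PATTERN = Pattern.compile("<div align=\"center\">(.*?)</div>");

Matcher m = MY_PATTERN.matcher(html);
while (m.find()) {
    String s = m.group(1); // text inside the current <div>
    System.out.println(s);
}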
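As a slightly fuller sketch of the same regex approach, the class below collects every match into a List<String>, which is roughly the "array" the question asks for. The class name CenteredDivText and the method extract are made up for this illustration. It assumes the divs are not nested and that align is the first attribute; it also adds Pattern.DOTALL so that text spanning several lines is still captured. Any markup inside a div is returned as-is, so this is not a substitute for a real HTML parser or for Selenium's getText.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Selenium or the JDK.
public class CenteredDivText {
    // DOTALL lets '.' match line breaks; CASE_INSENSITIVE tolerates <DIV ALIGN=...>.
    // Assumption: 'align' is the first attribute and divs are not nested.
    private static final Pattern CENTERED_DIV = Pattern.compile(
            "<div\\s+align\\s*=\\s*[\"']center[\"'][^>]*>(.*?)</div>",
            Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    /** Returns the trimmed inner content of every matching div, in document order. */
    public static List<String> extract(String html) {
        List<String> texts = new ArrayList<>();
        Matcher m = CENTERED_DIV.matcher(html);
        while (m.find()) {
            texts.add(m.group(1).trim());
        }
        return texts;
    }

    public static void main(String[] args) {
        String html = "<p>intro</p>\n"
                + "<div align=\"center\">first\nline</div>\n"
                + "<div align='center'>\n  second\n</div>";
        // Prints the list of captured texts, one entry per centered div.
        System.out.println(extract(html));
    }
}
```

With a page source obtained from Selenium, the call would presumably look like CenteredDivText.extract(driver.getPageSource()).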
Eric C.
  • I've already tried it with regex but I couldn't get it to work. I'll try Luciano's suggestion. Thanks for your input. – Jab May 09 '12 at 15:45
  • When I add that, there is a problem with downloading the HTML, so it's not working yet. – Jab May 09 '12 at 16:10
  • I guess it's a server problem with the website. I'll try again after dinner and let you know my results. – Jab May 09 '12 at 16:21
  • When I use this it doesn't work. Your example does, but when I change String html = "<div align=\"center\">text</div>"; to String html = webdata.getPageSource(); it doesn't show anything. The string html isn't empty; when I print it, it shows the whole source of the page. – Jab May 09 '12 at 19:00
  • Have you tried the `Pattern.DOTALL` flag? `Pattern MY_PATTERN = Pattern.compile("<div align=\"center\">(.*?)</div>", Pattern.DOTALL);` – Eric C. May 10 '12 at 08:13
  • Anyway, Slanec's answer is the way to go! – Eric C. May 10 '12 at 08:14
0

With Selenium, instead of downloading the page source, use Selenium to get hold of the HTML element whose text you want, either by XPath or by some other locator (see Selenium's locating strategies), and then call getText on it, something like selenium.getText(locator_of_element). If it's a list of elements, you can loop through them by putting an index on the locator, e.g. (//div)[1], (//div)[i], etc. Note that XPath indices start at 1, not 0.
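To illustrate how an indexed XPath locator behaves, here is a small sketch that uses the JDK's own javax.xml.xpath package on a well-formed XHTML string instead of a live Selenium session. The class name XPathIndexDemo and the method centeredDivText are invented for this example. Real-world HTML is often not well-formed XML, so treat this only as a demonstration of the locator syntax; in Selenium the same locator string would be passed to something like By.xpath(...).

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical demo class; it only shows how positional XPath predicates select elements.
public class XPathIndexDemo {
    /** Returns the text of the index-th centered div, or "" if no such div exists. */
    public static String centeredDivText(String xhtml, int index) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Parentheses make the index apply to the whole result set, not per parent element.
        String locator = "(//div[@align='center'])[" + index + "]";
        try {
            // Re-parses the string on every call; fine for a demo, wasteful for large pages.
            return xpath.evaluate(locator, new InputSource(new StringReader(xhtml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate " + locator, e);
        }
    }

    public static void main(String[] args) throws XPathExpressionException {
        String xhtml = "<html><body>"
                + "<div align=\"center\">one</div>"
                + "<div>skipped</div>"
                + "<div align=\"center\">two</div>"
                + "</body></html>";
        XPath xpath = XPathFactory.newInstance().newXPath();
        Double count = (Double) xpath.evaluate("count(//div[@align='center'])",
                new InputSource(new StringReader(xhtml)), XPathConstants.NUMBER);
        // Loop from 1 to count inclusive, because XPath positions are 1-based.
        for (int i = 1; i <= count.intValue(); i++) {
            System.out.println(centeredDivText(xhtml, i));
        }
    }
}
```

Asking for index 0 yields an empty string here, since no element has position 0 in XPath.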

Hope it helps..

niharika_neo