2

I am using Web-Harvest and want to scrape data from this URL:

http://derstandard.at/anzeiger/immoweb/Suchergebnis.aspx?Regionen=9&Bezirke=&Arten=&AngebotTyp=&timestamp=1363305908912

My code is:

<?xml version="1.0" encoding="UTF-8"?>

<config>
    <var-def name="google">
    <html-to-xml>
    <http url="http://derstandard.at/anzeiger/immoweb/Suchergebnis.aspx?Regionen=9&Bezirke=&Arten=&AngebotTyp=&timestamp=1363305908912"></http>
    </html-to-xml>
    </var-def>
</config>

However I get:

Reference to the entity Bezirke has to end with an ';'

I do not understand what Web-Harvest means by the ';'.

user2051347
    I am not sure how you are going to harvest the web, but I would recommend using Jsoup. It's really easy and useful. – cwhsu Mar 15 '13 at 00:23

2 Answers

1

I don't know much about Web-Harvest, but their example has this:

<xpath expression="//a[@shape='rect']/@href">
    <html-to-xml>
        <http url="http://www.somesite.com/"/>
    </html-to-xml>
</xpath>

<http url =".." />

Whereas your code has

<http url = ".."></http> 

Maybe this is your problem? There is no need for a closing tag.

1

You should encode the ampersands in your URL, i.e. replace every & with &amp;.
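
In XML, a bare & starts an entity reference, so the parser reads &Bezirke as the beginning of an entity named Bezirke and complains that the terminating ';' is missing. As a sketch, the configuration from the question with only the ampersands escaped might look like this (the self-closing http element is a stylistic choice here, not part of the fix):

```xml
<?xml version="1.0" encoding="UTF-8"?>

<config>
    <var-def name="google">
        <html-to-xml>
            <!-- Each & in the query string is written as &amp; so the file stays well-formed XML -->
            <http url="http://derstandard.at/anzeiger/immoweb/Suchergebnis.aspx?Regionen=9&amp;Bezirke=&amp;Arten=&amp;AngebotTyp=&amp;timestamp=1363305908912"/>
        </html-to-xml>
    </var-def>
</config>
```

The XML parser turns each &amp; back into a plain & when it reads the attribute, so the request should go to the same URL as before.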

Josip Maslac