I want to automatically collect real estate data from this site:
However, they do not have an API. How would you generally do that? I am thankful for every response!
You're going to have to download the page and parse the information out of it yourself.
You'll probably want to look into the Pattern class and read up on regex; the URL and String classes will also be very useful.
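As a rough illustration of the Pattern approach, the sketch below pulls listing titles and prices out of an HTML string with a regular expression. The markup, the class names (`title`, `price`) and the `ListingRegexSketch` helper are all made up for the example; the real page will be structured differently, so the pattern would have to be adapted after looking at its actual source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ListingRegexSketch {

    // Hypothetical markup: a "title" span followed by a "price" span.
    // DOTALL lets '.' match line breaks inside a span.
    private static final Pattern LISTING = Pattern.compile(
            "<span class=\"title\">(.*?)</span>\\s*<span class=\"price\">(.*?)</span>",
            Pattern.DOTALL);

    // Returns one "title | price" string per match found in the page source.
    static List<String> extractListings(String html) {
        List<String> result = new ArrayList<String>();
        Matcher m = LISTING.matcher(html);
        while (m.find()) {
            String title = m.group(1).trim();
            String price = m.group(2).trim();
            result.add(title + " | " + price);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample input, not taken from the real site.
        String html = "<div><span class=\"title\"> Flat in Vienna </span>\n"
                + "<span class=\"price\">950 EUR</span></div>";
        for (String listing : extractListings(html)) {
            System.out.println(listing);
        }
    }
}
```

Regular expressions like this tend to break as soon as the site changes its markup, so they are best kept for small, fixed patterns.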
You could always use an HTML parsing library to make it easier, something like http://htmlparser.sourceforge.net/.
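To give an idea of what a parser buys you over regex, here is a minimal sketch that collects every link on a page. Note that it does not use the library linked above; it swaps in the HTML parser that ships with the JDK (`javax.swing.text.html.parser.ParserDelegator`) so that it runs without extra downloads. The `LinkCollectorSketch` class and the sample markup are made up for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

class LinkCollectorSketch {

    // Collects the href attribute of every <a> tag in the given HTML.
    static List<String> extractLinks(String html) {
        final List<String> links = new ArrayList<String>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {
                        links.add(href.toString());
                    }
                }
            }
        };
        try {
            // 'true' tells the parser to ignore any charset declared in the document.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            // Not expected when reading from an in-memory string.
            throw new UncheckedIOException(e);
        }
        return links;
    }

    public static void main(String[] args) {
        // Made-up sample input, not taken from the real site.
        String html = "<html><body><a href=\"/immo/1\">Flat</a> <a href=\"/immo/2\">House</a></body></html>";
        for (String link : extractLinks(html)) {
            System.out.println(link);
        }
    }
}
```

The callback style means you react to tags as the parser meets them, which copes with sloppy markup better than a hand-written pattern usually does.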
Very general question so obviously I can't provide relevant code, but this is known as scraping.
Well, this is how you get all the content from the page;
then you can parse the page data however you want:
package farzi;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.DefaultHttpClient;

public class GetXMLTask
{
    public static void main(String args[])
    {
        try
        {
            HttpClient httpClient = new DefaultHttpClient();
            // HttpGet is the more usual choice for simply fetching a page.
            HttpPost httpPost = new HttpPost("http://derstandard.at/anzeiger/immoweb/Suchergebnis.aspx?Regionen=9&Bezirke=&Arten=&AngebotTyp=&timestamp=1363245585829");
            HttpResponse response = httpClient.execute(httpPost);
            // Print a summary of the response (status line and headers).
            System.out.println(response.toString());

            // Read the response body into a string, assuming it is UTF-8 encoded.
            StringBuilder builder = new StringBuilder();
            BufferedReader in = new BufferedReader(new InputStreamReader(response.getEntity().getContent(), "UTF-8"));
            try
            {
                char[] buf = new char[1000];
                int l;
                // read() returns -1 at the end of the stream.
                while ((l = in.read(buf)) >= 0)
                {
                    builder.append(buf, 0, l);
                }
            }
            finally
            {
                in.close();
            }
            System.out.println(builder.toString());
        }
        catch (IOException e)
        {
            // Also covers ClientProtocolException, which is an IOException subclass.
            // Catching checked exceptions that nothing in the try block throws
            // (URISyntaxException, HttpException, InterruptedException) does not compile.
            System.out.println("IOException :" + e);
            e.printStackTrace();
        }
    }
}
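If you would rather not pull in Apache HttpClient, the same download can be sketched with nothing but the JDK's `java.net.URL` class. This is only a minimal outline: the `PlainJdkFetchSketch` name is made up, UTF-8 is an assumption about the site's encoding, and there is no handling of redirects, timeouts or cookies.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class PlainJdkFetchSketch {

    // Reads the whole stream into a String using the given charset, then closes it.
    static String readAll(InputStream stream, Charset charset) throws IOException {
        StringBuilder builder = new StringBuilder();
        try (Reader in = new InputStreamReader(stream, charset)) {
            char[] buf = new char[1000];
            int n;
            // read() returns -1 at the end of the stream.
            while ((n = in.read(buf)) >= 0) {
                builder.append(buf, 0, n);
            }
        }
        return builder.toString();
    }

    // Downloads a page with java.net.URL; UTF-8 is an assumption about the site.
    static String fetch(String address) throws IOException {
        return readAll(new URL(address).openStream(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java PlainJdkFetchSketch <url>");
            return;
        }
        System.out.println(fetch(args[0]));
    }
}
```

The string returned by `fetch` is what you would then hand to whatever parsing step you choose.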