Is anything wrong with this code? If I add the line (String ct = t.parseToString(content);) below Ti t = new Ti(); then I get the actual content of the URL back, but after that I get null values for Keywords, Title and Authors. If I remove that line, I get the actual values for Title, Authors and Keywords. Why is that?

HttpGet request = new HttpGet("http://xyz.com/d/index.html");

HttpResponse response = client.execute(request);
HttpEntity entity = response.getEntity();
InputStream content = entity.getContent();
System.out.println(content);

Ti t = new Ti();
String ct = t.parseToString(content);
System.out.println(ct);

Metadata md = new Metadata();

Reader r = t.parse(content, md);
System.out.println(md);

System.out.println("Keywords: " + md.get("keywords"));
System.out.println("Title: " + md.get("title"));
System.out.println("Authors: " + md.get("authors"));
AKIWEB
arsenal

1 Answer

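You are reading the same stream twice. Once a stream has been read to the end, it cannot be read again: parseToString(content) consumes it, so the later parse(content, md) call has nothing left to extract metadata from. Read the bytes into memory once and give each call its own stream, something like: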

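HttpResponse response = client.execute(request);
HttpEntity entity = response.getEntity();

// Buffer the whole response body once.
// streamToByteArray is a helper you need to supply; see
// http://stackoverflow.com/questions/1264709/convert-inputstream-to-byte-in-java
byte[] content = streamToByteArray(entity.getContent());

Ti t = new Ti();

// First pass: extract the text from a fresh stream over the buffered bytes.
String ct = t.parseToString(new ByteArrayInputStream(content));
System.out.println(ct);

// Second pass: a new stream over the same bytes, so the metadata gets filled in.
Metadata md = new Metadata();
Reader r = t.parse(new ByteArrayInputStream(content), md);
System.out.println(md);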
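
The snippet above leaves streamToByteArray undefined and only points to the linked question. A minimal sketch of such a helper, using nothing beyond java.io, might look like the following. The class name StreamBuffering and the sample HTML string are made up for illustration, and a ByteArrayInputStream stands in for the stream returned by entity.getContent().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class StreamBuffering {

    // Hypothetical helper: drains the stream into memory so the bytes can be replayed.
    static byte[] streamToByteArray(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        // read() returns -1 once the end of the stream is reached.
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for entity.getContent(); the HTML is invented sample data.
        InputStream source = new ByteArrayInputStream(
                "<html><title>demo</title></html>".getBytes(StandardCharsets.UTF_8));

        byte[] content = streamToByteArray(source);

        // The original stream is now exhausted, which is what the second parse call ran into.
        System.out.println(source.read()); // prints -1

        // Each consumer gets its own fresh stream over the same buffered bytes.
        String first = new String(
                streamToByteArray(new ByteArrayInputStream(content)), StandardCharsets.UTF_8);
        String second = new String(
                streamToByteArray(new ByteArrayInputStream(content)), StandardCharsets.UTF_8);
        System.out.println(first.equals(second)); // prints true
    }
}
```

On Java 9 or later, InputStream.readAllBytes() does the same job as this loop. Either way the whole response is held in memory, so this approach suits ordinary web pages rather than very large downloads.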
sbridges