
How do I set the encoding correctly so that I also get the special characters?

When I call `System.out.println(response.body());`, the body already seems to have lost its UTF-8 characters: all special characters are turned into question marks.
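
A question mark from `println` does not by itself prove the data is broken, because `System.out` encodes text with the console's charset, which may not be UTF-8. A minimal sketch, assuming Java 10+ (for the `PrintStream(OutputStream, boolean, Charset)` constructor); the class and method names are hypothetical. Whether the emoji then actually appears also depends on the terminal's own encoding and font.

```java
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

public class Utf8Console {
    // Wraps any OutputStream in an auto-flushing PrintStream that always encodes as UTF-8,
    // independent of the platform or console default charset.
    static PrintStream utf8PrintStream(OutputStream target) {
        return new PrintStream(target, true, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Write to the process's standard output file descriptor, but encoded as UTF-8.
        PrintStream out = utf8PrintStream(new FileOutputStream(FileDescriptor.out));
        // The surrogate pair below is U+1F3A8 (artist palette), used only as a sample emoji.
        out.println("Would you draw this for me? \uD83C\uDFA8");
    }
}
```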

For example, the String `title` ends up as `Would you draw this for me? ?`, but I want to get `Would you draw this for me?` including the emojis.
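
To tell whether the emoji is really missing from `title` or only fails to display, one option is to print the string's Unicode code points instead of its characters. A minimal sketch; `CodePointDump` and `describe` are hypothetical names. A `U+003F` or `U+FFFD` where the emoji belongs suggests it was already lost while decoding; a value such as `U+1F3A8` means the string is intact and only the output step is at fault.

```java
import java.util.stream.Collectors;

public class CodePointDump {
    // Returns the code points of s as space-separated "U+XXXX" tokens.
    // String.codePoints() joins surrogate pairs, so an emoji shows up as one token.
    static String describe(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // Hypothetical title fragment ending in one emoji (U+1F3A8).
        String title = "me? \uD83C\uDFA8";
        System.out.println(describe(title));
    }
}
```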

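import java.io.StringReader;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.net.http.HttpResponse.BodyHandlers;
import java.util.ArrayList;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

ArrayList<Entry> pullRss(){
    
    ArrayList<Entry> output = new ArrayList<>();
    
    try{
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
              .uri(URI.create("https://www.reddit.com/r/anysubreddit/.rss"))
              .build();

        // ofString() decodes the body with the charset from the Content-Type header (UTF-8 if none is given)
        HttpResponse<String> response = client.send(request, BodyHandlers.ofString());
    
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = factory.newDocumentBuilder();
        // A '?' in this output can simply mean the console cannot display the character
        System.out.println(response.body());
        // Parse the already-decoded String directly instead of re-encoding it to bytes
        Document doc = db.parse(new InputSource(new StringReader(response.body())));
        
        NodeList nList = doc.getElementsByTagName("entry");
        for (int temp = 0; temp < nList.getLength(); temp++) {
            Element eElement = (Element) nList.item(temp);
            
            String user     = eElement.getElementsByTagName("name").item(0).getTextContent();
            String userUri  = eElement.getElementsByTagName("uri").item(0).getTextContent();
            String id       = eElement.getElementsByTagName("id").item(0).getTextContent();
            // getAttribute returns only the URL; toString() on the attribute node yields href="..."
            String link     = ((Element) eElement.getElementsByTagName("link").item(0)).getAttribute("href");
            String date     = eElement.getElementsByTagName("published").item(0).getTextContent();
            String title    = eElement.getElementsByTagName("title").item(0).getTextContent();
            
            output.add(new Entry(user, userUri, id, link, date, title));
        }
    }catch (Exception e) {
        e.printStackTrace();
    }

    return output;
}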
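
The parsing step can be checked offline against a small made-up Atom snippet that contains an emoji. A minimal sketch; `AtomTitles`, `parse` and `firstEntry` are hypothetical names and the feed text is invented, not real Reddit output. Feeding the parser a `StringReader` hands it characters directly, so nothing is re-encoded. (Another option, not shown here, is `BodyHandlers.ofInputStream()`, which passes the raw bytes and lets the parser honor the XML declaration's encoding.)

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class AtomTitles {
    // Parses XML that is already a decoded Java String.
    static Document parse(String xml) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return db.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse feed", e);
        }
    }

    // Returns "title|href" for the first <entry>, reading href as a plain attribute value.
    static String firstEntry(Document doc) {
        Element entry = (Element) doc.getElementsByTagName("entry").item(0);
        String title = entry.getElementsByTagName("title").item(0).getTextContent();
        Element link = (Element) entry.getElementsByTagName("link").item(0);
        return title + "|" + link.getAttribute("href");
    }

    public static void main(String[] args) {
        // Invented sample feed; the title ends with U+1F3A8 written as a surrogate pair.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<feed xmlns=\"http://www.w3.org/2005/Atom\"><entry>"
                + "<title>Would you draw this for me? \uD83C\uDFA8</title>"
                + "<link href=\"https://example.com/post\"/>"
                + "</entry></feed>";
        System.out.println(firstEntry(parse(xml)));
    }
}
```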
Georodin
  • Probably this is what you are looking for: https://github.com/vdurmont/emoji-java – vaibhavsahu Aug 24 '21 at 00:42
  • The special character in `title` is displayed as a `?` when you print it out. That just means that your console can't display the special character. If you look at the string in the debugger, what UTF-16 code points does it contain? – tgdavies Aug 24 '21 at 00:56
  • My mistake: the FileWriter I used did not write UTF-8, even though the file was recognized as UTF-8. This worked: https://stackoverflow.com/questions/1001540/how-to-write-a-utf-8-file-with-java – Georodin Aug 24 '21 at 15:25
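
The fix in the last comment comes down to choosing the file encoding explicitly. A minimal sketch, assuming Java 11+; `Utf8FileWriter`, `writeLines` and `titles.txt` are hypothetical names, not the original poster's code. `new FileWriter(file)` uses the platform default charset, which on many systems before Java 18 was not UTF-8, whereas `Files.newBufferedWriter` takes the charset as an argument (Java 11 also added `FileWriter(File, Charset)`).

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class Utf8FileWriter {
    // Writes each string on its own line, encoded as UTF-8 regardless of platform defaults.
    static void writeLines(Path file, List<String> lines) throws IOException {
        try (BufferedWriter w = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            for (String line : lines) {
                w.write(line);
                w.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // "titles.txt" is a hypothetical output file in the working directory.
        writeLines(Path.of("titles.txt"), List.of("Would you draw this for me? \uD83C\uDFA8"));
    }
}
```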
