I am trying to parse a string from a website using Jsoup and wrote the following test to verify the parsing.

This is my test:

@Test
public void extractBookData() throws Exception {
    String bookLink = ""; //some address
    Document doc = Jsoup.connect(bookLink).get(); // get() already returns a Document; html() would return a String

    Book book = new Book();

    assertEquals("Literatür Yayıncılık", book.getPublisher(doc));
}

This is the getPublisher(Element) method:

public String getPublisher(Element element){
    String tableRowSelector = "tr:contains(Yayınevi)";
    String tableColumnSelector = "td";
    String tableRowData = "";
    element = element.select(tableRowSelector).last();
    if (element != null) {
        element = element.select(tableColumnSelector).last();
        if (element != null) {
            tableRowData = element.text()
                    .replaceAll(tableRow.getRowName() + " ?:", "")
                    .replaceAll(tableRow.getRowName() + " :?", "")
                    .replaceAll(" ?: ?", "")
                    .trim();
        }
    }
    return tableRowData;
}

The comparison failure window suggests that everything is different.

The problem is that the actual and expected strings appear identical, even though JUnit reports otherwise.

I am open to suggestions.

Buddy
  • Could there be a space or something like that in one of the strings? – assylias Jun 14 '17 at 16:36
  • Have you tried copying the expected value from the comparison window and pasting it into your test as the expected value? – Stefan Birkner Jun 14 '17 at 16:41
  • Additional spaces are normally highlighted in the comparison window. I tried copying and pasting the expected value into the test, and it passed. Is it related to character format? – Buddy Jun 14 '17 at 16:47
  • Have you looked for non-printing characters? – VitalyZ Jun 14 '17 at 16:56
  • I looked for non-printing characters, and there are none in the expected value. I don't know how to check the actual value. When I copy the actual value and paste it as the expected value in the test, the test passes. I converted the test to `String s = "Literatür Yayıncılık"; for (int i = 0; i < s.length(); i++) assertEquals(s.charAt(i), book.getPublisher().charAt(i));`, and now the test fails when the for loop reaches the space character. The expected value is 32 and the actual value is 160. Strangely, character 160 looks like a space in IntelliJ but is shown as `a` on asciitable.com – Buddy Jun 14 '17 at 17:02

1 Answer

I have had this same issue before. This is a non-breaking space (char 160, U+00A0), which is in your text instead of a regular space (char 32). In my case the text came from an HTML text input value; yours looks like it has also come from HTML.
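
One way to confirm this kind of mismatch is to print the numeric code of every character in both strings. The following is only a rough sketch: the class and method names (`CodePointDump`, `describe`) are made up for illustration, and the sample strings stand in for the real expected and actual values.

```java
// Hypothetical helper, not part of the question's code.
class CodePointDump {

    // Returns the decimal code points of s, separated by single spaces.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(cp).append(' '));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String expected = "a b";        // regular space, code 32
        String actual = "a\u00A0b";     // non-breaking space, code 160

        System.out.println(describe(expected)); // prints "97 32 98"
        System.out.println(describe(actual));   // prints "97 160 98"
    }
}
```

Two strings that look identical in the IDE will show different numbers at the position of the odd character.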

The solution I used was simply to replace all non-breaking space characters with a regular space.
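
A minimal sketch of that replacement, assuming it is applied to the extracted text before comparing it. The names `NbspNormalizer` and `normalizeSpaces` are hypothetical, and the sample value is built by hand rather than fetched with Jsoup.

```java
// Hypothetical helper, not part of the question's code.
class NbspNormalizer {

    static final char NBSP = '\u00A0'; // non-breaking space, char 160

    // Replaces every non-breaking space with a regular space, then trims.
    // The replacement comes first because trim() only strips characters
    // up to U+0020 and would leave a leading or trailing U+00A0 in place.
    static String normalizeSpaces(String text) {
        return text.replace(NBSP, ' ').trim();
    }

    public static void main(String[] args) {
        // The publisher name from the question, written with Unicode escapes
        // so the file compiles regardless of source encoding.
        String expected = "Literat\u00FCr Yay\u0131nc\u0131l\u0131k";

        // Simulate the scraped value: same text, but with char 160 as the separator.
        String scraped = expected.replace(' ', NBSP);

        System.out.println(expected.equals(scraped));                  // prints "false"
        System.out.println(expected.equals(normalizeSpaces(scraped))); // prints "true"
    }
}
```

In the question's code, the same call could be applied to the result of `element.text()` inside `getPublisher`, which keeps the test's expected string readable.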

MartinByers