I am gathering data with Jsoup from a webpage generated by a PHP script that reads from my database. The data includes navigational coordinates like this: 51°42’.41N 004° 54’.61W
The data displays correctly on the webpage, but when I parse it with Jsoup and insert the resulting strings into my app, they contain the replacement character U+FFFD (�) at certain points in the string, like this:
51�42�.41N 004� 54�.61W
I can remove those characters by using this:
.replaceAll("\uFFFD", "")
However, that leaves me with this:
51 42 .41N 004 54 .61W
This isn't acceptable, as these are navigational coordinates.
Is Jsoup responsible for this, or is it simply that Android cannot display these characters?
Is it possible to 'catch' those characters before they are turned into � so that I could swap them for something similar that Android will display?
For example, the character used in the navigational coordinates is an ordinal indicator (º), and I could replace it with a degree sign (°).
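A minimal sketch of one way U+FFFD can appear, under the assumption (not confirmed for my page) that the bytes reaching the parser are single-byte Latin-1 while being decoded as UTF-8. The class and method names are made up for illustration. If this is what happens, the original character is already lost by the time the string exists, so a later replaceAll has nothing left to match except the replacement character itself.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of my app.
class ReplacementCharDemo {

    // Decodes raw bytes as UTF-8. Malformed byte sequences are
    // silently replaced with U+FFFD by this String constructor.
    static String decodeAsUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Counts how many U+FFFD characters a string contains.
    static int countReplacementChars(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == '\uFFFD') {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Assumption: the server sends the degree sign as the single
        // Latin-1 byte 0xB0, which is not valid UTF-8 on its own.
        byte[] latin1 = "51\u00B042".getBytes(StandardCharsets.ISO_8859_1);

        String decodedWrong = decodeAsUtf8(latin1);
        String decodedRight = new String(latin1, StandardCharsets.ISO_8859_1);

        System.out.println("U+FFFD count (UTF-8 decode):   " + countReplacementChars(decodedWrong));
        System.out.println("U+FFFD count (Latin-1 decode): " + countReplacementChars(decodedRight));
    }
}
```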
Additional: the code I am using to collect the data:
System.out.println("NtmLoadingTask is Running");
//In case the data does not exist...
if (tableRows != null) {//Exists...
    //Get the Notices to Mariners amount (the last ID in the first column)
    Element ntmNumber = tableRows.select("td:eq(0)").last();
    String ntmAmt = ntmNumber.text();
    //Convert Ntm number to int for gathering the Ntm list
    int ntmInt = Integer.parseInt(ntmAmt);
    for (int i = 0; i < ntmInt; i++) {
        //Get Ntm titles
        Elements titles = tableRows.select("td:eq(1)");
        String ntmTitle = titles.get(i).text() + "\n";
        arr_dataNtmTitles.add(ntmTitle);
        //Get Ntm dates
        Elements dates = tableRows.select("td:eq(2)");
        String ntmDates = dates.get(i).text() + "\n";
        arr_dataNtmDates.add(ntmDates);
        //Get Ntm content (replacement characters stripped for now)
        Elements contents = tableRows.select("td:eq(3)");
        String ntmContent = contents.get(i).text().replaceAll("\uFFFD", "") + "\n";
        arr_dataNtmContents.add(ntmContent);
        System.out.println(ntmContent);
    }
}
Update 1:
I have tried: .replaceAll("\u00BA", "\u00B0")
with no success :(
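To check which characters the parsed string really contains, a small helper like the following could print each code point. This is only a debugging sketch; CodePointDump and dump are hypothetical names, not part of my app.

```java
// Hypothetical debugging helper, not part of my app.
class CodePointDump {

    // Returns every code point of the input as U+XXXX, separated by spaces.
    static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            // Advance by one or two chars depending on the code point.
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Example input standing in for a string returned by text().
        System.out.println(dump("51\uFFFD42"));
    }
}
```

If the output shows U+FFFD where the symbol should be, rather than U+00BA, then a replace on U+00BA has nothing to match.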
Update 2:
I have gone back to the original Java code that I wrote to collect the data and insert it into the database, and I have used the following to replace the unwanted characters:
content = Content.text().replaceAll("[º°]", "°") +"\n";
and verified that it is doing its job by doing this:
content = Content.text().replaceAll("[º°]", "*") +"\n";
It is definitely working and is replacing the ordinal indicator with what I thought Android would accept (a degree sign, °), but I am STILL getting this:
51�42�.41N 004� 54�.61W
Something else that may be important to finding a solution, and which I hadn't noticed before because I was concentrating on the ordinal indicator: I am also getting the � at various other places in the strings, like this:
NO. 41�� OF 2014 Dock Lock Works 1.�MARINERS ARE HEREBY ADVISED....
and
Mariners are hereby advised that the deployment of �fire wires' is.....
From this I can see that some are clearly meant to be a space (there should be two spaces after the 41) and some are meant to be an apostrophe. I could really use some help with this. I have tried cleaning out the bad characters both before inserting them into the database and after parsing them from the PHP page (on the page itself they appear as they should), to no avail. Is there something I'm missing? When parsing other pages with Jsoup I don't get this problem, so I now think it has less to do with Android's inability to display the characters and more to do with how they go into or come out of the database. It is as if something were filtering against SQL injection by removing apostrophes and the like.
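For reference, the kind of clean-up I have in mind looks roughly like the sketch below. The list of characters is an assumption based on the symptoms above (ordinal indicator, non-breaking space, curly quotes), and TypographyNormalizer is a hypothetical name. It can only help on text that was decoded correctly in the first place; it does nothing for a string that already contains U+FFFD.

```java
// Hypothetical clean-up helper; the character list is an assumption.
class TypographyNormalizer {

    // Maps a few typographic characters to plainer equivalents.
    static String normalize(String s) {
        return s
                .replace('\u00BA', '\u00B0')  // ordinal indicator -> degree sign
                .replace('\u00A0', ' ')       // non-breaking space -> plain space
                .replace('\u2018', '\'')      // left single quote -> apostrophe
                .replace('\u2019', '\'')      // right single quote -> apostrophe
                .replace('\u201C', '"')       // left double quote -> straight quote
                .replace('\u201D', '"');      // right double quote -> straight quote
    }

    public static void main(String[] args) {
        // Example coordinate written with an ordinal indicator and a curly quote.
        System.out.println(normalize("004\u00BA 54\u2019.61W"));
    }
}
```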
PHP Script:
<?php
header('Content-Type: text/html; charset=utf-8');
$con=mysqli_connect("******","*******","*******","*******");
// Check connection
if (mysqli_connect_errno())
{
    echo "Failed to connect to MySQL: " . mysqli_connect_error();
}
$result = mysqli_query($con,"SELECT * FROM **********");
echo "<table border='1' title='table1'>
<title>HTML Table With PHP</title>
<caption>*************</caption>
<tr>
<th>NTM ID</th>
<th>NTM TITLE</th>
<th>NTM DATE</th>
<th>NTM CONTENT</th>
</tr>";
while($row = mysqli_fetch_array($result))
{
    echo "<tr>";
    echo "<td>" . $row['ntmID'] . "</td>";
    echo "<td>" . $row['ntmTitle'] . "</td>";
    echo "<td>" . $row['ntmDate'] . "</td>";
    echo "<td>" . $row['ntmContent'] . "</td>";
    echo "</tr>";
}
echo "</table>";
mysqli_close($con);
?>