I am extracting URLs and link titles from a post's content, but the titles no longer seem to be UTF-8: when I echo the result, they include stray characters such as "Â". Any idea why the correct charset isn't being used? My headers do include the right charset metadata.
I tried some of the solutions on here, but none of them seems to work, so I've added my code below in case I'm missing something.
$servername = "localhost";
$database = "xxxx";
$username = "xxxxx";
$password = "xxxx";
$conn = mysqli_connect($servername, $username, $password, $database);
$post_id = 228;
$content_post = get_post($post_id);
$content = $content_post->post_content;
$doc = new DOMDocument();
$doc->loadHTML('<?xml encoding="utf-8" ?>' . $content);
$links = $doc->getElementsByTagName('a');
$counter = 0;
foreach ($links as $link) {
    $href = $link->getAttribute('href');
    $avoid = array('.jpg', '.png', '.gif', '.jpeg');
    if ($href == str_replace($avoid, '', $href)) {
        $title = $link->nodeValue;
        $title = html_entity_decode($title, ENT_NOQUOTES, 'UTF-8');
        $sql = "INSERT INTO wp_urls_download (title, url) VALUES ('$title', '$href')";
        if (mysqli_query($conn, $sql)) {
            $counter++;
            echo "Entry" . $counter . ": $title" . "<br>";
        } else {
            echo "Error: " . $sql . "<br>" . mysqli_error($conn);
        }
    }
}
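For comparison, here is a minimal sketch of the same extract-and-insert flow with the charset handled at each step. It is not a confirmed fix, and it rests on several assumptions: that the post content is UTF-8, that the table columns can store `utf8mb4`, and that the connection charset (which the code above never sets) is part of the problem. The helper names `extract_links()` and `save_links()` are hypothetical. The sketch also replaces non-breaking spaces (U+00A0) with plain spaces, since a UTF-8 non-breaking space read as Latin-1 is a common source of a stray "Â".

```php
<?php
// Hypothetical helper: returns title/url pairs for the non-image links
// found in a UTF-8 HTML fragment.
function extract_links(string $html): array
{
    $doc = new DOMDocument();
    // Collect parser warnings (for example on HTML5 tags) instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // Same prefix as in the question: it asks libxml to read the input as UTF-8.
    $doc->loadHTML('<?xml encoding="utf-8" ?>' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $avoid = ['.jpg', '.png', '.gif', '.jpeg'];
    $links = [];
    foreach ($doc->getElementsByTagName('a') as $link) {
        $href = $link->getAttribute('href');
        // Same filter as in the question: skip hrefs containing an image extension.
        if ($href !== str_replace($avoid, '', $href)) {
            continue;
        }
        // textContent is already entity-decoded, so html_entity_decode() is not
        // needed here. Non-breaking spaces are swapped for plain spaces.
        $title = trim(str_replace("\u{00A0}", ' ', $link->textContent));
        $links[] = ['title' => $title, 'url' => $href];
    }
    return $links;
}

// Hypothetical helper: inserts the pairs over a connection that is explicitly
// switched to utf8mb4, using a prepared statement instead of string interpolation.
function save_links(mysqli $conn, array $links): int
{
    // Assumption: without this call the connection may default to latin1.
    mysqli_set_charset($conn, 'utf8mb4');
    $stmt = mysqli_prepare($conn, 'INSERT INTO wp_urls_download (title, url) VALUES (?, ?)');
    if ($stmt === false) {
        return 0;
    }
    $saved = 0;
    foreach ($links as $link) {
        mysqli_stmt_bind_param($stmt, 'ss', $link['title'], $link['url']);
        if (mysqli_stmt_execute($stmt)) {
            $saved++;
        }
    }
    mysqli_stmt_close($stmt);
    return $saved;
}
```

With the variables from the question, this would be called as `save_links($conn, extract_links($content))`. The prepared statement also avoids the quoting problems that arise when a title contains an apostrophe.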
Update: I changed the echo string after I initially uploaded the code. I have already tried the solutions in the other posts, without success.
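One way to narrow this down is to look at the raw bytes of a title before it is echoed or inserted. The sketch below uses a hypothetical helper, `describe_bytes()`, that reports whether a string is valid UTF-8 and prints its bytes in hex. In UTF-8 a non-breaking space is the byte pair `c2a0`, and in Latin-1 the byte `c2` displays as "Â". So if the dump shows `c2a0`, the string itself is probably fine and the output is being interpreted with the wrong charset. If it shows `c382c2a0`, the text has already been double-encoded somewhere upstream.

```php
<?php
// Hypothetical helper: reports whether a string is valid UTF-8 and
// appends its raw bytes as lowercase hex.
function describe_bytes(string $text): string
{
    // With the /u modifier, preg_match() returns false on invalid UTF-8 input.
    $valid = preg_match('//u', $text) === 1;
    return ($valid ? 'valid UTF-8' : 'not valid UTF-8') . ': ' . bin2hex($text);
}
```

Inside the loop this could be used as `echo describe_bytes($title) . "<br>";` for one of the affected entries.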