
Why do I get strange characters in my Twitter tweets?

For example:

RT @FrankPasquale: “Far from being a superstate, the EC’s bureaucracy is tiny--25,000 people for an EU population of 500 million" https://t…

Especially https://t…, which is not a valid link or URL at all!

Following this guide, here is my code for getting the tweets from a user's timeline:

$settings = array(
    'oauth_access_token' => "xxx",
    'oauth_access_token_secret' => "xxx",
    'consumer_key' => "xxx",
    'consumer_secret' => "xxx"
);

$url = "https://api.twitter.com/1.1/statuses/user_timeline.json";

$requestMethod = "GET";

$getfield = '?screen_name=xxxx&count=6';

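// $twitter is the TwitterAPIExchange object from the guide (assumed);
// without this line the calls below fail on an undefined variable.
$twitter = new TwitterAPIExchange($settings);

// Pass true as the second argument to get associative arrays back.
$string = json_decode(
    $twitter->setGetfield($getfield)
            ->buildOauth($url, $requestMethod)
            ->performRequest(),
    true
);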

if (isset($string["errors"]) && $string["errors"][0]["message"] != "") {
    echo "<h3>Sorry, there was a problem.</h3>
          <p>Twitter returned the following error message:</p>
          <p><em>" . $string["errors"][0]["message"] . "</em></p>";
    exit();
}

foreach($string as $items) {
    echo "Time and Date of Tweet: ".$items['created_at']."<br />";
    echo "Tweet: ". $items['text']."<br />";
    echo "Tweeted by: ". $items['user']['name']."<br />";
    echo "Screen name: ". $items['user']['screen_name']."<br />";
    echo "Followers: ". $items['user']['followers_count']."<br />";
    echo "Friends: ". $items['user']['friends_count']."<br />";
    echo "Listed: ". $items['user']['listed_count']."<br /><br />";
}

Any ideas why and how I can fix it?

EDIT:

If I try this:

htmlentities($items['text'], ENT_NOQUOTES, 'UTF-8')

I get this:

RT @FrankPasquale: “Far from being a superstate, the EC’s bureaucracy is tiny--25,000 people for an EU population of 500 million" https://t

The link https://t is totally broken!

I have also set my HTML to:

<meta charset="UTF-8">

But I still get links like this:

https://t

Which is totally broken!
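For reference, here is what `htmlentities()` does to the tail of that tweet, compared with `htmlspecialchars()`. This is a minimal sketch: the sample string is copied from the example above, with PHP's `\u{2026}` escape standing in for the trailing ellipsis. Escaping only changes how characters are written into the HTML; it cannot restore characters that are missing from the string, so the link stays cut off at `https://t` either way.

```php
<?php
// Tail of the retweet text as json_decode() returns it, including the "…"
// (U+2026) at the point where the text stops.
$text = "tiny--25,000 people for an EU population of 500 million\" https://t\u{2026}";

// htmlentities() replaces every character that has a named HTML entity,
// so "…" becomes "&hellip;"; ENT_NOQUOTES leaves the '"' alone.
$entities = htmlentities($text, ENT_NOQUOTES, 'UTF-8');

// htmlspecialchars() only touches &, < and > (and quotes, depending on the
// flags), so UTF-8 text passes through unchanged on a UTF-8 page.
$special = htmlspecialchars($text, ENT_NOQUOTES, 'UTF-8');

echo $entities, "\n";
echo $special, "\n";
```

Both lines render as `https://t…` in a UTF-8 browser page; `htmlspecialchars()` is usually the better fit for a UTF-8 page because it only escapes the characters HTML actually requires.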

EDIT 2:

The problem is coming from the Twitter JSON:

{"created_at":"Fri Jun 24 14:28:16 +0000 2016","id":xxx,"id_str":"xxxx","text":"RT @muslimgirl: The people who invaded & colonized the world decided they wanted independence from its consequences #BrexitVote\nhttps:\/\/t.c\u2026"

Look at #BrexitVote\nhttps:\/\/t.c\u2026: that is where the link breaks.

How can I fix that?
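A likely explanation, offered as a sketch rather than a confirmed answer: both example tweets are retweets. For a retweet, the top-level `text` is "RT @user: " followed by a copy of the original that Twitter shortens to fit the 140-character limit and ends with `…` (`\u2026`), which is why the link stops at `https://t.c…`. Assuming the v1.1 `user_timeline` payload carries the untruncated original tweet under `retweeted_status` (as it does for native retweets), the full text and link can be taken from there. The function name `tweetText()` and the sample data are hypothetical, and the t.co path is a placeholder.

```php
<?php
// Sketch, assuming Twitter API v1.1 timeline objects: a native retweet carries
// the full original tweet under 'retweeted_status', while the top-level 'text'
// is "RT @user: " plus a copy that may be shortened and end in "…".
function tweetText(array $item): string
{
    if (isset($item['retweeted_status']['text'], $item['retweeted_status']['user']['screen_name'])) {
        $original = $item['retweeted_status'];
        // Rebuild the "RT @user: " prefix around the untruncated original text.
        return 'RT @' . $original['user']['screen_name'] . ': ' . $original['text'];
    }
    // Ordinary tweets: the top-level text is already complete.
    return $item['text'];
}

// Hypothetical, shortened item shaped like a retweet; "EXAMPLE" is a placeholder path.
$json = '{"text":"RT @muslimgirl: ... #BrexitVote\nhttps:\/\/t.c\u2026",'
      . '"retweeted_status":{"text":"... #BrexitVote\nhttps:\/\/t.co\/EXAMPLE",'
      . '"user":{"screen_name":"muslimgirl"}}}';

$item = json_decode($json, true);
echo tweetText($item), "\n";
```

In the loop from the question this would replace `$items['text']`, e.g. `echo "Tweet: " . tweetText($items) . "<br />";`. The rebuilt string can be longer than 140 characters, which does not matter for display.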

Run
  • Check the character encoding of your PHP/HTML page. It should probably be UTF-8, and not something else. – KIKO Software Jun 25 '16 at 11:42
  • @KIKOSoftware please see my edit above. The result is now even stranger! – Run Jun 25 '16 at 11:51
  • No, I was talking about the whole page, not any particular string in PHP. Check the page character encoding in your browser. Also see: http://www.w3schools.com/html/html_charset.asp – KIKO Software Jun 25 '16 at 11:54
  • I did that on the whole page as well. – Run Jun 25 '16 at 11:57
  • And you got rid of the `htmlentities()`? Is the page now HTML5 and UTF-8? Check it in your browser. Usually under 'page info' in the menu. Note that `<meta charset="UTF-8">` will only work if your page is HTML5. – KIKO Software Jun 25 '16 at 11:58
  • `And you got rid of the htmlentities()?` - yes I did. – Run Jun 25 '16 at 11:59
  • `Usually under 'page info' in the menu`: I can't find 'page info'... – Run Jun 25 '16 at 12:01
  • Well, since I don't know what browser you are using, I cannot tell where it is in yours. – KIKO Software Jun 25 '16 at 12:02
  • I am using chrome. – Run Jun 25 '16 at 12:03
  • But I already have the HTML5 tags in my page: ` Tweets ` – Run Jun 25 '16 at 12:04
  • That does look like the right tags for an HTML5/UTF-8 page. You can find the encoding here: Chrome -> use hamburger menu on the right -> More Tools... -> Encoding -> here it is. – KIKO Software Jun 25 '16 at 12:04
  • Thanks for the guide. Yes, I found it, and it is UTF-8. – Run Jun 25 '16 at 12:11
  • I think the problem might be in the tweet text itself. – Run Jun 25 '16 at 12:12
  • That's possible. You could `var_dump()` the argument you give to `json_decode()` (so not the returned value of this function!) and see what that looks like. – KIKO Software Jun 25 '16 at 12:14
  • Oh, I know why: it is the JSON! `{"created_at":"Fri Jun 24 14:28:16 +0000 2016","id":xxxx,"id_str":"xxx","text":"RT @muslimgirl: The people who invaded & colonized the world decided they wanted independence from its consequences #BrexitVote\nhttps:\/\/t.c\u2026"` – Run Jun 25 '16 at 12:20
  • look at this `#BrexitVote\nhttps:\/\/t.c\u2026` – Run Jun 25 '16 at 12:21
  • That's normal. `\n` is a new line, `\/` is a slash, `\u2026` just means it's a UTF-8 encoded character: http://www.fileformat.info/info/unicode/char/2026/index.htm This is called 'escaping'. `json_decode()` is supposed to correctly decode this. – KIKO Software Jun 25 '16 at 12:26
  • `json_decode() is supposed to correctly decode this` but it does not, does it? – Run Jun 25 '16 at 12:32
  • That was an assumption on my part... Perhaps not. Perhaps you need to do that yourself. – KIKO Software Jun 25 '16 at 12:33
  • Even `https://t.cu2026` is not a working link at all. I still think that it is a Twitter bug. – Run Jun 25 '16 at 12:36
  • I doubt that Twitter has bugs like that. Perhaps you need something like this: http://stackoverflow.com/questions/2934563 to decode the encoded `\u` characters? – KIKO Software Jun 25 '16 at 12:36
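On the escaping discussed in the comments: `\n`, `\/` and `\u2026` are standard JSON string escapes (strictly, `\u2026` names the Unicode code point U+2026, which PHP then stores as UTF-8 bytes), and `json_decode()` converts them itself, so a separate `\u` decoding step should not be needed. The quick check below uses the fragment quoted above; it suggests the decoded text genuinely ends in `t.c…`, i.e. the URL was already shortened in the data Twitter returned rather than damaged on the PHP side.

```php
<?php
// The fragment quoted in the comments, in its raw (escaped) JSON form.
// Single quotes keep PHP from interpreting the backslashes itself.
$raw = '{"text":"#BrexitVote\nhttps:\/\/t.c\u2026"}';

$decoded = json_decode($raw, true);

// json_decode() has already converted the JSON escapes:
//   \n     -> a real newline
//   \/     -> "/"
//   \u2026 -> "…" (U+2026, stored as 3 bytes of UTF-8)
var_dump($decoded['text']);
```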

0 Answers