Removing the "\ufeff" from the end of object -> content in Google+ API json result

Question

The result from the Google+ API has \ufeff appended to the end of every "content" value (I don't really know why).

What is the best way to remove this Unicode character from the JSON result? It is producing a '?' in some of the output I am displaying.

Example:

https://developers.google.com/+/api/latest/activities/get#try-it

enter activity id

z12pvrsoaxqlw5imi22sdd35jwvkglj5204

and click Execute, result will be:

{
 .....
 "object": {
  ......
  "content": "CONTENT OF GOOGLE PLUS POST HERE \ufeff",
  ......

Example PHP code which shows a '?' where the '\ufeff' is:

<?php
$data = json_decode($result_from_google_plus_api, true);
echo $data['object']['content'];
// outputs "CONTENT OF GOOGLE PLUS POST HERE ?"
echo trim($data['object']['content']);
// outputs "CONTENT OF GOOGLE PLUS POST HERE ?"

Or am I going about this the wrong way? Should I be fixing the '?' issue rather than trying to remove the '\ufeff'?

In general, you can filter all invalid utf-8 characters by using [this answer](http://stackoverflow.com/a/11709412/1338292). — Ja͢ck, May 05 '14 at 02:26
@Jack except that `\ufeff` is valid UTF-8 and will not be caught by the answer you posted — mark, Sep 18 '14 at 12:58

score 10 · Answer 1 · answered Sep 18 '14 at 13:20

In your case, you could use this regexp:

$str = preg_replace('/\x{feff}$/u', '', $str);

That way you can exactly match that code point value and have it removed.
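
As a minimal sketch of how that replacement could be applied to the decoded result: the JSON string below is a hypothetical stand-in for the real API response, and the final trim() is an extra step (not part of the regexp above) to drop the ordinary space left in front of the removed code point.

```php
<?php
// Hypothetical sample payload standing in for the Google+ API response.
$json = '{"object":{"content":"CONTENT OF GOOGLE PLUS POST HERE \ufeff"}}';
$data = json_decode($json, true);

// Remove a single U+FEFF at the end of the string. The /u modifier makes
// PCRE treat pattern and subject as UTF-8, so \x{feff} matches the code point.
$content = preg_replace('/\x{feff}$/u', '', $data['object']['content']);

// Extra step (assumption): trim() now reaches the plain space that sat
// in front of the U+FEFF.
$content = trim($content);

var_dump($content);
```

Without the preg_replace() call, trim() alone leaves the string unchanged, because U+FEFF is not in trim()'s default character list.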

From my experience there are a lot more whitespace-like characters you will want to remove. This works well for me:

# I like to call this unicodeTrim()
$str = preg_replace(
  '/
    ^
    [\pZ\p{Cc}\x{feff}]+
    |
    [\pZ\p{Cc}\x{feff}]+$
   /ux',
  '',
  $str
);

I found http://www.regular-expressions.info/unicode.html a pretty good resource about the fine details:

\pZ - match any kind of whitespace or invisible separator
\p{Cc} - match control characters
\x{feff} - match BOM
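
The pattern above could be wrapped in a small helper, roughly like this. This is only a sketch: the function name comes from the comment in the snippet, while the signature, the null fallback and the sample string are assumptions made for illustration.

```php
<?php
// Hypothetical helper around the pattern from the answer.
function unicodeTrim(string $str): string
{
    $trimmed = preg_replace(
        '/^[\pZ\p{Cc}\x{feff}]+|[\pZ\p{Cc}\x{feff}]+$/u',
        '',
        $str
    );

    // preg_replace() returns null on error (for example when $str is not
    // valid UTF-8), so fall back to the untouched input in that case.
    return $trimmed ?? $str;
}

// NO-BREAK SPACE and TAB in front, SPACE and U+FEFF behind.
$raw = "\u{00A0}\tCONTENT OF GOOGLE PLUS POST HERE \u{FEFF}";

var_dump(unicodeTrim($raw));
```

Because both alternatives are anchored, separators inside the string (the spaces between words) are left alone.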

I've seen regexes that suggest matching \pC instead of \p{Cc}; however, this is dangerous because \pC also includes any code point to which no character has been assigned. I've had actual data (certain emojis and other stuff) removed because of this.

But YMMV, I can't stress this enough.
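
One way to see the difference, as a rough illustration: \pC covers the control (Cc), format (Cf), unassigned (Cn), private-use (Co) and surrogate (Cs) categories, and the format category contains ZERO WIDTH JOINER (U+200D), which glues emoji sequences together. The variable names below are made up for the example.

```php
<?php
// Family emoji: MAN, WOMAN and GIRL joined by ZERO WIDTH JOINER (U+200D).
// The joiner has general category Cf, so \pC matches it but \p{Cc} does not.
$family = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}";

$withCc = preg_replace('/\p{Cc}/u', '', $family); // control characters only
$withC  = preg_replace('/\pC/u', '', $family);    // every "Other" category

var_dump($withCc === $family);                           // sequence intact?
var_dump($withC === "\u{1F468}\u{1F469}\u{1F467}");      // joiners stripped?
```

With the joiners gone, the sequence falls apart into three separate emoji when rendered.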

Thanks mark! I'm a few weeks off getting back to this project, once I do I'll implement this regex and let you know how it goes :) cheers! — dtbaker, Sep 21 '14 at 00:42

score 1 · Answer 2 · answered Jan 16 '20 at 19:35

With respect to all the answers:

I tested most of the answers, but finally found a solution here: GitHub

$field = preg_replace('/[\x00-\x1F\x80-\xFF]/', '', $field);
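
A small check of what this pattern does in practice may help when choosing between the answers. Since it has no /u modifier it works on bytes, so it drops the three UTF-8 bytes of U+FEFF (EF BB BF) but also every other non-ASCII byte. The sample string is an assumption chosen for illustration.

```php
<?php
// Sample text with an accented letter, a space and a trailing U+FEFF.
$field = "Caf\u{E9} \u{FEFF}";

// Byte-wise match: ASCII control bytes and all bytes from 0x80 to 0xFF.
$ascii = preg_replace('/[\x00-\x1F\x80-\xFF]/', '', $field);

var_dump($ascii); // string(4) "Caf "
```

So this is a reasonable fit only when the output is meant to be plain ASCII; for text that should keep accented letters or emoji, a pattern targeting the specific code points is the safer choice.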

Eyni Kave
