Strip all whitespace

Question

I use curl to fetch the content of a website into a string. After that I want to strip all the whitespace. For that I use $content = preg_replace('/\s+/', '', $content);, but it doesn't work properly. What am I doing wrong?

I use this code to get the content:

$curl_handle = curl_init();
curl_setopt($curl_handle, CURLOPT_URL, 'http://www.italiakalmar.se/ui/Article/show.aspx?id=185&m=165');
curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, true);
$content = curl_exec($curl_handle);
curl_close($curl_handle);

$pos = stripos($content, "<body");
$content = substr($content, $pos);

$content = strip_tags($content);

$content = html_entity_decode($content, ENT_COMPAT, 'UTF-8');

$content = preg_replace('/\s+/', '', $content);

$content = mb_strtolower($content, 'utf-8');

echo $content = str_replace("–", "-", $content);

I then get this string: //fabrikenrestaurangenpizzerianintromenykvalitetallergihittatillosspizzeriaitaliapizzeriaitaliaÃ¶ppnadedÃ¶rrarnafÃ¶rstagÃ¥ngenredan1977,ochdrivssedandessisammamiljÃ¶ochsammakaraktÃ¤ristiskastil.viharalltidutsÃ¶ktapizzoraverkÃ¤ntgodsmakochkvalitet.komintillpizzeriaitaliaochlÃ¥tossserveradigenutsÃ¶ktpizza.elleromdetpassarbÃ¤ttre-lÃ¥tosslevereradenhemtilldig!nukanmanÃ¤venbetalamedkortvidutkÃ¶rning!Ã¶ppettider:mÃ¥n-torskl:15-21fredagÂ Â kl:15-22lÃ¶rdagÂ Â kl:12-22sÃ¶ndagÂ kl:12-21ingÃ¥rikalmarkrogar.se

As you can see the whitespace is still there.

It must work, because it works here: http://stackoverflow.com/questions/2109325/how-to-strip-all-spaces-out-of-a-string-in-php – Bhavin Rana Jun 29 '12 at 08:02
Are you sure you want to strip all whitespace? I think what you want is to replace multiple whitespace characters with a *single* space. – flowfree Jun 29 '12 at 08:07
"After that I want to stip all the whitespace." I think we can read that as "strip all the whitespace"... ;) And I don't see any flaw in that regex. I checked the docs again and it should work: http://regexpal.com/?flags=g&regex=\s%2B&input=this%20is%20a%20%20%20dumm%20%20text – Simon Jun 29 '12 at 08:13
Yes, all the whitespace should be removed :) And yes Simon, I also think it should work. But for some reason it doesn't. If you check my edit you can see how I get the content. – Daniel Tovesson Jun 29 '12 at 08:32
Please edit your question to show relevant data and code. CURL has nothing to do with this. Make a `$content = ''` variable, show the code you use to trim, show the output and tell us what you expect. – CodeCaster Jun 29 '12 at 08:35
Yeah, it would probably be pretty helpful if we could see the exact response you get from curl. – Simon Jun 29 '12 at 08:42
I'd like to see the $content right before curl_close(), so the original output of curl_exec(). I think it may have to do with the encoding... – Simon Jun 29 '12 at 09:03
That's a lot of content to post. The easiest way would probably be for you to test the code yourself. As you can see, I have included the URL I am getting the content from, so the code is a simple copy and paste into a test file. Then you can see the result I get. – Daniel Tovesson Jun 29 '12 at 09:09
html_entity_decode is causing the trouble; it actually just didn't convert the entities... – Simon Jun 29 '12 at 13:15
Yay! Solved it with this bit of code: $content = str_replace(html_entity_decode("&nbsp;"), "", $content); It had something to do with the encoding. – Daniel Tovesson Jul 06 '12 at 11:21
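
The comments above point at non-breaking spaces (U+00A0, what `&nbsp;` decodes to) as the characters that survive the replacement. Without the `u` modifier, `\s` typically matches only ASCII whitespace. The following is a minimal sketch of one way to cover both cases with a single pattern, assuming the string is valid UTF-8 and PHP 7 or later; the function name `stripAllWhitespace` and the sample string are made up for illustration and do not come from the thread.

```php
<?php
// Hypothetical helper (not from the thread): removes ASCII and Unicode
// whitespace, including U+00A0, from a UTF-8 string.
function stripAllWhitespace(string $text): string
{
    // u: treat pattern and subject as UTF-8, so \s can match Unicode
    // whitespace; \x{00A0} is listed explicitly to be safe.
    $result = preg_replace('/[\s\x{00A0}]+/u', '', $text);

    // preg_replace() returns null on failure, e.g. for invalid UTF-8 input.
    return $result ?? $text;
}

// Made-up sample resembling the opening-hours fragment in the question.
$content = "fredag\u{00A0}\u{00A0}kl: 15-22\n";
echo stripAllWhitespace($content), PHP_EOL; // fredagkl:15-22
```

If the input might not be valid UTF-8, the fallback returns the text unchanged rather than null, so the caller can decide whether to convert the encoding first.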

score 1 · Accepted Answer · answered Jun 29 '12 at 08:08

$content = str_replace(' ', '', $content);

A non-regex approach.

answered Jun 29 '12 at 08:08

miqbal


I know. But "/\s+/" or any other regex doesn't work either for U+2001, U+2028, U+2004, U+2005, U+2006, U+2007, U+200A. – miqbal Jun 29 '12 at 08:18
Why don't you create an array with everything you want to replace? Just do str_replace(array(),'',$content) – AntonioCS Jun 29 '12 at 08:36
@AntonioCS Yes, that's exactly the solution. – miqbal Jun 29 '12 at 08:37
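
The array idea from the comments could look roughly like the sketch below. It assumes UTF-8 input and PHP 7 or later (for the `\u{...}` escapes). The constant and function names are hypothetical, and the character list is illustrative rather than exhaustive.

```php
<?php
// Hypothetical list of characters to strip (not exhaustive).
const WHITESPACE_CHARS = [
    ' ', "\t", "\n", "\r", "\v", "\f",
    "\u{00A0}", // no-break space
    "\u{2001}", // em quad
    "\u{2004}", // three-per-em space
    "\u{2005}", // four-per-em space
    "\u{2006}", // six-per-em space
    "\u{2007}", // figure space
    "\u{200A}", // hair space
    "\u{2028}", // line separator
];

// Hypothetical helper: removes every listed character from $text.
function stripListedWhitespace(string $text): string
{
    // str_replace() accepts an array of search strings and replaces each
    // of them with the single replacement string.
    return str_replace(WHITESPACE_CHARS, '', $text);
}

echo stripListedWhitespace("a \u{00A0}b\u{2001}c\n"), PHP_EOL; // abc
```

The trade-off of this approach is that only the characters in the list are removed, so any whitespace character missing from it stays in the string.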

score 0 · Answer 2 · answered Jun 29 '12 at 08:25

$content = preg_replace('/\s+/', '', $content);

searches for only one match, the first.

You can match all the whitespace in the given string $content by using this:

$content = preg_replace('/\s+/g', '', $content);

You need to add "g" for a global search in the regular expression.

You can test or even create regular expressions using this free online tool:

http://www.gskinner.com/RegExr/

answered Jun 29 '12 at 08:25

Mudasser


I just get bool(false) when using $content = preg_replace('/\s+/g', '', $content); – Daniel Tovesson Jun 29 '12 at 08:29
The g modifier is not implemented in PHP's preg functions: http://www.php.net/manual/en/reference.pcre.pattern.modifiers.php. preg_replace has parameters for limiting and counting the replacements instead. – Simon Jun 29 '12 at 08:50
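
As a small illustration of the point in the last comment: preg_replace() replaces every match by default, and its optional fourth and fifth arguments limit and count the replacements. The sample string and variable names below are made up for the sketch.

```php
<?php
// Made-up sample with three runs of whitespace.
$text = "a b  c\td";

// No "g" flag is needed: all matches are replaced by default.
$all = preg_replace('/\s+/', '', $text);

// The fourth argument caps the number of replacements; the fifth is
// filled with the number of replacements actually performed.
$firstOnly = preg_replace('/\s+/', '', $text, 1, $count);

echo $all, PHP_EOL;   // abcd
echo $count, PHP_EOL; // 1
```

Here $firstOnly keeps the later whitespace runs, because only the first match was replaced.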
