0

I use curl to fetch the content of a website into a string. After that I want to strip all the whitespace. For that I use $content = preg_replace('/\s+/', '', $content);. But it doesn't work properly. What am I doing wrong?

I use this code to get the content:

$curl_handle = curl_init();
curl_setopt($curl_handle, CURLOPT_URL, 'http://www.italiakalmar.se/ui/Article/show.aspx?id=185&m=165');
curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, true);
$content = curl_exec($curl_handle);
curl_close($curl_handle);

$pos = stripos($content, "<body");
$content = substr($content, $pos);

$content = strip_tags($content);

$content = html_entity_decode($content, ENT_COMPAT, 'UTF-8');

$content = preg_replace('/\s+/', '', $content);

$content = mb_strtolower($content, 'utf-8');

echo $content = str_replace("–", "-", $content);

I then get this string: //fabrikenrestaurangenpizzerianintromenykvalitetallergihittatillosspizzeriaitaliapizzeriaitaliaöppnadedörrarnaförstagångenredan1977,ochdrivssedandessisammamiljöochsammakaraktäristiskastil.viharalltidutsöktapizzoraverkäntgodsmakochkvalitet.komintillpizzeriaitaliaochlåtossserveradigenutsöktpizza.elleromdetpassarbättre-låtosslevereradenhemtilldig!nukanmanävenbetalamedkortvidutkörning!öppettider:mån-torskl:15-21fredag  kl:15-22lördag  kl:12-22söndag kl:12-21ingårikalmarkrogar.se

As you can see, some of the whitespace is still there.

Daniel Tovesson
  • It must work, because it works here: http://stackoverflow.com/questions/2109325/how-to-strip-all-spaces-out-of-a-string-in-php – Bhavin Rana Jun 29 '12 at 08:02
  • Are you sure you want to strip all whitespaces? I think what you want is replacing multiple whitespaces into a *single* whitespace. – flowfree Jun 29 '12 at 08:07
  • "After that I want to stip all the whitespace." I think we can read that as "strip all the whitespaces"... ;) and I don't see any flaw in that regex, checked the docs again but it should work: http://regexpal.com/?flags=g&regex=\s%2B&input=this%20is%20a%20%20%20dumm%20%20text – Simon Jun 29 '12 at 08:13
  • Yes, all the whitespace should be removed :) And yes Simon, I also think it should work. But for some reason it doesn't. If you check my edit you can see how I get the content – Daniel Tovesson Jun 29 '12 at 08:32
  • Please edit your question to show relevant data and code. CURL has nothing to do with this. Make a `$content = ''` variable, show the code you use to trim, show the output and tell what you expect. – CodeCaster Jun 29 '12 at 08:35
  • Yeah it would probably be pretty helpful if we can see what exact response you get from curl – Simon Jun 29 '12 at 08:42
  • I'd like to see the $content right before curl_close(), so the original output of curl_exec(), I think it may have to do with the encoding... – Simon Jun 29 '12 at 09:03
  • That's a lot of content to post. The easiest way would probably be for you to test the code yourself. As you can see, I have included the URL I am getting the content from, so the code is a simple copy and paste into a test file. Then you can see the result I get – Daniel Tovesson Jun 29 '12 at 09:09
  • html_entity_decode is causing the trouble, it actually just didn't convert the entities... – Simon Jun 29 '12 at 13:15
  • Yay! Solved it with this bit of code: $content = str_replace(html_entity_decode("&nbsp;"), "", $content); It had something to do with the encoding. – Daniel Tovesson Jul 06 '12 at 11:21
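
The comments above point at entities and encoding. One plausible reading, which is an assumption and not something the thread confirms, is that the page contains non-breaking spaces, which in UTF-8 are the two bytes C2 A0 and are normally not matched by \s in a byte-oriented pattern. The sketch below uses a made-up sample string, dumps the bytes with bin2hex to make the character visible, and removes it with a pattern that uses the u modifier and names U+00A0 explicitly.

```php
<?php
// Hypothetical sample: "a", a UTF-8 non-breaking space (bytes C2 A0), "b", a plain space, "c".
$sample = "a\xC2\xA0b c";

// Dump the raw bytes so the non-breaking space becomes visible as c2a0.
echo bin2hex($sample), "\n"; // 61c2a0622063

// With the u modifier the subject is treated as UTF-8, so \x{A0} matches
// the code point U+00A0 in addition to the characters covered by \s.
$stripped = preg_replace('/[\s\x{A0}]+/u', '', $sample);

echo bin2hex($stripped), "\n"; // 616263
```

Running the original '/\s+/' pattern on the same sample and dumping the result with bin2hex is a quick way to check whether the leftover characters really are C2 A0 on a given setup.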

2 Answers

1
$content = str_replace(' ', '', $content);

No regex approach.
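
Note that replacing ' ' alone only removes plain spaces. A minimal sketch of the same non-regex idea extended to other characters, assuming the input is UTF-8 and using a made-up sample string: str_replace accepts an array of search strings, so tabs, line breaks and the UTF-8 non-breaking space ("\xC2\xA0") can be listed alongside the space.

```php
<?php
// Hypothetical sample with a non-breaking space, plain spaces, a tab and a newline.
$sample = "kl:\xC2\xA012 - 22\t\n";

// Every entry in the list is removed wherever it occurs.
// "\xC2\xA0" is U+00A0 (non-breaking space) encoded as UTF-8.
$whitespace = [' ', "\t", "\n", "\r", "\xC2\xA0"];
$stripped = str_replace($whitespace, '', $sample);

echo $stripped, "\n"; // kl:12-22
```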

miqbal
0
$content = preg_replace('/\s+/', '', $content);

searches for only the first match.

You can match all the whitespace in the given string $content by using this:

$content = preg_replace('/\s+/g', '', $content);

You need to add "g" for a global search in the regular expression.

You can test or even create regular expressions using this free online tool

http://www.gskinner.com/RegExr/

Mudasser
  • I just get bool(false) when using $content = preg_replace('/\s+/g', '', $content); – Daniel Tovesson Jun 29 '12 at 08:29
  • The g modifier is not implemented in PHP's preg functions: http://www.php.net/manual/en/reference.pcre.pattern.modifiers.php. preg_replace instead has two parameters related to the number of replacements: one to limit it and one to report it – Simon Jun 29 '12 at 08:50
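
To illustrate the last comment, here is a small sketch with a made-up sample string. preg_replace replaces every match by default, so no "global" flag is needed; the optional fourth parameter (limit) caps the number of replacements and the fifth (count) reports how many were made.

```php
<?php
// Hypothetical sample with two separate runs of whitespace.
$sample = "a  b \t c";

// The default limit is -1, meaning every match is replaced.
// $count receives the number of replacements performed.
$all = preg_replace('/\s+/', '', $sample, -1, $count);

// A limit of 1 replaces only the first match.
$first = preg_replace('/\s+/', '', $sample, 1);

echo $all, " ($count replacements)\n"; // abc (2 replacements)
echo bin2hex($first), "\n";            // 616220092063
```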