I'm dealing with some excessive white space that I want to remove. An example:

Envelopes/Env. Thick/Env. Thin      0 pages


Label      0 pages


Hagaki      0 pages



Replace Count


Drum Unit      0


Toner      0

I've tried to use preg_replace('/\s\s+/', ' ', $content); but the outcome is not what I expected. The output with preg_replace:
Envelopes/Env. Thick/Env. Thin 0 pages Label 0 pages Hagaki 0 pages Replace Count Drum Unit 0 Toner 0

What I want:

Envelopes/Env. Thick/Env. Thin 0 pages
Label 0 pages
Hagaki 0 pages
Replace Count Drum Unit 0
Toner 0

My code:

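<?php

// Download the printer status page into a local text file.
$cw = curl_init("http://192.168.1.135/printer/maininfo.html");
$txtfl = fopen("printermtpage.txt", "w");

curl_setopt($cw, CURLOPT_FILE, $txtfl);
curl_setopt($cw, CURLOPT_HEADER, false);

curl_exec($cw);

curl_close($cw);
fclose($txtfl); // close the write handle so the data is flushed before reading

// Read the saved page back in.
$file = "printermtpage.txt";
$txtopentoread = fopen($file, "r");
$txtread = fread($txtopentoread, filesize($file));
fclose($txtopentoread);

// Strip the markup, then try to collapse the whitespace.
$notags = strip_tags(html_entity_decode($txtread));
$remblanks = preg_replace('/\s\s+/', ' ', $notags);

?>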
CSᵠ
Bilzard

2 Answers

3

In a regex, \s matches [\r\n\f\t\v ]. Since you want to keep the newlines (and the other characters in that class), match only spaces and tabs instead:

$remblanks=preg_replace('/[ \t]+/',' ',$notags);

Explained demo here: http://regex101.com/r/tS0vG7
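
As a rough illustration of what that pattern does, here is a minimal sketch. The sample string is made up and only stands in for the tag-stripped printer page; the variable names follow the question's code.

```php
<?php
// Hypothetical sample text standing in for the stripped printer page.
$notags = "Label      0 pages\n\n\nHagaki\t\t0 pages\n";

// Collapse each run of spaces and/or tabs into a single space.
// Newlines are not in the character class, so they are left untouched.
$remblanks = preg_replace('/[ \t]+/', ' ', $notags);

echo $remblanks;
// Prints:
// Label 0 pages
//
//
// Hagaki 0 pages
```

Note that the blank lines between entries are still there with this pattern; it only deals with horizontal whitespace.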

Update

A more advanced regex that collapses each run of two or more spaces/tabs into a single one, and each run of two or more newlines into a single newline:

preg_replace('/(?|([ \t]){2,}|(?:\r?(\n)){2,})/','\1',$notags);

Explained demo here: http://regex101.com/r/nU4fU2
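
To see how the branch-reset group (?|...) behaves, here is a small sketch run against a shortened version of the sample from the question. The function name collapseWhitespace is only an illustrative wrapper, not something from the original code.

```php
<?php
// Hypothetical helper wrapping the pattern above.
function collapseWhitespace(string $text): string
{
    // (?| ... ) is a branch-reset group: both alternatives capture into \1.
    //   ([ \t]){2,}        two or more spaces/tabs, \1 = the last one matched
    //   (?:\r?(\n)){2,}    two or more line breaks (LF or CRLF), \1 = "\n"
    // Fall back to the input if preg_replace() returns null on error.
    return preg_replace('/(?|([ \t]){2,}|(?:\r?(\n)){2,})/', '\1', $text) ?? $text;
}

// Shortened sample, with one CRLF pair thrown in as an assumption.
$sample = "Envelopes/Env. Thick/Env. Thin      0 pages\n\n\n"
        . "Label      0 pages\r\n\r\n"
        . "Toner      0";

$result = collapseWhitespace($sample);
echo $result;
// Prints:
// Envelopes/Env. Thick/Env. Thin 0 pages
// Label 0 pages
// Toner 0
```

Single spaces and single newlines are left alone because both alternatives require at least two matches.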

CSᵠ
2

I think the problem is that \s matches newline characters (\n) as well. So you're converting your newlines to spaces, effectively putting them all on one line.

Try using the POSIX class [:blank:] (written as [[:blank:]] inside a PCRE pattern) to match only spaces and tabs.
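
The difference can be seen side by side in a short sketch. The input string is a made-up fragment of the printer page, and the variable names $withS and $withBlank are just for illustration.

```php
<?php
// Hypothetical fragment of the tag-stripped page.
$notags = "Drum Unit      0\n\n\nToner      0";

// \s also matches "\n", so each run of newlines is replaced by one space.
$withS = preg_replace('/\s\s+/', ' ', $notags);

// [[:blank:]] matches only space and tab, so the newlines survive.
$withBlank = preg_replace('/[[:blank:]]+/', ' ', $notags);

echo $withS, PHP_EOL;
// Prints: Drum Unit 0 Toner 0

echo $withBlank, PHP_EOL;
// Prints:
// Drum Unit 0
//
//
// Toner 0
```

The POSIX class has to sit inside a bracket expression; as far as I know, a bare [:blank:] is rejected by PCRE at compile time.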

Jonathon Reinhart