I have the following array:

[0] => Array
        (
            [0] => 3,38 m
            [1] => 13,30 s
            [2] => 5,41 m
            [3] => ESE
            [4] => 294º
            [5] => 32,76 km/h
            [6] => W
            [7] => 266º
            [8] => 16,27 ºC
            [9] => 12,80 ºC
            [10] => 0
        )

I want to clean up the data before inserting it into a database.

This function is almost there but does not remove the special characters:

function cleanUp(&$value,$key)
{
    $cleaner2 = array("km/h"," ","m","s","º","ºC");
    $value = str_replace($cleaner2, "", $value);
}
array_walk($newArray[0],"cleanUp");

I've looked into re-encoding the array, but I'm not sure what encoding it currently has. I could trim the array values, but that feels rather inelegant.

Any ideas?
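One way to find out what encoding a value currently has is to look at its raw bytes. The following is only a minimal sketch: `describeEncoding` is a hypothetical helper, the sample strings are made up to mirror the array above, and it assumes the mbstring extension is available. In UTF-8 the "º" character is the two bytes C2 BA, while in ISO-8859-1 it is the single byte BA.

```php
<?php
// Hypothetical helper (not from the original post): report the raw bytes
// of a value and whether those bytes form valid UTF-8.
function describeEncoding(string $value): array
{
    return array(
        'hex'        => bin2hex($value),
        'valid_utf8' => mb_check_encoding($value, 'UTF-8'),
    );
}

// Made-up sample values mirroring "16,27 ºC" from the array above.
$utf8   = "16,27 \xC2\xBAC"; // "º" encoded as UTF-8 (bytes C2 BA)
$latin1 = "16,27 \xBAC";     // "º" encoded as ISO-8859-1 (byte BA)

print_r(describeEncoding($utf8));   // hex contains "c2ba", valid_utf8 is true
print_r(describeEncoding($latin1)); // hex contains "ba" only, valid_utf8 is false
```

If the scraped values turn out to use a different byte sequence than the literals in the script, the search strings in `cleanUp` can never match them.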

The solution: I omitted the charset from the header!

header('Content-type: application/json; charset=UTF-8');

This allowed my simple cleanUp function to work: the stray Â disappeared, and the function then matched the following cleaner2 array values:

$cleaner2 = array("km/h"," ","m","s","º","ºC","C");
squeaker
  • What encoding is your file saved in, and what encoding is it served in? – Esailija Nov 11 '12 at 09:06
  • Where are those characters coming from to begin with? Looks like you need to *handle encodings correctly*, not clean up the mess after it's broken. [What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text](http://kunststube.net/encoding/) – deceze Nov 11 '12 at 09:10
  • do you just want numbers and commas? – Nick Maroulis Nov 11 '12 at 09:11
  • I'm using google docs as a scraper which I'm then grabbing as a csv. Yes I just want the numbers, commas and latin letters. – squeaker Nov 11 '12 at 09:14
  • I would love to share Joel's http://www.joelonsoftware.com/articles/Unicode.html :) – verisimilitude Nov 11 '12 at 09:17

2 Answers

1

You can try a multibyte-aware replace function:

$data = array(
  0 => '3,38 m',
  1 => '13,30 s',
  2 => '5,41 m',
  3 => 'ESE',
  4 => '294º',
  5 => '32,76 km/h',
  6 => 'W',
  7 => '266º',
  8 => '16,27 ºC',
  9 => '12,80 ºC',
  10 => 0,
);

$c =  array("km/h"," ","m","s","º","ºC");
$data = array_map(function ($v) use ($c) {
    // Strip every unit or symbol listed in $c from the value.
    return mb_replace($c, "", $v);
}, $data);
var_dump($data);

Output

array (size=11)
  0 => string '3,38' (length=4)
  1 => string '13,30' (length=5)
  2 => string '5,41' (length=4)
  3 => string 'ESE' (length=3)
  4 => string '294' (length=3)
  5 => string '32,76' (length=5)
  6 => string 'W' (length=1)
  7 => string '266' (length=3)
  8 => string '16,27C' (length=6)
  9 => string '12,80C' (length=6)
  10 => string '0' (length=1)
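The trailing `C` in `16,27C` comes from the order of the search array: the needles are applied one after another, so `"º"` is removed before `"ºC"` is ever tried. As a minimal sketch (the `cleanValue` helper is hypothetical, and it assumes the script and the data use the same encoding), listing the longer needle first avoids the leftover letter. The built-in `str_replace` is used here because it processes an array of needles in the same sequential way.

```php
<?php
// Hypothetical helper, not part of the original answer.
// Longer needles come first so that "ºC" is matched before "º" alone.
function cleanValue(string $value): string
{
    $needles = array("km/h", "ºC", "º", " ", "m", "s");
    return str_replace($needles, "", $value);
}

// A few of the values from the array above.
$data  = array('3,38 m', '13,30 s', '32,76 km/h', '294º', '16,27 ºC', 'ESE');
$clean = array_map('cleanValue', $data);

print_r($clean); // the "16,27 ºC" entry becomes "16,27"
```

The alternative, which the asker ended up using, is to keep the original order and add `"C"` as an extra needle.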

Function used

function mb_replace($search, $replace, $subject, &$count=0) {
    if (!is_array($search) && is_array($replace)) {
        return false;
    }
    if (is_array($subject)) {
        // call mb_replace for each single string in $subject
        foreach ($subject as &$string) {
            // Plain assignment: mb_replace() does not return by reference.
            $string = mb_replace($search, $replace, $string, $c);
            $count += $c;
        }
    } elseif (is_array($search)) {
        if (!is_array($replace)) {
            foreach ($search as &$string) {
                $subject = mb_replace($string, $replace, $subject, $c);
                $count += $c;
            }
        } else {
            $n = max(count($search), count($replace));
            while ($n--) {
                $subject = mb_replace(current($search), current($replace), $subject, $c);
                $count += $c;
                next($search);
                next($replace);
            }
        }
    } else {
        $parts = mb_split(preg_quote($search), $subject);
        $count = count($parts)-1;
        $subject = implode($replace, $parts);
    }
    return $subject;
}

Function credit: Gumbo

Baba
  • Thanks, Baba. Your code only works when run on a static array. When I run it against the array generated from the CSV file, it still does not remove the ºC characters. – squeaker Nov 11 '12 at 09:46
  • Depends on how you are reading the CSV .... See http://stackoverflow.com/a/13324952/1226894 – Baba Nov 11 '12 at 09:48
  • @Baba The problem is not multi-byte vs. single byte string function, it's just an encoding mismatch. The replacement simply happens on a raw byte level, the `str_replace` function does not necessarily need to be encoding aware. Only in encodings where byte sequences are ambiguous do you need an encoding aware "mb_replace" function, but I'm not even sure what typically used encoding has this property. – deceze Nov 11 '12 at 10:04
  • Hi guys. Baba, you put me on the right track with the link: I noticed that I had not set the charset in the header. Setting it to UTF-8 got rid of the Â, and after that my simple cleanUp function worked with the addition of ("º","ºC"). Thanks for your time, much appreciated. – squeaker Nov 11 '12 at 10:09
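The byte-level mismatch described in the comments can be reproduced in isolation. This is a sketch under assumptions, not a reconstruction of the asker's actual data: the sample value is made up, ISO-8859-1 is only an assumed source encoding, and the mbstring extension is required. It shows that `str_replace` finds nothing when the needle and the data encode "º" differently, and that converting the data to UTF-8 first makes the same replacement match.

```php
<?php
// "º" (U+00BA) is the single byte BA in ISO-8859-1
// and the two bytes C2 BA in UTF-8.
$needleUtf8 = "\xC2\xBA";
$dataLatin1 = "294\xBA"; // assumed: the same text delivered as ISO-8859-1

// The byte sequences differ, so nothing is replaced.
$unchanged = str_replace($needleUtf8, "", $dataLatin1);

// Normalise the data to UTF-8 first; then the same replacement matches.
$dataUtf8 = mb_convert_encoding($dataLatin1, 'UTF-8', 'ISO-8859-1');
$cleaned  = str_replace($needleUtf8, "", $dataUtf8);

var_dump($unchanged === $dataLatin1); // bool(true)
var_dump($cleaned);                   // string(3) "294"
```

The reverse mistake explains the stray Â: the UTF-8 bytes C2 BA, when displayed as ISO-8859-1, render as "Âº", because byte C2 is "Â" in that encoding.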
0
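function __clean( $text )
{
    // Join the array values with a delimiter so they can be processed as one string.
    $buff = implode('[:#:]', $text);
    // Placeholder from the original answer: the search string is empty as posted,
    // and 'UNICODE OF $text' stands in for the intended replacement.
    $buff = str_replace("", 'UNICODE OF $text', $buff);
    return $buff;
}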