0

I am getting two different count values for same string

 $val1 = $data->sheets[0]['cells'][5][1];//thise value is from excel sheet 
 $val2 = $obj->number; // this is the value from database

 var_dump(trim($val1)) , var_dump(trim($val2))

string(19) "US5922335A" ,  string(10) "US5922335A" 

first values is getting from excel sheet(xls)and The second value is am getting from Mysql table after performing Select query , i am unable to perform comparision like as both are having different length .

if($val1==$va2)
{
    echo "Found it ";   
}

How can i make it to have a same length so that i can compare it.

hakre
  • 193,403
  • 52
  • 435
  • 836
Afsar
  • 3,104
  • 2
  • 25
  • 35
  • 1
    Post a little more code, the code which grabs those values – Hanky Panky Jan 09 '13 at 06:55
  • i have added a little more code – Afsar Jan 09 '13 at 07:21
  • What codepage is your Excel file? – Mark Baker Jan 09 '13 at 07:28
  • @MarkBaker i am using PHPExcel's IOFactory.php file to generate xls file from excel , i am unable to figure out what codepage it is using – Afsar Jan 09 '13 at 09:06
  • @MarkBaker, how would i come to know about what codepage it is using – Afsar Jan 09 '13 at 09:16
  • If you're loading the file via PHPExcel, then the value from the spreadsheet should be UTF-8 (it handles codepage conversion in the file load). As such, I'd expect an exact character match. The difference in length suggests that codepage conversion isn't being done.... what filetype does the PHPExcel_IOFactory identify() method return for this file? – Mark Baker Jan 09 '13 at 09:22
  • @MarkBaker i am using http://stackoverflow.com/questions/3874840/csv-to-excel-conversion of one of your post but codepage conversion is not happening , what should i have to do to make it UTF-8 compatible – Afsar Jan 09 '13 at 11:00
  • If this is loaded from a CSV, then there isn't automatic codepage conversion because a csv contains nothing to identify the codepage.... you need to specify it yourself to the reader yourself using the setInputEncoding() method after instantiating the reader but before loading – Mark Baker Jan 09 '13 at 11:22
  • thankq Mark, can you please add that setInputEncoding() to your existing post http://stackoverflow.com/questions/3874840/csv-to-excel-conversion , m getting confused where to put it thanks in advance – Afsar Jan 09 '13 at 11:26
  • I've added modifying the reader configuration to that linked answer: if this CSV was generated using Excel, then the most likely encoding is UTF-16LE, which my example now uses – Mark Baker Jan 09 '13 at 12:30
  • Thanks Mark you too have a point. – Afsar Jan 09 '13 at 13:59

2 Answers2

4

Not a direct answer to your question but let's first find out what's really behind that string(19) "US5922335A". Please try

foreach(array($val1,$val2) as $v ) {
    for($i=0; $i<strlen($v); $i++) {
        printf('%02x ', ord($v[$i]));
    }
    echo "<br />\n";
}

if($val1==$val2) {
    echo "Found it ";   
}

and add the output which should be something like

55 53 35 39 32 32 33 33 35 41 <br />
55 53 35 39 32 32 33 33 35 41 <br />

to your question.


edit: Your data seems to be little endian UCS-2/UTF-16 encoded.

i am getting as
string(22) "EP2153814A1" string(10) "US5922335A"
45 00 50 00 32 00 31 00 35 00 33 00 38 00 31 00 34 00 41 00 31 00
55 53 35 39 32 32 33 33 35 41

Since your data source for that string value is an Excel sheet the easiest solution is probably to use mb_convert_encoding() to change the encoding.

$val1 = $data->sheets[0]['cells'][5][1];
$val1 = mb_convert_encoding($val1, 'ISO-8859-1', 'UTF-16LE');
VolkerK
  • 95,432
  • 20
  • 163
  • 226
  • i am getting as string(22) "EP2153814A1" string(10) "US5922335A" 45 00 50 00 32 00 31 00 35 00 33 00 38 00 31 00 34 00 41 00 31 00 55 53 35 39 32 32 33 33 35 41 – Afsar Jan 09 '13 at 07:25
  • That looks like little endian UTF-16 or UCS-2. edit: oh, your data source for $val1 is excel. Yes, Windows and Office use UTF16/UCS-2LE as their internal encoding. – VolkerK Jan 09 '13 at 07:38
  • I am using utf8_general_ci as a collation in database for that column do i need to change it to utf16_general_ci – Afsar Jan 09 '13 at 07:45
  • No, yes, no... There is more then one charset setting involved on the mysql side of this problem. Chaning the internal storage encoding alone wouldn't change anything. Your script retrieves values encoded according to the connection charset, that would be the one you'd have to change, e.g. via http://docs.php.net/manual/en/mysqli.set-charset.php - But keep in mind that there are two different "flavours" of utf-16 encoding, little endian and big endian. Let's take the character `E`, in ISO-8859 it's encodided as the single byte 0x45, UTF-16BE -> `0x00 0x45`, UTF-16LE -> `0x45 0x00`, you need LE – VolkerK Jan 09 '13 at 07:51
  • Thanks i got some concepts from your answer, and have changed the xls file to UTF-8 charset manually and it works now, thanks – Afsar Jan 09 '13 at 13:39
3

I will bet that the Mysql table defines the column as a char and not a varchar. If so, it means that the first value has trailing spaces.

You need to remove trailing spaces before doing the equality check.

if(trim($val1)==$va2)
{
    echo "Found it ";   
}
Richard Schneider
  • 34,944
  • 9
  • 57
  • 73
  • 1
    `$val1.rstrip()` that would be python ;-) `rtrim($val1)` – VolkerK Jan 09 '13 at 07:17
  • thanks for replying , After trimming i am getting that values . , while reading from excel cell i am getting count as 19 , but from database i am getting count as 10, please help me i need to do comparision – Afsar Jan 09 '13 at 07:35