
I read an XLSX file with the SimpleXLSX parser.

The first row of my Excel file is the header, and I need to read it.

For example, the file has 3 columns, with the header names in the first row:

Columns_1      Columns_with_accent_à       Columns_3

The second column name contains an accented a: à

My editor is in UTF-8 mode and my PHP page sets its encoding to UTF-8. The page contains no HTML (it is an import-only PHP script), but I get this var_dump output:

<?php

header('Content-type: text/html; charset=UTF-8');

$xlsx = SimpleXLSX::parse("file.xlsx");

foreach ($xlsx->rows() as $indexrow => $r) {
    if ($indexrow == 0) {
        // HEADER
        var_dump(strtolower($r[1])); // second column
        // output WRONG:     Columns_with_accent_�
    }
}

?>

Any idea why strtolower() breaks my string? Without it, everything works fine.

Giuseppe Lodi Rizzini
  • What is the encoding for the .xlsx file? – Funk Forty Niner Jan 29 '20 at 15:36
  • @FunkFortyNiner where can I find this info? – Giuseppe Lodi Rizzini Jan 29 '20 at 15:38
  • You can probably open the file in a code editor to see how it was saved. There's probably another way but it's not coming to mind for me right now. – Funk Forty Niner Jan 29 '20 at 15:39
  • @FunkFortyNiner ok opened with notepad++ and I see MACINTOSH (CR) - ANSI – Giuseppe Lodi Rizzini Jan 29 '20 at 15:40
  • For some reason, my browser is preventing me from including `�` in a Google search. I'd try to Google this to see what you can find. I'd help you but I can't right now. Try saving a copy of that xlsx file, then save it as UTF-8 and see what results you get back. – Funk Forty Niner Jan 29 '20 at 15:46
  • @FunkFortyNiner after converting to UTF-8, I still get the same result, and Excel can't read the file now. – Giuseppe Lodi Rizzini Jan 29 '20 at 15:50
  • @FunkFortyNiner I found the problem. My example left out a small conversion that I apply to the string: I use strtolower. With strtolower I get the wrong character; without it, the string is correct. I don't know why. I've updated my post with the new info. – Giuseppe Lodi Rizzini Jan 29 '20 at 15:53
  • Hmm... interesting. You should update your question to contain exactly what you used and where/how. That way, others might know the reason and possibly provide an answer/solution. Edit: I see you edited, ok. – Funk Forty Niner Jan 29 '20 at 15:54
  • If you've narrowed it down to this one line, you can forget about all the rest and provide a simple reproducible example with something like `echo $str, strtolower($str);` And tell us exactly what `$str` is, preferably with `bin2hex($str)` or such so we can be sure how it's encoded. – deceze Jan 29 '20 at 15:59

1 Answer


Given the comment and the edit about the use of strtolower() (which was not shown in the original post), the PHP manual states:

Note that 'alphabetic' is determined by the current locale. This means that e.g. in the default "C" locale, characters such as umlaut-A (Ä) will not be converted.

The manual entry for mb_strtolower(), on the other hand, says:

By contrast to strtolower(), 'alphabetic' is determined by the Unicode character properties. Thus the behaviour of this function is not affected by locale settings and it can convert any characters that have 'alphabetic' property, such as A-umlaut (Ä).
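
As a minimal sketch of that difference, assuming the mbstring extension is loaded and the header cell is UTF-8: strtolower() works byte by byte, so under some locale settings it can alter one byte of a multi-byte character and leave an invalid sequence, which is displayed as `�`. The variable `$header` below is a hypothetical stand-in for `$r[1]` from the question, with an uppercase À so the conversion is visible.

```php
<?php
// Hypothetical stand-in for $r[1]; "\u{00C0}" is "À" encoded as UTF-8.
$header = "Columns_with_accent_\u{00C0}";

// mb_strtolower() converts whole characters when told the encoding explicitly.
$lower = mb_strtolower($header, 'UTF-8');

echo $lower, PHP_EOL;                    // columns_with_accent_à

// The last two bytes are the UTF-8 encoding of "à".
echo bin2hex(substr($lower, -2)), PHP_EOL; // c3a0

// The result is still valid UTF-8.
var_dump(mb_check_encoding($lower, 'UTF-8')); // bool(true)
```

In the question's loop, this would mean calling `mb_strtolower($r[1], 'UTF-8')` in place of `strtolower($r[1])`.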

Funk Forty Niner