3

I have a file that has the Mime-type: text/csv.

I want to iterate through it and do a bunch of string manipulation.

I have a billion columns and rows, but here is a simple example:

Foo    Bar    Grød
------------------
1      2      3
4      5      6

If I print the header values straight away, without doing anything to them, Laravel's dump shows the third one as a binary string.

My code:

foreach( $headers as $entry ){
  dump( $entry );
}

Output

Foo
Bar
b"Grød"

Now that third line is the problem. It's a binary string.

But I want what's inside the b" and ". So I want output like this:

Foo
Bar
Grød

If I just add utf8_decode, like this, the ø turns into a question mark:

foreach( $headers as $entry ){
  dump( utf8_decode( $entry ) );
}

Foo
Bar
Gr?d

?!

How do I get the actual values from all rows that contain the Danish letters æ, ø and å? They are part of standard UTF-8, so it shouldn't be rocket science.

Addition1

If I write: dd( $request['csv_file'] ), then it outputs this:

-test: false
-originalName: "FILENAME.csv"
-mimeType: "text/csv"
-error: 0
#hashName: null
path: "/private/var/folders/hl/r1syq9ys4z30lw08b6g8hhnh0000gn/T"
filename: "phpzYwY9I"
basename: "phpzYwY9I"
pathname: "/private/var/folders/hl/r1syq9ys4z30lw08b6g8hhnh0000gn/T/phpzYwY9I"
extension: ""
realPath: "/private/var/folders/hl/r1syq9ys4z30lw08b6g8hhnh0000gn/T/phpzYwY9I"
aTime: 2019-02-20 15:31:10
mTime: 2019-02-20 15:31:10
cTime: 2019-02-20 15:31:10
inode: 12891860254
size: 2282762
perms: 0100600
owner: 501
group: 20
type: "file"
writable: true
readable: true
executable: false
file: true
dir: false
link: false

I don't know if it's any help to anyone.

And if I (in the terminal) write file -I FILENAME.csv then it outputs: FILENAME.csv: application/octet-stream; charset=binary

Zeth
  • The `b""` is only a convention of that `dump` utility. There's no difference as far as PHP is concerned, PHP doesn't have anything called "binary strings". There's nothing really at all you need to do. If you want to figure out what the actual encoding of the string is, use `bin2hex` to look at its actual byte representation. – deceze Feb 20 '19 at 13:51
  • Interesting. Later on in the code, I do an `in_array( 'Grød', $headers );` and that's the one that fails. And it fails regardless of whether I do a utf8_decode or not. So there must be something with that crappy letter! :-) – Zeth Feb 20 '19 at 14:16
  • Stupid question, where are you dumping that value? Do you see **Gr?d** in the browser or somewhere else? – skywalker Feb 20 '19 at 14:24
  • In the browser, yes. So I write `dump( .... )` and reload the page (in the browser). – Zeth Feb 20 '19 at 14:28
  • If `in_array` doesn't find it, that means the encoding of the string in your PHP file (i.e., the encoding of the PHP file), and the encoding of the strings in the array are different. – deceze Feb 20 '19 at 14:31
  • Make sure that the PHP file is in the same encoding as the CSV file. I've had something similar happen also with æøå... – JoSSte Feb 20 '19 at 14:45
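
To make the comments above concrete, here is a minimal sketch of what `bin2hex` reveals and why `in_array` can fail. It assumes, purely for illustration, that the header in the CSV is ISO-8859-1 (where "ø" is the single byte F8) while the PHP source file is UTF-8 (where "ø" is the two bytes C3 B8). The real file may use a different encoding, which is exactly what `bin2hex` would show.

```php
<?php
// Assumption: the CSV header arrived as ISO-8859-1, where "ø" is the single byte F8.
$headers = ['Foo', 'Bar', "Gr\xF8d"];

// The same word as it would be typed in a UTF-8 PHP file: "ø" is the two bytes C3 B8.
$needle = "Gr\xC3\xB8d";

// Print the raw bytes of every header as hexadecimal.
foreach ($headers as $entry) {
    echo bin2hex($entry), PHP_EOL;
}
// 466f6f
// 426172
// 4772f864

// The byte sequences differ, so the lookup fails.
var_dump(in_array($needle, $headers, true)); // bool(false)
```

If the hex dump of the real header shows `c3b8` instead of `f8`, the data is already UTF-8 and the mismatch lies elsewhere, for example in a byte order mark at the start of the first header.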

2 Answers

1

Try changing the encoding of the file to UTF-8. For the conversion you could use a text editor like Notepad++ or Sublime Text. Convert the file to UTF-8, or better, to UTF-8 with BOM, then save it and retry.

I suppose that, because of some character present in the file, PHP treats the file as being encoded in another charset (not UTF-8), or the file really is encoded in another charset because of your database encoding or the way you got this CSV file.
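
The same conversion can also be done in PHP instead of a text editor. The following is only a sketch: `to_utf8` is a hypothetical helper name, and the fallback encoding ISO-8859-1 is an assumption that has to be checked against the actual file. It relies on the mbstring extension, which Laravel already requires.

```php
<?php
// Hypothetical helper: normalise one CSV cell to UTF-8.
// Assumption: any value that is not valid UTF-8 is encoded as $fallback.
function to_utf8(string $value, string $fallback = 'ISO-8859-1'): string
{
    // Strip a leading UTF-8 byte order mark (bytes EF BB BF) if present.
    if (strncmp($value, "\xEF\xBB\xBF", 3) === 0) {
        $value = (string) substr($value, 3);
    }

    // Already valid UTF-8: return it unchanged.
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value;
    }

    // Otherwise convert from the assumed source encoding.
    return mb_convert_encoding($value, 'UTF-8', $fallback);
}

// An ISO-8859-1 "Grød" becomes UTF-8 ...
echo bin2hex(to_utf8("Gr\xF8d")), PHP_EOL;                   // 4772c3b864
// ... and a UTF-8 "Grød" with a BOM in front only loses the BOM.
echo bin2hex(to_utf8("\xEF\xBB\xBFGr\xC3\xB8d")), PHP_EOL;   // 4772c3b864
```

Applied to every header, for example with `array_map('to_utf8', $headers)`, this would give strings that compare equal to literals typed in a UTF-8 PHP file.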

dparoli
  • Didn't work. :-/ If I (in the terminal) write: `file -I FILENAME.csv` then it outputs: `FILENAME.csv: application/octet-stream; charset=binary`. Perhaps that tells someone something. – Zeth Feb 20 '19 at 14:21
  • Try to convert to UTF-8 with BOM – dparoli Feb 20 '19 at 14:24
  • If I save using `UTF8 with BOM`, then it outputs: `?Grød`. :-/ – Zeth Feb 20 '19 at 14:26
0

This solved it for me: using unpack( "a*", $entry ).

Thanks for your time everybody! Every little thing helped.
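
For anyone trying this, a small sketch of what `unpack( "a*", ... )` actually returns may help. The sample bytes are an assumption (an ISO-8859-1 "Grød"), not taken from the real file. Note that `unpack` returns an array, and that the `a*` format hands back the bytes unchanged, so it does not convert between encodings.

```php
<?php
// Assumption: $entry holds the raw ISO-8859-1 bytes of the header.
$entry = "Gr\xF8d";

// unpack() returns an array; with an unnamed "a*" format the string sits under key 1.
$unpacked = unpack('a*', $entry);

var_dump($unpacked[1] === $entry);   // bool(true)
echo bin2hex($unpacked[1]), PHP_EOL; // 4772f864
```

Any change in the dumped output therefore comes from dumping an array instead of a string, which is why checking the bytes with `bin2hex` is still worthwhile.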

Zeth
  • I'd again advise to see what you're actually dealing with using `bin2hex()` before applying such mystery solutions. – deceze Feb 20 '19 at 14:51