Problem: parsing a CSV file client-side with JavaScript.
The first question is: what kind of encoding is this? The following is the result of executing the command:
cat file.csv | xxd
The output is truncated here; only the header line and the beginning of the second line are shown.
0000000: 4500 2d00 6d00 6100 6900 6c00 2000 6100 E.-.m.a.i.l. .a.
0000010: 6400 7200 6500 7300 3b00 5200 6f00 6500 d.r.e.s.;.R.o.e.
0000020: 7000 6e00 6100 6100 6d00 3b00 4100 6300 p.n.a.a.m.;.A.c.
0000030: 6800 7400 6500 7200 6e00 6100 6100 6d00 h.t.e.r.n.a.a.m.
0000040: 3b00 4300 7200 6500 6400 6900 7400 6500 ;.C.r.e.d.i.t.e.
0000050: 7500 7200 6e00 7500 6d00 6d00 6500 7200 u.r.n.u.m.m.e.r.
0000060: 3b00 4700 6f00 6500 6400 6b00 6500 7500 ;.G.o.e.d.k.e.u.
0000070: 7200 6400 6500 7200 7300 3b00 4600 7500 r.d.e.r.s.;.F.u.
0000080: 6e00 6300 7400 6900 6500 3b00 4b00 6f00 n.c.t.i.e.;.K.o.
0000090: 7300 7400 6500 6e00 7000 6c00 6100 6100 s.t.e.n.p.l.a.a.
00000a0: 7400 7300 3b00 4200 6500 6800 6500 6500 t.s.;.B.e.h.e.e.
00000b0: 7200 6400 6500 7200 3b00 4400 6500 6300 r.d.e.r.;.D.e.c.
00000c0: 6c00 6100 7200 6100 6e00 7400 3b00 4700 l.a.r.a.n.t.;.G.
00000d0: 6f00 6500 6400 6b00 6500 7500 7200 6400 o.e.d.k.e.u.r.d.
00000e0: 6500 7200 3b00 4500 7800 7000 6f00 7200 e.r.;.E.x.p.o.r.
00000f0: 7400 6500 7500 7200 3b00 4700 6500 6100 t.e.u.r.;.G.e.a.
0000100: 6300 7400 6900 7600 6500 6500 7200 6400 c.t.i.v.e.e.r.d.
0000110: 3b00 5000 6500 7200 7300 6f00 6e00 6500 ;.P.e.r.s.o.n.e.
0000120: 6500 6c00 7300 6e00 7500 6d00 6d00 6500 e.l.s.n.u.m.m.e.
0000130: 7200 3b00 4700 6500 6200 7200 7500 6900 r.;.G.e.b.r.u.i.
0000140: 6b00 6500 7200 7300 6e00 6100 6100 6d00 k.e.r.s.n.a.a.m.
0000150: 3b00 5500 3300 2000 2000 2000 2000 2000 ;.U.3. . . . . .
0000160: 2000 2000 2000 2000 2000 2000 2000 2000 . . . . . . . .
0000170: 2000 2000 2000 2000 2000 2000 2000 2000 . . . . . . . .
0000180: 2000 2000 2000 2000 2000 2000 2000 2000 . . . . . . . .
0000190: 2000 2000 2000 2000 2000 2000 2000 2000 . . . . . . . .
00001a0: 2000 2000 2000 2000 2000 2000 2000 2000 . . . . . . . .
00001b0: 2000 2000 2000 2000 2000 2000 2000 2000 . . . . . . . .
00001c0: 2000 2000 2000 2000 2000 2000 2000 2000 . . . . . . . .
00001d0: 2000 2000 2000 2000 2000 2000 2000 2000 . . . . . . . .
00001e0: 2000 2000 2000 2000 2000 2000 2000 2000 . . . . . . . .
00001f0: 2000 2000 2000 2000 2000 2000 2000 0d00 . . . . . . ...
0000200: 0a00 4100 2e00 4a00 4100 4e00 5300 4500 ..A...J.A.N.S.E.
To parse the file we want to be able to loop over each line. To do that we use the following regex:
lines = str.match(/[^\r\n]+/g)
The result looks like this:
['...\u0000', '\u0000', '\u0000...']
But it should actually look like this:
['... ', 'A...']
If the file itself is not the problem, what regex can I use so that the null bytes do not "break" the match?
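One observation about the dump above: every ASCII character is followed by a 00 byte and there is no byte order mark at the start, which is what UTF-16LE without a BOM looks like. Below is a minimal sketch of decoding the bytes before splitting, assuming that really is the encoding and that the raw bytes are available as an ArrayBuffer or Uint8Array. The helper name splitUtf16leLines and the sample bytes are made up for illustration; only TextDecoder and the regex from above are real.

```javascript
// Hypothetical helper: decode raw bytes as UTF-16LE (an assumption based on
// the hex dump), then split into lines with the same regex as above.
function splitUtf16leLines(bytes) {
  const text = new TextDecoder('utf-16le').decode(bytes);
  return text.match(/[^\r\n]+/g) || [];
}

// Made-up sample: the bytes of "U3 \r\nA." encoded as UTF-16LE,
// mimicking the end of the header line and the start of the second line.
const sample = new Uint8Array([
  0x55, 0x00, 0x33, 0x00, 0x20, 0x00, // "U3 "
  0x0d, 0x00, 0x0a, 0x00,             // "\r\n"
  0x41, 0x00, 0x2e, 0x00,             // "A."
]);

const lines = splitUtf16leLines(sample);
console.log(lines); // [ 'U3 ', 'A.' ]
```

In the browser the bytes could come from FileReader.readAsArrayBuffer, or the decoding step could be skipped by passing the encoding label directly, as in reader.readAsText(file, 'utf-16le'). With the text decoded this way, the regex itself should not need to change.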
Edit:
- Executing
file -I
returns application/octet-stream; charset=binary