
I am reading a very big CSV file with pandas, but when I check the output it shows absurd results:

    import pandas as pd
    projects = pd.read_csv('Projects.csv', sep='delimiter', header=None, engine='python', encoding='latin-1', skiprows=2)

I have tried many variations, changing the encoding and several other parameters, but I still get the same output:

    0
0   æ®&ý%|ÈÍêc7*Àòç¯Nãç&ûãßöû3ü·oäAÏ6Å6o£¤...
1   aTݤã[ÓÓ:λq8ÝïJçÝpG­Ô¨ñað¢@·-,éD¿¨...
2   nªz¤/Âz?ã·Á|ø0v³¾R?3 CÓë_æàßv...
3   GPÃóNÝHÝèÆ¡Gár#Ý
4   GÍÅ9âÂQ²?8;)MÏ`5ÀôÚL3íºãÒÖõð­aßãÂÔ...
5   <ÈÞ-ܹ޽¥Æ£¯»àÏÝ}·ÇÒÃpo»ã¾ë5ÝBù{}~þô...
6   _wü-H|gw¦wò4ÉùÃ5nnÔÃo°ºnn`½³÷¶^...
7   ù4iZÓYU{=ó'ͥ羷xüé¢ÁüURnÂÕ«Ý=nû®...
8   gT:5ݾ,ãC7àzÞÃ)E;îîÙ³'$üÃAÀ
9   F:i¸Í-IÅX¾ÒÃxß)éx{ï`0%¬ì2û70aàÖ±...
10  ^o+=7|5ÊØ`ø~@ýLÀÛ5YSvú÷t<gcxÃpåv»ûÇï...

Any help resolving this would be greatly appreciated!

  • Probably have the wrong encoding. – James Jun 05 '20 at 10:48
  • Use [this answer](https://stackoverflow.com/a/45167602/9375102) to detect your encoding, then pass that in as an argument. – Umar.H Jun 05 '20 at 10:51
  • @Datanovice The encoding is ANSI; I have tried passing that as an argument, but it still doesn't solve the issue. – Adi_Sharma Jun 05 '20 at 11:04
  • Are you sure this isn't a binary file with a .csv extension? (It doesn't look like xls or xlsx, but maybe some other type?) Are you sure you tried UTF-16 LE and BE? Oh, I see you skipped the first 2 rows, so it could be xls or xlsx after all. – Danny_ds Jun 05 '20 at 12:56
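Following up on the UTF-16 suggestion in the comments: one quick check is whether the file starts with a Unicode byte-order mark (BOM). This is a minimal sketch; `detect_bom` is a hypothetical helper name, and a missing BOM does not rule out UTF-16 (files can be written without one).

```python
import codecs

# Known BOMs mapped to Python codec names that strip the BOM on decode.
# UTF-32 LE (FF FE 00 00) must be checked before UTF-16 LE (FF FE),
# because the UTF-16 LE BOM is a prefix of the UTF-32 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def detect_bom(path):
    """Return a codec name if the file starts with a known BOM, else None."""
    with open(path, "rb") as f:
        head = f.read(4)  # the longest BOM is 4 bytes
    for bom, codec_name in _BOMS:
        if head.startswith(bom):
            return codec_name
    return None
```

If it returns a name, that value can be passed as `encoding=` to `pd.read_csv`. If it returns `None` and the data still looks like the output above, the problem is more likely the file format than the text encoding.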

1 Answer


This is probably a binary file (.xls or .xlsx) with a .csv extension.

There's only one 'field' in each row, so no delimiter is found.

Since you skipped the first 2 rows, I can't check the file signature to be sure.
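If you have the full file, its first few bytes (the file signature) identify the format regardless of the extension. This is a minimal sketch with a hypothetical helper, `sniff_format`. It checks only two signatures: the ZIP local-file header `PK\x03\x04`, which .xlsx files start with because they are ZIP containers, and the OLE2 compound-document header `D0 CF 11 E0 A1 B1 1A E1`, which legacy .xls files use (as do other old Office formats). Read the raw bytes from offset 0, not through `read_csv` with `skiprows`.

```python
# Known file signatures ("magic numbers") found at offset 0.
_SIGNATURES = {
    b"PK\x03\x04": "zip",  # ZIP container, e.g. .xlsx
    b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1": "ole2",  # e.g. legacy .xls
}


def sniff_format(path):
    """Return 'zip', 'ole2', or 'unknown' based on the file's leading bytes."""
    with open(path, "rb") as f:
        head = f.read(8)  # the longest signature checked is 8 bytes
    for magic, label in _SIGNATURES.items():
        if head.startswith(magic):
            return label
    return "unknown"
```

If this reports `zip`, one option is to copy the file to a name ending in `.xlsx` and open it with `pd.read_excel`. For .xlsx, that needs the openpyxl package installed.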


Looking at the histogram of the small data sample you provided, it looks like this is compressed data and probably a .xlsx file (which is actually a zip file).
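The histogram argument can also be put into numbers. Compressed (or encrypted) data has a nearly flat byte distribution, so its Shannon entropy approaches 8 bits per byte. Plain-text CSV typically scores noticeably lower. This is a minimal sketch; `byte_entropy` is a hypothetical helper, and any cutoff you choose (for example around 7.5) is a rule of thumb rather than a standard.

```python
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    counts = Counter(data)  # iterating over bytes yields ints 0..255
    # H = -sum(p * log2(p)) over the observed byte values
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())
```

To use it, open the file in binary mode (`open("Projects.csv", "rb")`), read a sample such as the first megabyte, and pass those bytes to `byte_entropy`. A value very close to 8 supports the compressed-data explanation.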

Danny_ds