
I am trying to read this file using read_csv in pandas (Python), but not all of its columns are captured. Can you help?

Here is the code:

import pandas as pd

file = r'path of file'
df = pd.read_csv(file, encoding='cp1252', on_bad_lines='skip')
Trenton McKinney
rish
  • What exactly do you mean by "not able to capture all columns"? What's the expected result? What result are you actually getting? What's the difference between the two? – ForceBru Sep 19 '22 at 11:11
  • If you open the file in Excel or Notepad++, you will see that there are 161 columns, but the code captures only 7. – rish Sep 19 '22 at 11:15

1 Answer


I tried to read your file and noticed two things: the encoding you specified does not match the one actually used in the file, and the separator is not a comma (,) but a tab (\t).
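To see why the separator matters, here is a minimal sketch on a small, made-up tab-separated sample (the column names and values are hypothetical, not taken from kopie.csv). With the default sep=',', pandas finds no commas, so each whole line lands in a single column. Declaring sep='\t' splits the lines correctly.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the real file: three tab-separated columns.
sample = "id\tname\tvalue\n1\talpha\t10\n2\tbeta\t20\n"

# Default sep=',' finds no commas, so each whole line becomes one field.
wrong = pd.read_csv(io.StringIO(sample))
print(wrong.shape)  # (2, 1)

# Declaring the tab separator splits each line into its real columns.
right = pd.read_csv(io.StringIO(sample), sep="\t")
print(right.shape)  # (2, 3)
```

The 7 columns in the question presumably come from stray commas inside some of the text fields, combined with on_bad_lines='skip' silently dropping rows that did not fit. That part is a guess and is not verified against the actual file.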

First, to detect the file's encoding (on Linux), run:

$ file -i kopie.csv 
kopie.csv: text/plain; charset=utf-16le
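The file command is not always available. The question's encoding='cp1252' may hint at Windows, for instance. In that case, a rough Python-only check is to look for a byte-order mark (BOM) at the start of the file. The sketch below uses a hypothetical helper, guess_bom_encoding, and assumes the file actually begins with a BOM. Unlike file, it applies no other heuristics, so a BOM-less file simply falls back to the given default.

```python
import codecs

# Byte-order marks to check. UTF-32 LE must come before UTF-16 LE,
# because the UTF-32 LE mark (FF FE 00 00) starts with the UTF-16 LE mark (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_bom_encoding(path, default=None):
    """Hypothetical helper: return a codec name if the file starts with a BOM."""
    with open(path, "rb") as fh:
        head = fh.read(4)
    for bom, name in BOMS:
        if head.startswith(bom):
            return name
    return default
```

Returning "utf-16" instead of "utf-16le" is deliberate: Python's utf-16 codec reads the BOM to pick the byte order and then strips it. For a file like kopie.csv, assuming it starts with a BOM, the result could be passed straight to pd.read_csv(..., encoding=..., sep='\t').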

In Python:

import pandas as pd

path_to_file = 'kopie.csv'
df = pd.read_csv(path_to_file, encoding='utf-16le', sep='\t')

Printing the shape of the loaded DataFrame confirms that all 161 columns are now read:

>>> df.shape
(869, 161)
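If you are not sure which delimiter a file uses, pandas can also try to guess it. With sep=None and engine='python', read_csv uses Python's csv.Sniffer to detect the separator. The sketch below runs this on a small, hypothetical UTF-16 LE sample built in memory, not on kopie.csv itself. Because the guess can be wrong, it is worth checking df.shape afterwards, as done above.

```python
import io

import pandas as pd

# Hypothetical stand-in for kopie.csv: tab-separated text encoded as UTF-16 LE (no BOM).
raw = "id\tname\tvalue\n1\talpha\t10\n2\tbeta\t20\n".encode("utf-16-le")

# sep=None asks pandas to sniff the delimiter; this requires engine="python".
df = pd.read_csv(io.BytesIO(raw), encoding="utf-16le", sep=None, engine="python")
print(df.shape)  # (2, 3)
```

The python engine is slower than the default C engine. Once the separator is known, passing sep='\t' explicitly, as in the answer, is the more predictable choice.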
inarighas