My original CSV file has float64 values in every cell, but when I read it with pd.read_csv() I get a table full of NaN. I tried setting the delimiter and the encoding, but it didn't help. The CSV file is generated automatically by a piece of software and I have no way to check its export settings. Is there any way I can read my file into a DataFrame with the correct values?

>>> pd.read_csv('./HISTORY_LOG_05-31-2018.CSV')
D  Unnamed: 1  Unnamed: 2      ...       Unnamed: 108  Unnamed: 109  Unnamed: 110
0 NaN         NaN         NaN      ...                NaN           NaN           NaN
1 NaN         NaN         NaN      ...                NaN           NaN           NaN
2 NaN         NaN         NaN      ...                NaN           NaN           NaN
3 NaN         NaN         NaN      ...                NaN           NaN           NaN
4 NaN         NaN         NaN      ...                NaN           NaN           NaN
5 NaN         NaN         NaN      ...                NaN           NaN           NaN

[6 rows x 111 columns]

I simplified the CSV file to

A,B
0.000,0.000

0.000,0.000

and I still got results like:

>>> pd.read_table('./HISTORY_LOG_05-31-2018.CSV', encoding="cp1252")
    D
0 NaN
1 NaN
2 NaN

>>> pd.read_table('./HISTORY_LOG_05-31-2018.CSV', encoding="cp1252", delimiter=",")
    D  Unnamed: 1
0 NaN         NaN
1 NaN         NaN
2 NaN         NaN
  • Without seeing a sample of your input file, this is impossible to debug. – abarnert Jun 21 '18 at 18:07
  • Also, just saying "I tried to set the delimiter and the encoding of the function" doesn't help—you have to tell us exactly what you set them to. Or, better, just give us the code that you thought would work (and, if it isn't obvious, why you thought it would work). – abarnert Jun 21 '18 at 18:07
  • I updated my question and I hope you will be able to see where the problem is :) – オウエキセツ Jun 21 '18 at 18:20
  • Worked with `pd.read_csv('PATH_TO_FILE')` for me. I used your CSV example and `pandas` v0.22. – Luiz Otavio V. B. Oliveira Jun 21 '18 at 21:16

3 Answers

I figured out the answer because I had the same problem. The encoding was wrong, so pandas couldn't read the text correctly. I opened the file in Visual Studio Code and found that its encoding was UTF-16 LE. My file was written by PowerShell, so yours likely was too; you probably just need to specify the output encoding when writing the file, or pass the right encoding to pandas.

pd.read_csv("ADSearch.txt",encoding='UTF-16 LE')
Empty DataFrame
Columns: [lastname, firstname, username, site, email, Unnamed: 5, False, True]
Index: []
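
If you don't have an editor handy, one way to check the encoding is to look at the byte-order mark (BOM) at the start of the file. The following is only a minimal sketch: the helper names `sniff_bom_encoding` and `sniff_file_encoding` are made up for this example (they are not part of pandas), and it assumes the file actually starts with a BOM, which not every UTF-16 file does.

```python
import codecs

# Known byte-order marks and a codec name that handles each of them.
# UTF-32 LE is checked before UTF-16 LE because its BOM starts with the same two bytes.
_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def sniff_bom_encoding(raw: bytes, default: str = "utf-8") -> str:
    """Return a codec name based on the BOM at the start of raw (hypothetical helper)."""
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    # No BOM found: this is a guess, not a detection.
    return default


def sniff_file_encoding(path: str, default: str = "utf-8") -> str:
    """Read the first four bytes of a file and sniff its BOM (hypothetical helper)."""
    with open(path, "rb") as f:
        return sniff_bom_encoding(f.read(4), default)
```

The result can then be passed on to pandas, for example `pd.read_csv(path, encoding=sniff_file_encoding(path))`. The `utf-16` and `utf-8-sig` codecs consume the BOM themselves, so it should not end up in the first column name.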
Christopher

I ran into this issue too. Here are some steps to diagnose it.

First, check from the command line whether the file is human-readable:

head file.txt

After that, try printing some lines in a Python 3 console:

with open("file.txt", encoding="latin1", errors='ignore') as f:
    for i in f:
        print([str(i.strip())])

If the output contains escaped bytes such as \x00N\x00A\x00S\x00S\x00A\x00U\x00"\x00;, the source file contains null characters. To remove them, run sed -i 's/\x0//g' file.txt and then load the file in Python again.
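
The same check can be done without leaving Python by counting null bytes in the raw file contents. This is just a rough sketch: `null_byte_ratio` and `looks_like_utf16` are hypothetical helper names, and the 0.3 threshold is an arbitrary assumption. A null byte after (or before) every character is typical of mostly-ASCII text stored as UTF-16, so a high ratio suggests decoding the file as UTF-16 rather than stripping the nulls.

```python
def null_byte_ratio(raw: bytes) -> float:
    """Return the fraction of bytes in raw that are NUL (0x00)."""
    return raw.count(b"\x00") / len(raw) if raw else 0.0


def looks_like_utf16(raw: bytes, threshold: float = 0.3) -> bool:
    """Heuristic only: mostly-ASCII text in UTF-16 is roughly half NUL bytes.

    The threshold is an assumed value, not a standard one.
    """
    return null_byte_ratio(raw) >= threshold


# In-memory stand-ins for the first bytes of a file.
utf16_sample = "A,B\n0.000,0.000\n".encode("utf-16-le")
ascii_sample = b"A,B\n0.000,0.000\n"

print(looks_like_utf16(utf16_sample))
print(looks_like_utf16(ascii_sample))
```

For a real file, read a chunk in binary mode first, for example `raw = open("file.txt", "rb").read(4096)`, and pass it to `looks_like_utf16`.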

charles

This works perfectly with the sample input you gave. The sample input and the Python and pandas versions I used are shown below.

~ $ python
Python 3.6.4 |Anaconda custom (64-bit)| (default, Jan 16 2018, 18:10:19) 
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> pd.read_csv('sample.csv')
     A    B
0  0.0  0.0
1  0.0  0.0
>>> pd.__version__
'0.22.0'
>>> exit()
~ $ cat sample.csv 
A, B
0.000, 0.000
0.000, 0.000
sky_lynx