I have data that looks like this: [image: data]

I want to put this data into a dataframe but I get the following errors:

  File "pandas/_libs/parsers.pyx", line 562, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas/_libs/parsers.pyx", line 790, in pandas._libs.parsers.TextReader._get_header
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

Here is my code:

# Import necessary modules

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()

# Import data into a dataframe
df = pd.read_csv("ETHData.csv")
df.head()

As suggested by some users, I changed the call to df = pd.read_csv("ETHData.csv", encoding='latin1') and received this output:

0  NaN
1  NaN
2  NaN
3  NaN
4  NaN

Update:

Simply copying and pasting the data from the .txt file into a new .csv file solved the problem. Here is the correct output now:

       DateTime  Price [USD]
0  7/30/15 0:00          0.0
1  7/31/15 0:00          0.0
2   8/1/15 0:00          0.0
3   8/2/15 0:00          0.0
4   8/3/15 0:00          0.0
  • use `pd.read_csv("ETHData.csv",encoding='latin1')` – pythonic833 Apr 16 '18 at 20:25
  • Look at this response https://stackoverflow.com/questions/18171739/unicodedecodeerror-when-reading-csv-file-in-pandas-with-python?utm_medium=organic&utm_source=google_rich_qa&utm_campaign=google_rich_qa – Turo Apr 16 '18 at 20:26
  • Could you give us the data in a format we can use to try to replicate your error? – ChootsMagoots Apr 16 '18 at 20:34

1 Answer

The problem appears to be encoding-related. By default pandas decodes text as UTF-8, and your file is evidently not valid UTF-8, so it needs a different encoding such as latin1 (a single-byte Western European encoding). As @pythonic833 says, use:

pd.read_csv('ETHData.csv', encoding='latin1')

The same encoding is also known as iso-8859-1 (in Python, latin1 is an alias for it), so this call is equivalent:

pd.read_csv('ETHData.csv', encoding='iso-8859-1')
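If neither spelling gives sensible output, it may help to look at the first bytes of the file before picking an encoding. The byte 0xff at position 0 in the traceback is consistent with a UTF-16 byte-order mark (FF FE), although the file itself was not shared, so that is only a guess. Below is a minimal sketch of such a check; `guess_encoding` is a hypothetical helper written for this answer, not part of pandas, and it reads the whole file into memory, which is only reasonable for small files.

```python
import codecs

# Byte-order marks and the Python codec that handles each one.
# UTF-32 is checked before UTF-16 because the UTF-32-LE BOM begins
# with the same two bytes (FF FE) as the UTF-16-LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(path, fallback="latin1"):
    """Guess a text file's encoding from its BOM (hypothetical helper)."""
    with open(path, "rb") as f:
        raw = f.read()

    # A BOM at the start of the file is the strongest hint available.
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name

    # No BOM: accept UTF-8 only if the whole file decodes cleanly.
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # latin1 maps every byte to a character, so it never raises,
        # but that does not mean the decoded text is correct.
        return fallback
```

The result can then be passed on, for example `pd.read_csv("ETHData.csv", encoding=guess_encoding("ETHData.csv"))`. Because latin1 never raises a decode error, a file in some other encoding can load "successfully" with it and still produce garbage, which might explain a column of NaN values.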

ChootsMagoots
  • I made an edit based on what you and others suggested, but I still get incorrect output –  Apr 16 '18 at 20:30