
I ran the following script (https://github.com/FXCMAPI/FXCMTickData/blob/master/TickData34.py) and added these lines at the end to save the downloaded data to disk:

    output_folder = '/Users/me/Documents/data/forex/'
    target_folder = os.path.join(output_folder, symbol, year)
    os.makedirs(target_folder, exist_ok=True)
    with open(os.path.join(target_folder, str(i) + '.csv'), 'wb') as outfile:
        outfile.write(data)

Then, I tried opening the file using pandas as follows:

    x = pd.read_csv('/Users/me/Documents/data/forex/EURUSD/2015/29.csv')

However, this is what I got:

    In [3]: x.info()
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 2415632 entries, 0 to 2415631
    Data columns (total 3 columns):
    D             float64
    Unnamed: 1    float64
    Unnamed: 2    float64
    dtypes: float64(3)
    memory usage: 55.3 MB

    In [4]: x.dropna()
    Out[4]: 
    Empty DataFrame
    Columns: [D, Unnamed: 1, Unnamed: 2]
    Index: []

Why is the dataframe empty?

If I open the file in TextEdit, the first few lines actually look like this:

    DateTime,Bid,Ask

    07/19/2015 21:00:15.469,1.083,1.08332

    07/19/2015 21:00:16.949,1.08311,1.08332

    07/19/2015 21:00:16.955,1.08311,1.08338
Mariska
  • The dataframe is not empty until you drop the nulls. You need to use parse_dates = 'DateTime' – Keith Oct 13 '17 at 03:20

2 Answers


Apparently, every character in your data is followed by the null character \x00. Strip those bytes before writing the file, and pandas will parse it:

    outfile.write(data.replace(b'\x00',b''))
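
A null byte after every character is what ASCII text looks like when it is encoded as UTF-16 little-endian. That is only an assumption about this file; nothing in the question confirms the encoding. The sketch below uses a made-up two-line sample (not the real download) to show that, under that assumption, stripping the null bytes and decoding explicitly give the same text:

```python
import csv
import io

# Hypothetical sample standing in for the downloaded bytes: ASCII CSV text
# encoded as UTF-16-LE, which puts b'\x00' after every character.
sample = ("DateTime,Bid,Ask\n"
          "07/19/2015 21:00:15.469,1.083,1.08332\n").encode("utf-16-le")

# Option 1: strip the null bytes, as in the line above.
stripped = sample.replace(b"\x00", b"").decode("ascii")

# Option 2: decode with the assumed encoding instead of deleting bytes.
decoded = sample.decode("utf-16-le")

# Both routes produce the same text for pure-ASCII content.
same_text = stripped == decoded

# Parse the cleaned text to confirm the header and the single data row.
rows = list(csv.reader(io.StringIO(stripped)))
header, first_row = rows[0], rows[1]
```

Stripping is only safe when the content is pure ASCII, which seems to hold for timestamps and prices; for anything else, decoding would be the safer route.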
Ken Wei

Thank you for providing a very concrete and reproducible problem.

I pasted your code and ran it on Windows, and pandas indeed just read in 55 MB of null values.

I think the problem is that pandas does not parse the CSV file correctly, not that it cannot open the file.

However, I tried all the encodings listed in this answer and none of them worked, so something might be wrong with the file as well.

I eventually made it work by opening the file in Excel and saving it as a different file; pandas can then parse that copy correctly.
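
Before guessing encodings, it may help to look at the first raw bytes of the file. This is a minimal sketch: `describe_head` is a hypothetical helper, and the temporary file is a stand-in written as UTF-16-LE purely for illustration, not the real download.

```python
import os
import tempfile

def describe_head(path, n=16):
    """Return the first n raw bytes of a file and the fraction that are null."""
    with open(path, "rb") as f:
        head = f.read(n)
    null_fraction = head.count(b"\x00") / len(head) if head else 0.0
    return head, null_fraction

# Hypothetical stand-in file: the CSV header encoded as UTF-16-LE.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.csv")
    with open(path, "wb") as f:
        f.write("DateTime,Bid,Ask\n".encode("utf-16-le"))
    head, null_fraction = describe_head(path)

print(head)
print(null_fraction)
```

A null fraction near 0.5 on text that should be plain ASCII suggests a two-byte encoding, or stray null bytes that need stripping before parsing.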

PaulDong