import pandas as pd
import numpy as np
from datetime import datetime, time


# history file and batch size for processing.

historyFilePath = 'EURUSD.SAMPLE.csv'
batch_size = 5000


# function for date parsing
# uses the datetime class imported above; pd.datetime is deprecated and absent from recent pandas
dateparse = lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M:%S.%f')


# load data into a pandas iterator with all the chunks
ratesFromCSVChunks = pd.read_csv(historyFilePath, index_col=0, engine='python', parse_dates=True,
                                 date_parser=dateparse, header=None,
                                 names=["datetime", "1_Current", "2_BidPx", "3_BidSz", "4_AskPx", "5_AskSz"],
                                 iterator=True,
                                 chunksize=batch_size)



# concatenate chunks to get the final DataFrame
ratesFromCSV = pd.concat([chunk for chunk in ratesFromCSVChunks])

# save final csv file
ratesFromCSV.to_csv('EURUSD_processed.csv', date_format='%Y-%m-%d %H:%M:%S.%f',
                    columns=['1_Current', '2_BidPx', '3_BidSz', '4_AskPx', '5_AskSz'], header=False, float_format='%.5f')

I am reading a CSV file containing forex data in the format:

    2014-08-17 17:00:01.000000,1.33910,1.33910,1.00000,1.33930,1.00000
    2014-08-17 17:00:01.000000,1.33910,1.33910,1.00000,1.33950,1.00000
    2014-08-17 17:00:02.000000,1.33910,1.33910,1.00000,1.33930,1.00000
    2014-08-17 17:00:02.000000,1.33900,1.33900,1.00000,1.33940,1.00000
    2014-08-17 17:00:04.000000,1.33910,1.33910,1.00000,1.33950,1.00000
    2014-08-17 17:00:05.000000,1.33930,1.33930,1.00000,1.33950,1.00000
    2014-08-17 17:00:06.000000,1.33920,1.33920,1.00000,1.33960,1.00000
    2014-08-17 17:00:06.000000,1.33910,1.33910,1.00000,1.33950,1.00000
    2014-08-17 17:00:08.000000,1.33900,1.33900,1.00000,1.33942,1.00000
    2014-08-17 17:00:16.000000,1.33900,1.33900,1.00000,1.33940,1.00000

How do I convert the datetime in the CSV file (or in the pandas DataFrame being read) to milliseconds since midnight (UTC or localized) by the time it is saved? Each file starts at midnight every day. The only thing that changes is the datetime format, which becomes milliseconds since midnight of each day (UTC or localized). The format I am looking for is:

    43264234, 1.33910,1.33910,1.00000,1.33930,1.00000
    43264739, 1.33910,1.33910,1.00000,1.33950,1.00000
    43265282, 1.33910,1.33910,1.00000,1.33930,1.00000
    43265789, 1.33900,1.33900,1.00000,1.33940,1.00000
    43266318, 1.33910,1.33910,1.00000,1.33950,1.00000
    43266846, 1.33930,1.33930,1.00000,1.33950,1.00000
    43267353, 1.33920,1.33920,1.00000,1.33960,1.00000
    43267872, 1.33910,1.33910,1.00000,1.33950,1.00000
    43268387, 1.33900,1.33900,1.00000,1.33942,1.00000

Any help is much appreciated (short and precise, in Python 3.4 or above, with pandas 0.18.1 and NumPy 1.11).

JourneyMan

2 Answers


This snippet of code should do what you want:

# Create some fake data, similar to yours

import pandas as pd
s = pd.Series(pd.date_range('2014-08-17 17:00:01.1230000', periods=4))
print(s)
print(type(s[0]))

# Create a new series using just the date portion of the original data.
# This effectively truncates the time portion. 
# Can't use d = s.dt.date or you'll get date objects back, not datetime64.

d = pd.to_datetime(s.dt.date)
print(d)
print(type(d[0]))

# Calculate the time delta between the original datetime and 
# just the date portion. This is the elapsed time since your epoch.

delta_t = s-d
print(delta_t)

# Display the elapsed time as seconds.

print(delta_t.dt.total_seconds())

This results in the following output:

0   2014-08-17 17:00:01.123
1   2014-08-18 17:00:01.123
2   2014-08-19 17:00:01.123
3   2014-08-20 17:00:01.123
dtype: datetime64[ns]
<class 'pandas.tslib.Timestamp'>
0   2014-08-17
1   2014-08-18
2   2014-08-19
3   2014-08-20
dtype: datetime64[ns]
<class 'pandas.tslib.Timestamp'>
0   17:00:01.123000
1   17:00:01.123000
2   17:00:01.123000
3   17:00:01.123000
dtype: timedelta64[ns]
0    61201.123
1    61201.123
2    61201.123
3    61201.123
dtype: float64
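
The question asks for integer milliseconds rather than float seconds. A minimal sketch of that last step, assuming a recent pandas where `Series.dt.normalize()` is available (it resets the time of day to 00:00:00 and keeps the datetime64 dtype); the name `ms_since_midnight` is only illustrative:

```python
import pandas as pd

# Same fake data as above
s = pd.Series(pd.date_range('2014-08-17 17:00:01.123', periods=4))

# Midnight of each timestamp's own day, still datetime64
midnight = s.dt.normalize()

# Dividing a timedelta by a 1 ms Timedelta gives a float count of milliseconds;
# astype('int64') then drops any sub-millisecond remainder.
ms_since_midnight = ((s - midnight) / pd.Timedelta(milliseconds=1)).astype('int64')
print(ms_since_midnight)
```

For timezone-aware data, the choice of UTC or local time would have to be made (for example with `tz_localize` / `tz_convert`) before normalizing; that step is not shown here.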
Tom Johnson

Here's how I did it with my data:

import pandas as pd
import numpy as np

rng = pd.date_range('1/1/2011', periods=72, freq='H')
df = pd.DataFrame({"Data": np.random.randn(len(rng))}, index=rng)
df["Time_Since_Midnight"] = (df.index - pd.to_datetime(df.index.date)) / np.timedelta64(1, 'ms')

Converting the DatetimeIndex to dates (`df.index.date`) and back with `pd.to_datetime` drops the time of day, leaving midnight of each day. Subtracting that from the original index gives a TimedeltaIndex, and dividing it by `np.timedelta64(1, 'ms')` expresses each value as a number of milliseconds.

Here's the output I get (the last column is the time since midnight):

2011-01-01 00:00:00  2.383501         0.0
2011-01-01 01:00:00  0.725419   3600000.0
2011-01-01 02:00:00 -0.361533   7200000.0
2011-01-01 03:00:00  2.311185  10800000.0
2011-01-01 04:00:00  1.596148  14400000.0
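
The same idea can be applied to the file layout from the question. This is only a sketch under a few assumptions: two of the question's rows are read from an in-memory string instead of `EURUSD.SAMPLE.csv`, a recent pandas is used, and the variable names `rates`, `ms` and `out` are illustrative:

```python
import io
import pandas as pd

# Two rows copied from the question, standing in for the real CSV file
sample = io.StringIO(
    "2014-08-17 17:00:01.000000,1.33910,1.33910,1.00000,1.33930,1.00000\n"
    "2014-08-17 17:00:02.000000,1.33900,1.33900,1.00000,1.33940,1.00000\n"
)
cols = ["datetime", "1_Current", "2_BidPx", "3_BidSz", "4_AskPx", "5_AskSz"]
rates = pd.read_csv(sample, header=None, names=cols, index_col=0, parse_dates=True)

# Milliseconds elapsed since midnight of each row's own day, as integers
ms = (rates.index - rates.index.normalize()) / pd.Timedelta(milliseconds=1)
rates.index = ms.astype('int64')

# With no path given, to_csv returns the CSV text instead of writing a file
out = rates.to_csv(header=False, float_format='%.5f')
print(out)
```

Passing a file name such as `'EURUSD_processed.csv'` as the first argument of `to_csv` would write the result to disk instead. When the file is read in chunks, the same two index lines could be applied to each chunk before concatenating.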
Anjum Sayed