Pandas DataFrame and Series - IB TWS HistoricalData

Question

I am trying to apply the pandas module to my code in order to re-organize the messages received back from IB TWS server.

The code is

from ibapi.client import EClient
from ibapi.wrapper import EWrapper
from ibapi.contract import Contract


class MyWrapper(EWrapper):

    def nextValidId(self, orderId:int):
        print("Setting nextValidOrderId: %d", orderId)
        self.nextValidOrderId = orderId
        self.start()

    def historicalData(self, reqId, bar):
        print("HistoricalData. ", reqId, "Date:", bar.date, "Open:", bar.open, "High:", bar.high, "Low:", bar.low, "Close:", bar.close, "Volume:", bar.volume, "Average:", bar.average, "Count:", bar.barCount)

    def historicalDataUpdate(self, reqId, bar):
        print("HistoricalDataUpdate. ", reqId, "Date:", bar.date, "Open:", bar.open, "High:", bar.high, "Low:", bar.low, "Close:", bar.close, "Volume:", bar.volume, "Average:", bar.average, "Count:", bar.barCount)

    def error(self, reqId, errorCode, errorString):
        print("Error. Id: " , reqId, " Code: " , errorCode , " Msg: " , errorString)

    def start(self):
        queryTime = ""

        contract = Contract()
        contract.secType = "STK"
        contract.symbol = "NIO"
        contract.currency = "USD"
        contract.exchange = "SMART"

        app.reqHistoricalData(1, contract, queryTime, "1 D", "5 secs", "TRADES", 0, 1, True, [])

app = EClient(MyWrapper())
app.connect("127.0.0.1", 7496, clientId=123)
app.run()

This code retrives historical data for a given stock, then returns the most current updates.

The problem that I am facing is that the messages returned are organized as such

HistoricalDataUpdate.  1 Date: 20200708  08:31:00 Open: 14.17 High: 14.17 Low: 14.17 Close: 14.17 Volume: -1 Average: 14.15 Count: -1

While I am trying to retrieve the data in a re-organized manner such as

HistoricalDataUpdate.  1 Date:            Open:  High:  Low:   Close:  Volume:  Average:  Count:
                       20200708  08:31:00 14.17  14.17  14.17  14.17   -1       14.15     -1

Help would be appreciated.

brian · Answer 1 · 2020-07-09T17:28:41.467

3

The callback gives you ibapi.common.BarData which you can read it's vars to get a dict like {date:..., open:123...} etc.

Pandas can make a dataframe from a list of dicts so store them in a list

Maybe you want date as an index, pandas can do that as well, surprisingly it can read the format.

You can save the data when you are done in a csv file.

from ibapi.client import EClient
from ibapi.wrapper import EWrapper
from ibapi.contract import Contract
import pandas as pd

class MyWrapper(EWrapper):
    def __init__(self):
        self.data = []
        self.df=None
        
    def nextValidId(self, orderId:int):
        print("Setting nextValidOrderId: %d", orderId)
        self.nextValidOrderId = orderId
        self.start()

    def historicalData(self, reqId, bar):
        self.data.append(vars(bar));
        
    def historicalDataUpdate(self, reqId, bar):
        line = vars(bar)
        # pop date and make it the index, add rest to df
        # will overwrite last bar at that same time
        self.df.loc[pd.to_datetime(line.pop('date'))] = line
        
    def historicalDataEnd(self, reqId: int, start: str, end: str):
        print("HistoricalDataEnd. ReqId:", reqId, "from", start, "to", end)
        self.df = pd.DataFrame(self.data)
        self.df['date'] = pd.to_datetime(self.df['date'])
        self.df.set_index('date', inplace=True)
        
    def error(self, reqId, errorCode, errorString):
        print("Error. Id: " , reqId, " Code: " , errorCode , " Msg: " , errorString)

    def start(self):
        queryTime = ""
        
        # so everyone can get data use fx
        fx = Contract()
        fx.secType = "CASH" 
        fx.symbol = "USD"
        fx.currency = "JPY"
        fx.exchange = "IDEALPRO"
        
        # setting update to 1 minute still sends an update every tick? but timestamps are 1 min
        # I don't think keepUpToDate sends a realtimeBar every 5 secs, just updates the last bar.
        app.reqHistoricalData(1, fx, queryTime, "1 D", "1 min", "MIDPOINT", 0, 1, True, [])

wrap = MyWrapper()        
app = EClient(wrap)
app.connect("127.0.0.1", 7497, clientId=123)

#I just use this in jupyter so I can interact with df
import threading
threading.Thread(target = app.run).start()

#this isn't needed in jupyter, just run another cell
import time
time.sleep(300) # in 5 minutes check the df and close

print(wrap.df)
wrap.df.to_csv("myfile.csv")#save in file
app.disconnect()

#in jupyter to show plot
%matplotlib inline 
wrap.df.close.plot()

I use jupyter notebook so I added threading so I can still interact.

Here is some output. The first data received and printed comes from historicalDataEnd. A dataFrame gets made from the variables with a datetime index so bars can be added by time.

HistoricalDataEnd. ReqId: 1 from 20200707 14:23:19 to 20200708 14:23:19

Then later after 300 secs I print the dataframe. Check that ohlc is logical and notice a new bar every minute. The 14:28 bar is only the first 19 seconds I assume since my five minutes (300 secs) started at 14:23:19. This is exactly the behaviour you would want and expect for keeping a chart up to date.

2020-07-08 14:24:00  107.231  107.236  107.231  107.233     -1       -1   
2020-07-08 14:25:00  107.233  107.234   107.23  107.232     -1       -1   
2020-07-08 14:26:00  107.232  107.232  107.225  107.232     -1       -1   
2020-07-08 14:27:00  107.232  107.239  107.231  107.239     -1       -1   
2020-07-08 14:28:00  107.239  107.239  107.236  107.236     -1       -1

You can see that it gets all the bars (close only in graph) and keeps it up to date.

edited Jul 09 '20 at 17:28

answered Jul 08 '20 at 17:19

brian

10,619
4
21
79

" # setting update to 1 minute still sends an update every tick? but timestamps are 1 min # I don't think keepUpToDate sends a realtimeBar every 5 secs, just updates the last bar." What you are seeing is IB's iServer sending empty returns in between every tick. I talked with IB Chat regarding their API and I was told that they are working on fixing it. WRT your answer, I will try implementing the dataframe however I will have to edit the code such as remind def historicalDataEnd as I wish to have historicalDataUpdate enabled. – dinosaurslayer Jul 09 '20 at 15:41
I run it in Jupyter (python notebook). IB is wrong, that's not what I'm receiving. historicalDataUpdate IS enabled, it happens after historicalDatEnd. historicalDataEnd is where the dataframe gets made. There are other simpler ways but this is an example that shows all possibilities. I did this to test the functionality of historicalDataUpdate. I will add some output. – brian Jul 09 '20 at 17:05
I am looking to exclude historicalDataEnd from my code. Can I create the dataframe within historicalDataUpdate? – dinosaurslayer Jul 09 '20 at 17:27
1

You can make the dataFrame even before the request if you know the column names. What I've shown is the use case for `keepUpToDate=True`. It's for getting some history and keeping it up to date. If that's not your use case then maybe this is not the data request you need. – brian Jul 09 '20 at 17:34
I plan on using either Power BI or Jupyter Notebook but I have to restructure the dataframe first, which I am having trouble doing so – dinosaurslayer Jul 09 '20 at 18:02
Brian, I setup your code into Jupyter Notebook, and it works fine, however I am having trouble deciphering which value comes from what key. Also, in the excel file it saves to, the volume and average show up as -1 across all rows of data. – dinosaurslayer Jul 14 '20 at 15:50
`which value comes from what key` the file (and dataframe) should have a header row (it's csv which may just open with excel on your setup). The vol and avg are not part of fx data, change it to a stock. I just use fx since other people may not have your subscriptions. – brian Jul 14 '20 at 16:27
What defines the header row? Is it (bar)? Also, your code gives me this error ``` None --------------------------------------------------------------------------- AttributeError Traceback (most recent call last) in 59 60 print(wrap.df) ---> 61 wrap.df.to_csv("myfile.csv")#save in file 62 app.disconnect() 63 AttributeError: 'NoneType' object has no attribute 'to_csv' ``` when I change the duration string from "1 D" to "7 D" or more. – dinosaurslayer Jul 15 '20 at 12:42
`vars(bar)` puts all the variables in the BarData type into a dict. Those are the column names. You should put some error checks to make sure the dataframe exists, for whatever reason no data was returned so the dataframe is `None`. Maybe try 1 W – brian Jul 15 '20 at 14:41
This is excellent thanks Brian - re. the nextValidId function, since you're calling self.start() in there I'm unable to generate the next orderId without calling reqHistoricalData again, is there any reason you included self.start under nextValidId or can it be moved elsewhere? – Jonesy Feb 23 '21 at 09:29
1

Read the docs, http://interactivebrokers.github.io/tws-api/connection.html#connect - Just says to wait for nextValidId before making requests. You don't need to generate an orderId, it's just a reminder of what the next one should be. They have to be increasing for every order but you can keep track of them however you want. – brian Feb 23 '21 at 16:12
Instead of updates I am getting, "open,high,low,close,volume,barCount,average". For example, my output is: 2021-06-18 16:16:00,3.2185,3.2185,3.2185,3.2185,-1,-1,-1.0 2021-06-18 16:17:00,3.2185,3.2185,3.2185,3.2185,-1,-1,-1.0 2021-06-18 16:18:00,open,high,low,close,volume,barCount,average 2021-06-18 16:19:00,open,high,low,close,volume,barCount,average 2021-06-18 16:20:00,open,high,low,close,volume,barCount,average – spitz Jun 18 '21 at 20:23
@spitz You should probably add a check to see if you are getting a valid bar. The market is probably closed. – brian Jun 18 '21 at 21:17
@brian Seems to be some dataframe key look up speed issue... The following works for me: d = pd.to_datetime(line.pop('date')) self.df.loc[d] = line.values() – spitz Jun 21 '21 at 10:56

score 0 · Answer 2 · answered Jul 08 '20 at 15:00

0

this is really ETL (extract, transform, load)
I can see each data element is of form Name:. Get all name tokens using this as a reg expr
with this list extract each token into a dict based on position of token and next token
get the data label before the first token
finally turn this into a pandas data frame

text= "HistoricalDataUpdate.  1 Date: 20200708  08:31:00 Open: 14.17 High: 14.17 Low: 14.17 Close: 14.17 Volume: -1 Average: 14.15 Count: -1"
tokens = re.findall("([A-Z][a-z]*:)", text)
json = {t:text[re.search(tokens[i], text).span(0)[1]:re.search(tokens[i+1], text).span(0)[0]] 
        if i+1<len(tokens) 
        else text[re.search(tokens[i], text).span(0)[1]:] 
        for i,t in enumerate(tokens)}
json = {"label":text[:re.search(tokens[0], text).span(0)[0]], **json}
df = pd.DataFrame([json])
df

output

    label   Date:   Open:   High:   Low:    Close:  Volume: Average:    Count:
0   HistoricalDataUpdate. 1 20200708 08:31:00   14.17   14.17   14.17   14.17   -1  14.15   -1

answered Jul 08 '20 at 15:00

Rob Raymond

29,118
3
14
30

` def historicalData(self, reqId, bar): print("HistoricalData. ", reqId, "Date:", bar.date, "Open:", bar.open, "High:", bar.high, "Low:", bar.low, "Close:", bar.close, "Volume:", bar.volume, "Average:", bar.average, "Count:", bar.barCount) ` is this your code? Why print something so difficult to parse? – Rob Raymond Jul 08 '20 at 15:07
Those are the names that I found to have worked to define each request. What would I put in the first line of example you provided labeled "text"? "HistoricalDataUpdate. 1 Date: 20200708 08:31:00 Open: 14.17 High: 14.17 Low: 14.17 Close: 14.17 Volume: -1 Average: 14.15 Count: -1" is just an example of the output that the code prints, so each value would be different depending on the received data. – dinosaurslayer Jul 08 '20 at 15:56
I would recommend keeping it in a data structure rather than sending it to stdout and trying to reverse engineer a `print()` statement. `bar` looks like a pretty decent data structure – Rob Raymond Jul 08 '20 at 16:05
How would I parse bar? I have changed my print statement to print("HistoricalData. ReqId:", reqId, "BarData:", bar) print("HistoricalDataUpdate. ReqId:", reqId, "BarData:", bar) – dinosaurslayer Jul 09 '20 at 15:51
2

really as other answer suggests. `print("HistoricalData. ", reqId, "Date:", bar.date, "Open:", bar.open, "High:", bar.high, "Low:", bar.low, "Close:", bar.close, "Volume:", bar.volume, "Average:", bar.average, "Count:", bar.barCount)` could instead create a dict and append it to a data frame. – Rob Raymond Jul 09 '20 at 16:03
Would I have to append the data from BarData or would it be from defining each key separately? – dinosaurslayer Jul 12 '20 at 17:12

Pandas DataFrame and Series - IB TWS HistoricalData

2 Answers2

Linked