A little bit of context: I am writing a stock backtest program that subscribes to a websocket feed streaming 1-minute candles for over 100 stocks. Since this is a backtest, the feed is actually served from my local data.

I am trying to utilise Ray for multiprocessing.

The program receives stock data locally in a loop, for example A, B, C ... A, B, C ..., continuously until the end of the time period. My current setup with Ray calls `handleStock.remote()` asynchronously for each piece of stock data. For performance reasons, each time it processes the same stock at a different timestamp, there should be a cache to speed things up. But because each stock-per-timestamp call runs as a completely new process/worker in Ray, I have no place to share memory that can also be written to. I tried Ray's `ray.put`, but the stored object is read-only.

Is there a way to resolve this, or is there a different tool better suited to running this backtest?

Tony Lin
  • I'm not sure about Ray but you can certainly write to multiprocess shared_memory if you use numpy arrays. You just need to lock the object while the writing is done and then release it when you are finished. @Rboreal_Frippery 's answer [here](https://stackoverflow.com/a/59257364/1178971) is a good example of how to do this. – forgetso Aug 07 '20 at 08:38

1 Answer

Ray object store values are immutable: once an object has been written it cannot be changed (first writer wins). One potential option here is to store the data in an actor instead of in the object store.

If you really need to return the object and performance is important, perhaps break the operation up into 2 async operations.

  1. Perform calculation
  2. Return result
Alex
  • I couldn't find a way for the workers to even write to the stored value. How do you do that? Breaking it up into 2 async operations does not work for me, because each worker contains complex logic and may need to access the shared memory at different times. – Tony Lin Aug 11 '20 at 06:28
  • Object store values are immutable (thus storing the state in an actor instead). I'm not sure what you mean by you can't break it up into 2 steps. For example, you can always create an actor which is a thin wrapper around your true object. Instead of calling `ray.put`, store the result in your actor instead. Step 2 becomes just `ray.put` – Alex Aug 11 '20 at 07:06