
I have a DataFrame consisting of time series:

Date Index | Time Series 1 | Time Series 2 | ... and so on

I have used pyRserve to run a forecasting function in R.

I want to implement parallel processing using Celery, and have written the worker code along the following lines:

import pyRserve

def pipeR(k):  # k: column header of the time series to forecast
    # OPENING THE CONNECTION TO R
    conn = pyRserve.connect(host='localhost', port=6311)

    # ASSIGNING THE PYTHON VARIABLE IN THE R ENVIRONMENT
    conn.r.i = k

    conn.voidEval('''
    forecst <- function(a)
    {
    ...# FORECASTS THE TIMESERIES IN COLUMN a OF THE DATAFRAME
    }
    ''')

    # CALLING THE FUNCTION IN R
    conn.eval('forecst(i)')

group(pipeR.s(k) for k in [...list of column headers...])()
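
For completeness, the group call above assumes that pipeR has been registered as a Celery task, roughly as sketched below (the app name and broker URL are placeholders, not my actual setup):

    from celery import Celery

    app = Celery('forecasts', broker='redis://localhost:6379/0')  # placeholder broker URL

    @app.task
    def pipeR(k):
        ...  # body as defined above

Each signature in the group then becomes one task, so the columns should get forecast in parallel across the Celery workers.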

To implement parallel processing, can all the worker processes share a single port (as in the code above, port 6311), or should each worker process use a different port?

I'm currently getting the following error in R:

    Error in socketConnection("localhost", port=port, server=TRUE, blocking=TRUE, : cannot open the connection


1 Answer


The problem was resolved when I opened a different port for each worker process:

import pyRserve
from random import randint

def pipeR(k, Frequency, Horizon, Split, wd_path):
    # GENERATING A RANDOM PORT
    port = randint(1000, 9999)

    # STARTING A NEW Rserve INSTANCE ON THAT PORT, VIA THE MAIN Rserve ON 6311
    conn0 = pyRserve.connect(host='localhost', port=6311)
    conn0.r.port = port
    conn0.voidEval('''
        library(Rserve)
        Rserve(port = port, args = '--no-save')
    ''')

    # CONNECTING TO THE NEW PORT FROM PYTHON
    conn = pyRserve.connect(host='localhost', port=port)

    # ASSIGNING THE PYTHON VARIABLE IN THE R ENVIRONMENT
    conn.r.i = k

    conn.voidEval('''
    forecst <- function(a)
    {
    ...# FORECASTS THE TIMESERIES IN COLUMN a OF THE DATAFRAME
    }
    ''')

    # CALLING THE FUNCTION IN R
    conn.eval('forecst(i)')
    conn0.close()
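
One thing the code above leaves open is teardown: every call starts a fresh Rserve instance on a random port but only closes conn0, so the spawned R processes keep running (and randint may occasionally hand out a port that is already in use). A minimal cleanup sketch, using hypothetical helper names of my own and assuming the spawned instance accepts pyRserve's remote shutdown():

    import pyRserve
    from random import randint

    def open_worker_rserve():
        # Start a throwaway Rserve on a random port and return a connection to it.
        # Hypothetical helper; a random port can collide with one already in use.
        port = randint(1000, 9999)
        conn0 = pyRserve.connect(host='localhost', port=6311)
        conn0.r.port = port
        conn0.voidEval("library(Rserve); Rserve(port = port, args = '--no-save')")
        conn0.close()
        return pyRserve.connect(host='localhost', port=port)

    def close_worker_rserve(conn):
        # Shut the throwaway instance down again so R processes don't pile up.
        try:
            conn.shutdown()   # assumes the spawned Rserve permits a remote shutdown
        except Exception:
            conn.close()      # otherwise just drop the connection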
  • I'm looking at your answer here and am planning on doing something similar; is it the case that multiple connections to `conn0` all share the same state in R? I see on the [Rserve docs](http://www.rforge.net/Rserve/) that "Every connection has a separate workspace and working directory"? – shapiromatron Jan 03 '18 at 22:34
  • I think so. The docs say _every thread must have its own connection object unless synchronized explicitly_. Hence, I tried the above approach. – blitZ_91011 Jan 30 '18 at 10:20
  • I just checked; I don't think that's true. Here I'm running 1 Rserve instance and I have two connections in python; state isn't shared: https://gist.github.com/shapiromatron/80a2547a599441332c736ff5d836097b – shapiromatron Feb 05 '18 at 17:31
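
For anyone following the comment thread: a quick way to check the behaviour shapiromatron describes (one Rserve instance, a separate workspace per connection) is a small sketch like the one below, run against a local Rserve on the default port:

    import pyRserve

    # Two connections to the same Rserve instance; per the Rserve docs quoted
    # above, each connection gets its own workspace.
    conn_a = pyRserve.connect(host='localhost', port=6311)
    conn_b = pyRserve.connect(host='localhost', port=6311)

    conn_a.r.x = 42
    print(conn_a.eval('x'))     # the value is visible on the connection that set it

    try:
        conn_b.eval('x')        # 'x' was never defined in conn_b's workspace
    except Exception as exc:    # pyRserve surfaces R's "object 'x' not found" error
        print('x is not shared across connections:', exc)

    conn_a.close()
    conn_b.close()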