
I'd like to score very large files using models built in R.

The idea is to extract the actual predictor equation from the R model object and define a Python string containing that equation.

The header of the large predictor file uses the same predictor names that were used to build the model (the model development and model scoring predictors were generated by the same Python code).

I'd like to score the large predictor file with Python, which avoids having to split the file into chunks so that R can process it, even though R's `predict` function is an attractive alternative.
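
Python's `csv` module reads a file lazily, one row at a time, so a row-by-row scoring loop holds only the current row in memory and needs no chunking. A minimal sketch that writes each score back out as it goes; `score_file`, its `required` argument, the hand-written `score` lambda and the `yhat` column name are all hypothetical choices for illustration:

```python
import csv
import io


def score_file(infile, outfile, score, required=()):
    """Copy infile to outfile row by row, appending a 'yhat' column.

    `score` is any callable that takes one csv.DictReader row (a dict of
    strings) and returns a number. Rows are streamed, never stored.
    """
    reader = csv.DictReader(infile)
    fieldnames = reader.fieldnames or []
    # Fail early if the header lacks a predictor the model needs.
    missing = [name for name in required if name not in fieldnames]
    if missing:
        raise ValueError("missing predictor columns: %s" % missing)
    writer = csv.DictWriter(outfile, fieldnames=fieldnames + ["yhat"])
    writer.writeheader()
    count = 0
    for row in reader:
        row["yhat"] = score(row)
        writer.writerow(row)
        count += 1
    return count


# Example: the equation 1 + 2*x1 + 3*x2, written by hand for illustration.
src = io.StringIO("x1,x2\n1,2\n3,4\n")
dst = io.StringIO()
n = score_file(src, dst, lambda r: 1 + 2*float(r["x1"]) + 3*float(r["x2"]),
               required=("x1", "x2"))
print(dst.getvalue())
```

With real files, open both with `newline=""` as the `csv` documentation recommends, e.g. `open("predictors.csv", newline="")` and `open("scores.csv", "w", newline="")` (file names hypothetical).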

I've read "How do I execute a string containing Python code in Python?" and other posts. Since `eval` and `exec` are frowned upon in the Python community, I'm wondering what the most Pythonic way is to dynamically apply an equation to a set of predictors stored in a CSV file. Thanks.
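
For a model that is linear in the predictors, one way to avoid `eval` and `exec` altogether is to keep the equation as data rather than code: an intercept plus a dict mapping each predictor name to its coefficient. A sketch under that assumption (no interactions, splines or other transformed terms); `make_scorer` is a hypothetical helper, and how the coefficients get from R into Python is left open:

```python
import csv
import io


def make_scorer(intercept, coefs):
    """Return a function that scores one csv.DictReader row.

    `coefs` maps predictor (column) names to coefficients. Row values are
    strings, so each one is converted with float(); other columns are ignored.
    """
    items = list(coefs.items())

    def score(row):
        return intercept + sum(c * float(row[name]) for name, c in items)

    return score


# Coefficients of the example equation 1 + 2*x1 + 3*x2.
score = make_scorer(1, {"x1": 2, "x2": 3})
predfile = io.StringIO("x1,x2\n1,2\n3,4\n")
scores = [score(row) for row in csv.DictReader(predfile)]
print(scores)  # [9.0, 19.0]
```

Because the equation is plain data, nothing from the file or from R is ever executed as code.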

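```python
import csv
import io

# A small in-memory stand-in for the large predictor file.
predfile = io.StringIO(
'''x1,x2
1,2
3,4''')

eq = '1 + 2*x1 + 3*x2'
reader = csv.reader(predfile, delimiter=',')
header = next(reader)
for row in reader:
    # exec() at module level writes these names into the module's globals:
    # one variable per column (x1, x2), then yhat from the equation string.
    exec("{0}={1}".format(header[0], row[0]))
    exec("{0}={1}".format(header[1], row[1]))
    exec("yhat={0}".format(eq))
    print(yhat)
```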
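
If the equation has to stay a string, a narrower alternative to creating variables with `exec` is to compile the expression once with `compile(..., "eval")` and evaluate it against each row, passing the row's values as the namespace. A sketch, assuming every column is numeric; emptying `__builtins__` only keeps the expression from reaching built-in functions by accident and is not a security boundary, so this still assumes the string comes from your own R code:

```python
import csv
import io

eq = "1 + 2*x1 + 3*x2"
# Compile once; "eval" mode accepts a single expression only.
code = compile(eq, "<equation>", "eval")

predfile = io.StringIO("x1,x2\n1,2\n3,4\n")
yhats = []
for row in csv.DictReader(predfile):
    # Row values are strings; convert them and use them as the local names.
    names = {k: float(v) for k, v in row.items()}
    yhats.append(eval(code, {"__builtins__": {}}, names))
print(yhats)  # [9.0, 19.0]
```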
Asked by user2105469
  • Does your formula need to be in the Python code as a string? Can you write it as a function instead, taking keyword arguments, perhaps? That would lend itself to a much more natural solution using a `csv.DictReader`: `func(**row_dict)` – Blckknght May 22 '13 at 22:43
  • Not necessarily: putting the formula into a string is the simplest option I thought of (the result of running `paste` on variable names and coefficients within `R`). I'll look up csv.DictReader, thanks. – user2105469 May 22 '13 at 22:52
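
Since the string comes from `paste` in R, another option is to keep that output but parse it instead of running it: Python's `ast` module can turn a sum like `1 + 2*x1 + 3*x2` into an intercept and a coefficient dict, rejecting anything that is not a plain linear term. A sketch, assuming Python 3.8+ (where numeric literals parse to `ast.Constant`) and that the pasted string contains only numbers, predictor names, `+`, `-` and `*`; `parse_linear` is a hypothetical name. R-specific syntax such as `x1:x2` is not valid Python, so `ast.parse` raises `SyntaxError` on it, and very long equations (many hundreds of terms) may hit Python's recursion limit in this recursive walk:

```python
import ast


def _number(node):
    """Return the value of a numeric literal (optionally signed), else None."""
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, (ast.UAdd, ast.USub)):
        value = _number(node.operand)
        if value is None:
            return None
        return -value if isinstance(node.op, ast.USub) else value
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    return None


def parse_linear(eq):
    """Split 'b0 + b1*x1 + b2*x2 ...' into (intercept, {name: coef}).

    Only numbers, names, +, - and number*name (or name*number) products
    are accepted; anything else raises ValueError instead of running.
    """
    intercept = 0.0
    coefs = {}

    def walk(node, sign):
        nonlocal intercept
        value = _number(node)
        if value is not None:
            intercept += sign * value
        elif isinstance(node, ast.BinOp) and isinstance(node.op, (ast.Add, ast.Sub)):
            walk(node.left, sign)
            walk(node.right, -sign if isinstance(node.op, ast.Sub) else sign)
        elif isinstance(node, ast.UnaryOp) and isinstance(node.op, (ast.UAdd, ast.USub)):
            walk(node.operand, -sign if isinstance(node.op, ast.USub) else sign)
        elif isinstance(node, ast.BinOp) and isinstance(node.op, ast.Mult):
            # Accept either coefficient*name or name*coefficient.
            for num, var in ((node.left, node.right), (node.right, node.left)):
                value = _number(num)
                if value is not None and isinstance(var, ast.Name):
                    coefs[var.id] = coefs.get(var.id, 0.0) + sign * value
                    return
            raise ValueError("unsupported product: " + ast.dump(node))
        elif isinstance(node, ast.Name):
            coefs[node.id] = coefs.get(node.id, 0.0) + sign
        else:
            raise ValueError("unsupported term: " + ast.dump(node))

    walk(ast.parse(eq, mode="eval").body, 1)
    return intercept, coefs


intercept, coefs = parse_linear("1 + 2*x1 + 3*x2")
print(intercept, coefs)  # 1.0 {'x1': 2.0, 'x2': 3.0}
```

The resulting intercept and coefficient dict can then drive an eval-free scorer, and a string containing a function call or any other non-linear construct is rejected rather than executed.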

1 Answer

To expand on my comment, here's a possible solution for you that turns your equation into a function that takes arguments named after your column headers, then feeds in the rows from a DictReader:

```python
import csv
import io

predfile = io.StringIO(
'''x1,x2
1,2
3,4''')

def func(x1, x2):
    # csv yields strings, so convert them before doing arithmetic
    x1 = int(x1)
    x2 = int(x2)
    return 1 + 2*x1 + 3*x2

reader = csv.DictReader(predfile, delimiter=',')  # header is handled automatically
for row in reader:
    print(func(**row))  # each row's keys (x1, x2) become keyword arguments
```
Answered by Blckknght
  • @user2105469: Whoops, yeah, that was an artifact of a previous iteration of the code. I was playing around with ways to make the function handy to print out if you wanted to (with its name being the name of your final value and its docstring being the formula) but I decided that was overkill for a simple example. I'm glad this helped! – Blckknght May 23 '13 at 00:18
  • It sure did, this is going straight into my scoring code. Much appreciated. – user2105469 May 23 '13 at 00:25
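
The idea Blckknght mentions in the comment above (a function named after the final value, with the formula as its docstring) can be sketched as a small factory. Here the function is built from a coefficient dict rather than written by hand, and it accepts `**row`, so extra columns in the file are ignored instead of raising `TypeError`; `make_equation` is a hypothetical name and a linear model is assumed:

```python
import csv
import io


def make_equation(target, intercept, coefs):
    """Return a scoring function named `target` whose docstring is its formula."""
    terms = [repr(intercept)] + ["%r*%s" % (c, name) for name, c in coefs.items()]

    def equation(**row):
        # csv values are strings; keyword arguments not in coefs are ignored
        return intercept + sum(c * float(row[name]) for name, c in coefs.items())

    equation.__name__ = target
    equation.__doc__ = "%s = %s" % (target, " + ".join(terms))
    return equation


yhat = make_equation("yhat", 1, {"x1": 2, "x2": 3})
print(yhat.__doc__)  # yhat = 1 + 2*x1 + 3*x2

predfile = io.StringIO("x1,x2,id\n1,2,a\n3,4,b\n")
for row in csv.DictReader(predfile):
    print(yhat(**row))  # prints 9.0, then 19.0
```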