0

I was using apply_parallel function from pandarallel library, the below snippet(Function call) iterates over rows and fetches data from mongo db. While executing the same throws me EOFError and a mongo client warning as given below

Mongo function:

def fetch_final_price(model_name, time, col_name):
    collection = database['col_name']
    price = collection.find({"$and":[{"Model":model_name},{'time':time}]})
    price = price[0]['price']
    return price

Function call:

final_df['Price'] = df1.parallel_apply(lambda x :fetch_final_price(x['model_name'],x['purchase_date'],collection_name), axis=1)

MongoClient config:

client = pymongo.MongoClient(host=host,username=username,port=port,password=password,tlsCAFile=sslCAFile,retryWrites=False)

Error:

EOFError: Ran out of input

Mongo client warning:

"MongoClient opened before fork. Create MongoClient only "

How to make db calls in parallel_apply??

GeekGroot
  • 102
  • 6

1 Answers1

0

First of all, "MongoClient opened before fork" warning also provides a link for the documentation, from which you can know that in multiprocessing (which pandarallel base on) you should create MongoClient inside your function (fetch_final_price), otherwise it likely leads to a deadlock:

def fetch_final_price(model_name, time, col_name):
    client = pymongo.MongoClient(
        host=host,
        username=username,
        port=port,
        password=password,
        tlsCAFile=sslCAFile,
        retryWrites=False
    )
    collection = database['col_name']
    price = collection.find({"$and": [{"Model": model_name}, {'time': time}]})
    price = price[0]['price']
    return price

The second mistake, that leads to the exception in the function and the following EOFError, is that you use the brackets operator to a find result, which is actually an iterator, not a list. Consider using find_one if you need only a first instance (alternatively, you can do next(price) instead of indexing operator, but it's not a good way to do this)

def fetch_final_price(model_name, time, col_name):
    client = pymongo.MongoClient(
        host=host,
        username=username,
        port=port,
        password=password,
        tlsCAFile=sslCAFile,
        retryWrites=False
    )
    collection = database['col_name']
    price = collection.find_one({"$and": [{"Model": model_name}, {'time': time}]})
    price = price['price']
    return price
Andrii
  • 70
  • 6