How do I return a dataframe object from a thread

Question

I had previously asked this question by may not have been clear enough on my explanation of my particular situation. My previous question was voted as a duplicate of how to get the return value from a thread in python?

Perhaps I should have explained more. I had already read and tried the referenced thread, but nothing that I did from there seemed to work. (I could be just implementing it incorrectly).

My main class that does all the work and data transformation is:

class SolrPull(object):
    def __init__(self, **kwargs):
        self.var1 = kwargs['var1'] if 'var1' in kwargs else 'this'
        self.var2 = kwargs['var2'] if 'var2' in kwargs else 'that'

    def solr_main(self):
        #This is where the main data transformation takes place.
        return(self.flattened_df)

I need to create multiple objects and have them pull from a Solr database and transform data synchronously in different threads.

My arguments must be passed to the SolrPull class, not to the solr_main function.

I need to wait for those returns before continuing with processing.

I tried a couple of different answers from the referenced thread, but nothing worked.

Using the accepted answer for that thread, I did:

class TierPerf(object):
    def pull_current(self):

        pool = ThreadPool(processes=5)

        CustomerRecv_df_result = pool.apply_async(SolrPull(var1='this', var2='that').solr_main())
        APS_df_result = pool.apply_async(SolrPull(var1='this', var2='that').solr_main())

        self.CustomerRecv_df = CustomerRecv_df_result.get()
        self.APS_df = APS_df_result.get()

But the pulls and transformation do not happen synchronously. Then when I do the .get(), I get the error 'DataFrame object is not callable'.

As an end result, I need to be able to synchronously call SolrPull(*args).solr_main() and return pandas dataframe that will then be used for further processing.

GeorgeLPerkins · Accepted Answer · 2018-03-12T14:55:00.297

Well, after all the struggle and pain over that, I finally figured out my specifics after posting this question.

I went back to my original solution and then just set my desired dataframe (self.CustomerRecv_df) to the return dataframes attribute (CustomerRecv_df.flattened_df).

class TierPerf(object):
    def pull_current(self):        

        thread_list = []

        CustomerRecv_df = SolrPull(var1='this', var2='that')
        tr_CustomerRecv_df = threading.Thread(name='Customerrecev_tier', target=CustomerRecv_df.solr_main)
        thread_list.append(tr_CustomerRecv_df)

        APS_df = SolrPull(var1='this', var2='other')
        tr_APS_df = threading.Thread(name='APS_tier', target=APS_df.solr_main)
        thread_list.append(tr_APS_df)

        for thread in thread_list:
            print('Starting', thread)
            thread.start()

        for thread in thread_list:
            print('Joining', thread)
            thread.join()

        self.CustomerRecv_df = CustomerRecv_df.flattened_df
        self.APS_df = APS_df.flattened_df

How do I return a dataframe object from a thread

1 Answers1