I have a piece of code which calculates positions of some satellites and planets using Skyfield. For clarity, I use Pandas DataFrame as a container of positions and corresponding time moments. I want to make calculation parallel, but always getting the same error: TypeError: can't pickle Satrec objects
. Different parallelizers were tested, like Dask, pandarallel, swifter and Pool.map().
Example of piece of code to be parallelized:
def get_sun_position(self, row):
t = self.ts.utc(row["Date"]) # from skyfield
pos = self.earth.at(t).observe(self.sun).apparent().position.m # from skyfield, error is here
return pos
def get_sat_position(self, row):
t = self.ts.utc(row["Date"]) # from skyfield
pos = self.sat.at(t).position.m # from skyfield, error is here
return pos
def get_positions(self):
self.df["sat_pos"] = self.df.swifter.apply(self.get_sat_position, axis=1) # all the parallelization goes here
self.df["sun_pos"] = self.df.swifter.apply(self.get_sun_position, axis=1) # and here
# the same implementation but using dask
# self.df["sat_pos"] = dd.from_pandas(self.df, npartitions=4*cpu_count())\
# .map_partitions(lambda df : df.apply(lambda row : self.get_sat_position(row),axis=1))\
# .compute(scheduler='processes')
# self.df["sun_pos"] = dd.from_pandas(self.df, npartitions=4*cpu_count())\
# .map_partitions(lambda df : df.apply(lambda row : self.get_sun_position(row),axis=1))\
# .compute(scheduler='processes')
For Dask to avoid Pickle I tried to set serializaton manually like this serializers=['dask', 'pickle']
but it didn't help.
As I understand, Skyfield uses sgp4 which contains Satrec class.
I would be wondering if there is some way to parallelize this .apply()
. Or maybe I should not try Skyfield functions for parallel processing at all?