
I have a piece of code that calculates the positions of some satellites and planets using Skyfield. For clarity, I use a Pandas DataFrame as the container for the positions and their corresponding time moments. I want to make the calculation parallel, but I always get the same error: TypeError: can't pickle Satrec objects. I have tested different parallelizers, including Dask, pandarallel, swifter and Pool.map().

An example of the piece of code to be parallelized:

        def get_sun_position(self, row):
            t = self.ts.utc(row["Date"]) # from skyfield
            pos = self.earth.at(t).observe(self.sun).apparent().position.m # from skyfield, error is here
            return pos

        def get_sat_position(self, row):
            t = self.ts.utc(row["Date"]) # from skyfield
            pos = self.sat.at(t).position.m # from skyfield, error is here
            return pos

        def get_positions(self):
            self.df["sat_pos"] = self.df.swifter.apply(self.get_sat_position, axis=1) # all the parallelization goes here
            self.df["sun_pos"] = self.df.swifter.apply(self.get_sun_position, axis=1) # and here

            # the same implementation, but using Dask
            # self.df["sat_pos"] = dd.from_pandas(self.df, npartitions=4 * cpu_count()) \
            #     .map_partitions(lambda df: df.apply(lambda row: self.get_sat_position(row), axis=1)) \
            #     .compute(scheduler='processes')
            # self.df["sun_pos"] = dd.from_pandas(self.df, npartitions=4 * cpu_count()) \
            #     .map_partitions(lambda df: df.apply(lambda row: self.get_sun_position(row), axis=1)) \
            #     .compute(scheduler='processes')

To avoid pickle with Dask, I tried to set the serialization manually, like serializers=['dask', 'pickle'], but it didn't help.

As I understand it, Skyfield uses the sgp4 library, which contains the Satrec class.

I would like to know whether there is some way to parallelize this .apply(), or should I not try to use Skyfield functions in parallel processing at all?

lazySeal
  • As far as I can see, there is no need to exchange any information between threads, so this task should be quite simple, right? – lazySeal Apr 01 '20 at 15:27

1 Answer


Alas, all of the mechanisms you are using to make the computation parallel do so by creating another process and then sending copies of every object involved in the computation over to that process. The Satrec object is written in C++ rather than Python to make it faster, and C++ objects have no native way to "serialize" themselves into bytes for transmission to another process (Python objects have that ability built in).
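
If you do still want separate processes, one pattern that sidesteps the pickling is to build the un-picklable objects inside each worker and send only plain strings and dates across the process boundary. A rough sketch, not taken from your code, so the de421.bsp ephemeris, the helper names, and the TLE-string arguments are only placeholders:

    from multiprocessing import Pool

    from skyfield.api import load, EarthSatellite

    _ctx = {}  # per-process cache for the un-picklable Skyfield objects

    def _init_worker(tle_line1, tle_line2):
        # Runs once in every worker: only the plain TLE strings cross the
        # process boundary, and the C++ Satrec object is built locally.
        ts = load.timescale()
        eph = load('de421.bsp')  # stand-in for whatever ephemeris you load
        _ctx.update(ts=ts, earth=eph['earth'], sun=eph['sun'],
                    sat=EarthSatellite(tle_line1, tle_line2, 'sat', ts))

    def _sun_position(date):
        t = _ctx['ts'].utc(date)
        return _ctx['earth'].at(t).observe(_ctx['sun']).apparent().position.m

    def _sat_position(date):
        t = _ctx['ts'].utc(date)
        return _ctx['sat'].at(t).position.m

    def get_positions_parallel(df, tle_line1, tle_line2):
        dates = list(df["Date"])  # plain datetimes pickle without trouble
        with Pool(initializer=_init_worker, initargs=(tle_line1, tle_line2)) as pool:
            df["sun_pos"] = pool.map(_sun_position, dates)
            df["sat_pos"] = pool.map(_sat_position, dates)
        return df

Whether this wins anything depends on how much time each worker spends re-loading the ephemeris versus doing the actual position math.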

Have you profiled your code to see which steps are the most expensive? My guess is that most of the expense is in the Sun computation, because to give the Sun's position in the sky precisely enough even for radio astronomers, Skyfield has to compute the Earth's orientation to very high accuracy.
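
If you have not, a quick way to check is the standard-library profiler; `obj` here is just a placeholder for whatever instance holds these methods:

    import cProfile

    cProfile.run('obj.get_positions()', sort='cumtime')  # most expensive calls listed first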

But if you yourself don't need that high an accuracy, you could switch to lower-precision sky coordinates for the Sun. Before using t in get_sun_position(), try doing this to it:

    t._nutation_angles = iau2000b(t.tt)

That will use a lower-precision estimate of the Earth's nutation (print out the values before and after this change to see how big the difference is, and compare that to how much inaccuracy your application can stand), but it should also run faster.
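
In context, that could look something like the following, assuming iau2000b is imported from skyfield.nutationlib, where it lives:

    from skyfield.nutationlib import iau2000b

    def get_sun_position(self, row):
        t = self.ts.utc(row["Date"])
        t._nutation_angles = iau2000b(t.tt)  # swap in the faster, lower-precision nutation model
        pos = self.earth.at(t).observe(self.sun).apparent().position.m
        return pos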

Brandon Rhodes
  • You are right, the most expensive operation is the Sun position computation. Unfortunately, setting `t._nutation_angles = iau2000b(t.tt)` before using `t` didn't speed up the calculation. But I get the idea and will try to reduce the accuracy of the Earth-orientation computation in `get_sun_position()`. Could you tell me whether there is a way to run `get_sun_position()` and `get_sat_position()` on different cores? – lazySeal Apr 03 '20 at 10:12
  • If you are passing an array of times, then many of the underlying operations happen in NumPy rather than Python (see the sketch after these comments). Have you explored whether the version of NumPy you are using on your platform uses multiple cores for its array operations? Oh, and: the nutation-angles maneuver should speed things up considerably, so I wonder why it isn't working for you. If you could provide a full Python script in the question that folks could run to reproduce your problem, then I'd be happy to try it out on my own machine and see why that speedup isn't helping. – Brandon Rhodes Apr 09 '20 at 19:54
  • Of course, [here](https://github.com/newtonee/Visibility-of-the-Sun) you can find a Jupyter notebook with the work discussed. I'm using NumPy 1.16.4. – lazySeal Apr 15 '20 at 12:46
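
For completeness, here is a sketch of the whole-array approach mentioned in the comment above. It assumes df["Date"] holds timezone-aware datetimes, which the ts.utc() call in the question already implies; newer Skyfield releases also offer ts.from_datetimes() for this:

    def get_positions(self):
        t = self.ts.utc(list(self.df["Date"]))  # one Time object covering every row
        sun_xyz = self.earth.at(t).observe(self.sun).apparent().position.m  # shape (3, N)
        sat_xyz = self.sat.at(t).position.m  # shape (3, N)
        self.df["sun_pos"] = list(sun_xyz.T)  # back to one 3-vector per row
        self.df["sat_pos"] = list(sat_xyz.T)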