I am trying to regularize an uneven time series with Pandas as in this example https://stackoverflow.com/a/39730730/10005441.

However, when I try this on my own dataset, the process gets killed with exit code 137 ("interrupted by signal 9: SIGKILL"). From what I've read online so far, this error seems to be caused by a memory leak, which fits what I observe: the process takes over all the available application memory.
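
For reference, the 137 follows the usual shell convention of reporting 128 plus the signal number when a process is terminated by a signal. A quick check, assuming a POSIX system where Python's `signal` module exposes `SIGKILL` (the constant name `SHELL_SIGNAL_OFFSET` is made up for this illustration):

```python
import signal

# Shells conventionally report 128 + N when a process dies from signal N.
SHELL_SIGNAL_OFFSET = 128  # hypothetical name, used only in this sketch

sigkill = int(signal.SIGKILL)  # 9 on Linux and macOS
exit_code = SHELL_SIGNAL_OFFSET + sigkill
print(exit_code)
```

SIGKILL cannot be caught by the process itself, so it comes from outside, for instance from the operating system when memory runs out.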

But since I am not defining any variables that I could later dereference to free up space (I only call built-in methods in my code), I don't know how I can fix the memory leak.

I thought at first that my dataset (about 1500 rows) might be too large for this process, but I kept restricting it to fewer and fewer rows until I got down to just 70, and the problem persists. The only difference is that the process no longer gets killed on its own; it freezes and I have to kill it manually.


import io
import pandas as pd

data = io.StringIO('''\
    Values
    1900-2-18 00:00:00,  2.4061398208006928\n1900-4-6 00:00:00,  4.190919536676638
    1900-5-20 00:00:00,  7.154316563897394\n1900-6-28 00:00:00,  8.511064948122844
    1900-7-29 00:00:00,  12.948325882041525\n1900-9-3 00:00:00,  14.874496695573287
    1900-10-18 00:00:00,  11.275824547647606\n1900-12-13 00:00:00,  3.7065864698234683
    1901-3-3 00:00:00,  7.2656643017780995\n1901-8-11 00:00:00,  3.132476380916307
    1901-12-31 00:00:00,  3.255504055908594\n1902-3-6 00:00:00,  2.558366292009146
    1902-5-11 00:00:00,  4.3928567952933095\n1902-7-16 00:00:00,  5.697896757357601
    1902-10-2 00:00:00,  7.002936719421891\n1902-12-9 00:00:00,  5.736406393587798
    1903-1-20 00:00:00,  9.328924220179179\n1903-2-27 00:00:00,  8.587849660274507
    1903-4-15 00:00:00,  7.392418135961156\n1903-6-3 00:00:00,  9.917996320293712
    1903-7-15 00:00:00,  6.590267808814529\n1903-8-26 00:00:00,  3.2153378869541758
    1903-10-7 00:00:00,  2.996751882107189\n1903-11-15 00:00:00,  2.712339561397424
    1903-12-20 00:00:00,  1.319131420500554\n1904-1-19 00:00:00,  0.8865938043571631
    1904-2-14 00:00:00,  1.9964471435094566\n1904-3-9 00:00:00,  3.083502456213582
    1904-4-4 00:00:00,  4.170557768917708\n1904-4-28 00:00:00,  3.831315100660575
    1904-5-23 00:00:00,  2.309160171614012\n1904-6-19 00:00:00,  3.694378817767278
    1904-7-14 00:00:00,  5.6004781490273725\n1904-8-2 00:00:00,  4.816350292831508
    1904-9-1 00:00:00,  5.603998055036929\n1904-10-5 00:00:00,  3.631258067715407
    1904-11-2 00:00:00,  2.649097889985815\n1904-12-2 00:00:00,  1.608099020817645
    1905-1-11 00:00:00,  1.2811050146707985\n1905-3-8 00:00:00,  1.1295258409634243
    1905-5-15 00:00:00,  5.369997012480915\n1905-7-18 00:00:00,  6.8334105299517365
    1905-9-14 00:00:00,  9.888561079276236\n1905-11-21 00:00:00,  9.820776125433214
    1906-2-17 00:00:00,  8.583688873547414\n1906-5-27 00:00:00,  5.669451125498982
    1906-9-22 00:00:00,  7.403538545288166\n1907-2-19 00:00:00,  5.589027737652207
    1907-6-2 00:00:00,  4.904431053393889\n1907-7-21 00:00:00,  5.383923257816266
    1907-9-8 00:00:00,  2.5896575192353657\n1907-10-27 00:00:00,  1.5265738784902498
    1907-12-16 00:00:00,  1.4187730996080212\n1908-2-5 00:00:00,  1.6819846618479541
    1908-4-4 00:00:00,  3.25556162055965\n1908-6-26 00:00:00,  6.196662536751723
    1908-10-2 00:00:00,  6.518587879075245\n1908-12-24 00:00:00,  4.02972145733511
    1909-2-22 00:00:00,  4.388033210271457\n1909-4-14 00:00:00,  6.725186739916857
    1909-6-9 00:00:00,  9.675973353608608\n1909-8-10 00:00:00,  6.518904258510972
    1909-10-27 00:00:00,  3.533225003441829\n1910-1-6 00:00:00,  1.4671928484167065
    1910-2-26 00:00:00,  1.1987894438707483\n1910-4-17 00:00:00,  4.4867671003426945
    1910-6-8 00:00:00,  6.4285613573922795\n1910-8-6 00:00:00,  5.553929676414903
    1910-10-7 00:00:00,  9.853959878865188\n1910-12-2 00:00:00,  2.982478843773414
    ''')

# read_csv's squeeze= keyword was removed in pandas 2.0; squeeze the single column instead
s = pd.read_csv(data).squeeze("columns")
s.index = pd.to_datetime(s.index)

res = s.resample('s').interpolate().resample('1AS').asfreq().dropna()
print(res)
  • When this is run step by step in an `ipython` session, it's the `interpolate()` step that produces a `MemoryError`. – hpaulj Jun 16 '19 at 03:00
  • `s.resample('s')` produces a resampler object with a `freq=`. If I use `s.resample('D')` instead, the `freq` is days, and its `count()` method produces a result of length 3940. So we get a memory error with 's' simply because there are too many seconds in the 10-year period of your data. – hpaulj Jun 16 '19 at 03:19
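
The arithmetic behind the second comment can be checked without ever building the per-second grid. The sketch below is only an illustration under stated assumptions: it uses four of the question's observations (the first, the last and two in between) rather than the full dataset, so the interpolated values differ from what the full data would give, and it swaps the `'s'` grid for a daily one, assuming day resolution is fine enough for data stamped at midnight. `'YS'` is the year-start alias, used here in place of the question's `'1AS'`.

```python
import pandas as pd

# Assumption: a four-point subset of the question's data, same 1900-1910 span.
idx = pd.to_datetime(["1900-02-18", "1903-07-15", "1907-06-02", "1910-12-02"])
values = [2.4061398208006928, 6.590267808814529,
          4.904431053393889, 2.982478843773414]
s = pd.Series(values, index=idx, name="Values")

span = s.index[-1] - s.index[0]

# Grid points each frequency would create (both endpoints included).
n_seconds = span // pd.Timedelta(seconds=1) + 1
n_days = span // pd.Timedelta(days=1) + 1

# A float64 takes 8 bytes per row; the datetime index and any temporary
# copies made during interpolation come on top of this figure.
print(n_seconds, "rows at 's', about",
      round(n_seconds * 8 / 1e9, 1), "GB for one float64 column")
print(n_days, "rows at 'D'")  # 3940, the length reported in the comment

# Daily grid, linear interpolation, then keep the value at each 1 January.
daily = s.resample("D").interpolate()
yearly = daily.resample("YS").asfreq().dropna()
print(yearly)
```

Because every timestamp in the question falls at midnight, a daily grid hits each original observation exactly, so the yearly values are the same as the per-second grid would produce, at a tiny fraction of the memory.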
