
Consider a series of data with a known coordinate (in this case, paleoclimate data with ages in thousands of years before present, or "ka"). For many reasons, the time coordinate for these data is never evenly spaced. But for most analyses, it's critical to compare data on the same time coordinate.

What I'd love is a simple piece of code that takes unevenly spaced data and linearly interpolates them to an even spacing, with the spacing interval defined by the user. Mathematically there are at least two ways of doing this:

  1. Take the rate of change between two points and use that rate to map values at intermediate points;
  2. Do a distance-weighted average, with the closer time point more heavily weighted.

You should get the same answer either way.
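For a single target point the two formulas can be checked against each other. A minimal sketch in plain Python, using the 405 ka value between the 403.2 ka and 407.2 ka samples of column B; the helper names `by_rate` and `by_weights` are illustrative only:

```python
def by_rate(x, x0, y0, x1, y1):
    # method 1: slope between the bracketing points, applied from x0
    m = (y1 - y0) / (x1 - x0)
    return y0 + m * (x - x0)


def by_weights(x, x0, y0, x1, y1):
    # method 2: each point is weighted by the distance to the *other* point,
    # so the closer point gets the larger weight
    w0 = (x1 - x) / (x1 - x0)
    w1 = (x - x0) / (x1 - x0)
    return w0 * y0 + w1 * y1


a = by_rate(405, 403.2, 3.95, 407.2, 3.74)
b = by_weights(405, 403.2, 3.95, 407.2, 3.74)
print(round(a, 4), round(b, 4))
```

Both expressions expand to the same `y0 + (y1 - y0) * (x - x0) / (x1 - x0)`, so they agree up to floating-point rounding; here both give 3.8555, the 3.86 listed for 405 ka in column F.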

Columns A through C are paleoclimate data with uneven spacing. Columns E through G are the same data, evenly spaced every 5 ka. I want to take the data in columns A through C and get the correct interpolation in columns E through G for whatever ka spacing I set.

Once that basic code is in place, it'd be nice to add a few bells and whistles. An extrapolation function for time points outside the domain would be really helpful. For example, I have an interpolated value for 400 ka, even though I do not have data from times straddling 400 ka.

I started with pandas for organizing the data, and another SO post pointed me towards traces. I am still working on it but would appreciate any insight.

A (ka)     B       C
401.3      3.49    0.34
403.2      3.95    0.25
407.2      3.74    1.13
409.2      3.71    1.03
411.2      3.73    1.05
413.1      3.58    -0.08
415.1      4.4     0.46

ka = 5

E (ka)     F       G
400        3.18    0.40
405        3.86    0.65
410        3.72    1.04
415        4.36    0.43
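For the interior points, a plain NumPy sketch can reproduce columns E through G: build an evenly spaced grid from the `ka` step, then call `np.interp` on each column. The grid range (400 to 415 ka) is an assumption matching the table above. Note that `np.interp` holds the end values constant outside the data, so 400 ka comes out as 3.49 rather than 3.18; that row still needs extrapolation.

```python
import numpy as np

# the question's data: ages (ka) from column A, values from columns B and C
age = np.array([401.3, 403.2, 407.2, 409.2, 411.2, 413.1, 415.1])
B = np.array([3.49, 3.95, 3.74, 3.71, 3.73, 3.58, 4.4])
C = np.array([0.34, 0.25, 1.13, 1.03, 1.05, -0.08, 0.46])

ka = 5                                # user-defined spacing
grid = np.arange(400, 420, ka)        # 400, 405, 410, 415 (assumed range)

# np.interp expects increasing ages; outside [age[0], age[-1]] it returns
# the end value unchanged, i.e. it does NOT extrapolate
Bi = np.interp(grid, age, B)
Ci = np.interp(grid, age, C)

for g, b, c in zip(grid, Bi, Ci):
    print(g, round(b, 2), round(c, 2))
```

The 405, 410 and 415 ka rows agree with columns E through G after rounding; the 400 ka row is the clamped first sample (3.49, 0.34).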

Answer

Update: wrapped in functions, with handling of extrapolation

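import pandas as pd

# the question's data (columns A through C)
df = pd.DataFrame({
    'A (ka)': [401.3, 403.2, 407.2, 409.2, 411.2, 413.1, 415.1],
    'B': [3.49, 3.95, 3.74, 3.71, 3.73, 3.58, 4.4],
    'C': [0.34, 0.25, 1.13, 1.03, 1.05, -0.08, 0.46],
})

def get_line(s):
    # line through the first valid point and the point right after it
    x0 = s.first_valid_index()
    p0 = s.index.get_loc(x0)
    p1 = p0 + 1
    x1 = s.index[p1]
    y0, y1 = s.at[x0], s.at[x1]
    m = (y1 - y0) / (x1 - x0)
    f = lambda x: (x - x0) * m + y0
    # evaluate that line at every label that is still NaN (the leading gap)
    return s.index[s.isnull()].to_series().map(f)

def interpolate(df, nidx):
    ridx = df.index.union(nidx)
    # method='index' weights by distance along the index; trailing NaNs
    # are filled with the last valid value rather than extrapolated
    d = df.reindex(ridx).interpolate('index')
    return d.fillna(d.apply(get_line)).loc[nidx]

print(interpolate(df.set_index('A (ka)'), [400, 405, 410, 420]).round(2))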

        B     C
400  3.18  0.40
405  3.86  0.65
410  3.72  1.04
420  4.40  0.46
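The `interpolate` function above only extrapolates at the start of each column; pandas fills trailing NaNs with the last valid value, which is why 420 ka simply repeats the 415.1 ka row. A sketch that extrapolates both ends from the two nearest data points on that side, assuming pandas 0.23 or later (for `limit_area`); `edge_line` and `interpolate_both_ends` are hypothetical names:

```python
import pandas as pd

df = pd.DataFrame({
    'A (ka)': [401.3, 403.2, 407.2, 409.2, 411.2, 413.1, 415.1],
    'B': [3.49, 3.95, 3.74, 3.71, 3.73, 3.58, 4.4],
    'C': [0.34, 0.25, 1.13, 1.03, 1.05, -0.08, 0.46],
}).set_index('A (ka)')


def edge_line(s, x):
    # line through the two known points nearest to x on the same side:
    # the first two if x is before the data, the last two if after
    known = s.dropna()
    pts = known.iloc[:2] if x < known.index[0] else known.iloc[-2:]
    (x0, x1), (y0, y1) = pts.index, pts.values
    return y0 + (y1 - y0) / (x1 - x0) * (x - x0)


def interpolate_both_ends(df, nidx):
    ridx = df.index.union(nidx)
    # limit_area='inside' fills only NaNs surrounded by data and leaves
    # the points outside the data range as NaN on both ends
    d = df.reindex(ridx).interpolate(method='index', limit_area='inside')
    for col in d:
        for x in d.index[d[col].isnull()]:
            d.at[x, col] = edge_line(df[col], x)
    return d.loc[nidx]


print(interpolate_both_ends(df, [400, 405, 410, 420]).round(2))
```

With the question's data this keeps 3.18 and 0.40 at 400 ka, and gives 6.41 and 1.78 at 420 ka, extended from the 413.1 and 415.1 ka samples only.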

Original answer: interpolation

Finding a value at 400 ka is not interpolation... that's extrapolation. At 405 ka, interpolation takes the two points immediately around it and... well... interpolates :-)

plan

  • set your index to 'A (ka)'
  • create a sub index for the points you care about
  • reindex with the union of the old index and the sub index; NaN will be placed at the new positions
  • interpolate to fill in the NaN. Make sure to use method='index' so values are calculated from the actual spacing of your index, not from the row positions
  • slice out just your sub index

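import pandas as pd

# df holds the question's data (columns 'A (ka)', 'B', 'C')
df = df.set_index('A (ka)')
nidx = pd.RangeIndex(400, 420, 5)        # 400, 405, 410, 415
ridx = df.index.union(nidx)              # old points plus new points
df.reindex(ridx).interpolate('index').reindex(nidx)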

          B      C
400     NaN    NaN
405  3.8555  0.646
410  3.7180  1.038
415  4.3590  0.433

Note at index 400, we still have NaN.
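The closest built-in option is `limit_direction='both'`, which does fill 400 ka, but only by copying the nearest sample (3.49 from 401.3 ka) instead of extending the line. A small sketch on the first three rows of column B:

```python
import pandas as pd

# first three rows of column B, indexed by age (ka)
s = pd.Series([3.49, 3.95, 3.74], index=[401.3, 403.2, 407.2])
ridx = s.index.union([400, 405])

# limit_direction='both' also fills the leading NaN, but with a constant
filled = s.reindex(ridx).interpolate(method='index', limit_direction='both')
print(filled.loc[[400, 405]])
```

So the built-ins cover interpolation and constant padding; linear extrapolation has to be computed separately.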

piRSquared
  • thanks for this. Do you have a solution on the extrapolation (which I accurately described in the question)? – thaneofcawdor Apr 07 '17 at 19:09
  • @thaneofcawdor I don't have a solution with handy built-ins. I'd have to force it a bit. I'll only post it if I think it's pretty. Otherwise, I'm sure you know the math of it. :-) – piRSquared Apr 07 '17 at 21:11
  • @thaneofcawdor also, if this was helpful, feel free to upvote. – piRSquared Apr 07 '17 at 21:12
  • I tried using this http://stackoverflow.com/questions/22491628/extrapolate-values-in-pandas-dataframe but it is not right - I only want to extrapolate based on the preceding two data points, not the entire dataset – thaneofcawdor Apr 07 '17 at 21:19
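If adding SciPy as a dependency is acceptable (it is not used in the answer above), `scipy.interpolate.interp1d` with `kind='linear'` and `fill_value='extrapolate'` extends only the first and last segments, i.e. it extrapolates from the two nearest data points rather than from a fit over the entire dataset. A sketch, assuming SciPy 0.17 or later:

```python
import numpy as np
from scipy.interpolate import interp1d

age = np.array([401.3, 403.2, 407.2, 409.2, 411.2, 413.1, 415.1])
# one row per age; columns are B and C from the question
vals = np.array([[3.49, 0.34], [3.95, 0.25], [3.74, 1.13], [3.71, 1.03],
                 [3.73, 1.05], [3.58, -0.08], [4.4, 0.46]])

# linear between samples; outside the data the first/last segment is extended
f = interp1d(age, vals, axis=0, kind='linear', fill_value='extrapolate')

ka = 5
grid = np.arange(400, 425, ka)        # 400 through 420 ka
out = f(grid)

for g, (b, c) in zip(grid, out.round(2)):
    print(g, b, c)
```

Because `axis=0` interpolates every column at once, adding more proxy columns only means adding columns to `vals`.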