Weighting Data Using Numpy

Question

My data looks like:

list=[44359, 16610,  8364, ...,     1,     1,     1]

For each element I want to compute list[i] * (list[i+1] + list[i-1]) / 2, where list[i] is an element of the list and list[i+1] and list[i-1] are its adjacent elements.

For some reason I cannot seem to do this cleanly in NumPy.

Here's what I've tried:

weights=[]
weights.append(1)
for i in range(len(hoff[3])-1):
    weights.append((hoff[3][i-1]+hoff[3][i+1])/2)

I append 1 to the weights list so that the lengths will match at the end. I picked 1 arbitrarily, and I'm not sure how to deal with the leftmost and rightmost points either.

And how do you want to deal with the leftmost and rightmost points? — Divakar, Jan 03 '18 at 16:37

Paul H · Answer 1 · 2018-01-03T18:31:58.897

I would use pandas for this, filling in the missing left- and right-most values with 1 (but you can use any value you want):

import numpy
import pandas

numpy.random.seed(0)
data = numpy.random.randint(0, 10, size=15)

df = (
    pandas.DataFrame({'hoff': data})
        .assign(before=lambda df: df['hoff'].shift(1).fillna(1).astype(int))
        .assign(after=lambda df: df['hoff'].shift(-1).fillna(1).astype(int))
        .assign(weight=lambda df: df['hoff'] * df[['before', 'after']].mean(axis=1))
)
print(df.to_string(index=False))

And that gives me:

hoff  before  after  weight
   5       1      0     2.5
   0       5      3     0.0
   3       0      3     4.5
   3       3      7    15.0
   7       3      9    42.0
   9       7      3    45.0
   3       9      5    21.0
   5       3      2    12.5
   2       5      4     9.0
   4       2      7    18.0
   7       4      6    35.0
   6       7      8    45.0
   8       6      8    56.0
   8       8      1    36.0
   1       8      1     4.5
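
As a possible shortcut, shift also accepts a fill_value argument (added in pandas 0.24, as far as I know), which fills the ends directly and keeps the integer dtype, so the fillna/astype step is not needed. A minimal sketch on a shortened, made-up series, again assuming a fill of 1:

```python
import pandas as pd

# Shortened example data (the first five values of the table above).
hoff = pd.Series([5, 0, 3, 3, 7], name="hoff")

# fill_value=1 is the same arbitrary choice as before; use any value you like.
before = hoff.shift(1, fill_value=1)    # left neighbour, 1 at the first row
after = hoff.shift(-1, fill_value=1)    # right neighbour, 1 at the last row

# Each value times the mean of its two neighbours.
weight = hoff * (before + after) / 2
print(weight.tolist())
```

The first four weights match the table above. The last one differs only because the shortened series ends at 7, so its right neighbour is the fill value.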

A pure numpy-based solution would look like this (again, filling with 1):

before_after = numpy.ones((data.shape[0], 2))
before_after[1:, 0] = data[:-1]
before_after[:-1, 1] = data[1:]
weights = data * before_after.mean(axis=1)
print(weights)

array([  2.5,   0. ,   4.5,  15. ,  42. ,  45. ,  21. ,  12.5,   9. ,
        18. ,  35. ,  45. ,  56. ,  36. ,   4.5])
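
If you want to convince yourself the vectorized version does what the formula says, one option is to compare it against an explicit loop. This is only a sketch: neighbor_weights_loop and its fill parameter are names I made up for illustration, and the data is a shortened example.

```python
import numpy as np

def neighbor_weights_loop(values, fill=1):
    """Loop reference: values[i] * mean(left neighbour, right neighbour)."""
    n = len(values)
    out = []
    for i in range(n):
        # Use the fill value at the ends instead of letting index -1 wrap around.
        left = values[i - 1] if i > 0 else fill
        right = values[i + 1] if i < n - 1 else fill
        out.append(values[i] * (left + right) / 2)
    return np.array(out, dtype=float)

data = np.array([5, 0, 3, 3, 7])

# Same construction as the pure numpy version above.
before_after = np.ones((data.shape[0], 2))
before_after[1:, 0] = data[:-1]
before_after[:-1, 1] = data[1:]
vectorized = data * before_after.mean(axis=1)

loop_weights = neighbor_weights_loop(data)
assert np.allclose(vectorized, loop_weights)
```

Note the loop guards i == 0 explicitly. In the code from the question, hoff[3][i-1] with i == 0 silently reads the last element.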

score 0 · Accepted Answer · answered Jan 03 '18 at 17:18

You can use numpy's array operations to represent your "loop". Think of your data as laid out below, where pL and pR are the values you choose to "pad" your data with on the left and right:

[pL, 0, 1, 2, ..., N-2, N-1, pR]

What you're trying to do is this:

[0, ..., N-1] * ([pL, 0, ..., N-2] + [1, ..., N-1, pR]) / 2

Written in code it looks something like this:

import numpy as np
data = np.random.random(10)

padded = np.concatenate(([data[0]], data, [data[-1]]))
data * (padded[:-2] + padded[2:]) / 2.

Repeating the first and last value is known as "extending" in image processing, but there are other edge handling methods you could try.
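
To try a few of those edge handling methods side by side, numpy.pad can build the padded array for you. A rough sketch, where neighbor_weights is a helper name I invented for illustration and the data is a small made-up example:

```python
import numpy as np

def neighbor_weights(values, mode="edge", **pad_kwargs):
    """values[i] times the mean of its two neighbours, padding one element per side."""
    values = np.asarray(values, dtype=float)
    padded = np.pad(values, 1, mode=mode, **pad_kwargs)
    # padded[:-2] holds the left neighbours, padded[2:] the right neighbours.
    return values * (padded[:-2] + padded[2:]) / 2

data = np.array([5, 0, 3, 3, 7])

edge = neighbor_weights(data)                                       # repeat first/last value
const = neighbor_weights(data, mode="constant", constant_values=1)  # pad with 1
reflect = neighbor_weights(data, mode="reflect")                    # mirror about the end points

print(edge)
print(const)
print(reflect)
```

Only the first and last weights change between modes, since the interior elements never see the padding.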

answered Jan 03 '18 at 17:18

Bi Rico

Good thinking on the padding. You could do `padded = numpy.pad(data, 1, 'edge')` in place of your concatenation – Paul H Jan 03 '18 at 18:24
Fantastic, I ended up using a slight variant of what you've given here. Also appreciate the edge case handling discussion – qdspec Jan 03 '18 at 21:48
