I have a numpy array with these values: [10620.5, 11899., 11879.5, 13017., 11610.5]

import numpy as np
array = np.array([10620.5, 11899,  11879.5, 13017,  11610.5])

I would like to find values that are "close" (in this instance, 11899 and 11879.5), average them, and then replace them with a single instance of the new number, resulting in this:

[10620.5, 11889.25, 13017, 11610.5]

The term "close" would be configurable; let's say a difference of 50.

The purpose of this is to create Spans on a Bokeh graph, and some lines are just too close together.

I am super new to Python in general (a couple of weeks of intense dev).

I would think that I could arrange the values in order, somehow grab the ones to the left and right of each value, and do some math on them, replacing a match with the average value. But at the moment, I just don't have any idea how.
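The sort-and-compare-neighbours idea above can be sketched roughly as follows. This is only a minimal illustration, not a definitive solution: the function name `merge_close` and its `close` parameter are made up for this example, and it assumes that a run of values counts as one group whenever each value is within `close` of the previous sorted value. Note that the result comes back in sorted order, not in the original order.

```python
import numpy as np


def merge_close(values, close=50):
    """Average runs of sorted values whose gap to the previous value is below `close`.

    Hypothetical helper for illustration; returns a plain list in ascending order.
    """
    ordered = np.sort(np.asarray(values, dtype=float))
    if ordered.size == 0:
        return []

    merged = []
    group = [ordered[0]]  # current run of close values
    for value in ordered[1:]:
        if value - group[-1] < close:
            # Still close to the previous value: extend the current run.
            group.append(value)
        else:
            # Gap is too large: close the run and start a new one.
            merged.append(float(np.mean(group)))
            group = [value]
    merged.append(float(np.mean(group)))  # flush the last run
    return merged


result = merge_close([10620.5, 11899, 11879.5, 13017, 11610.5], close=50)
print(result)  # [10620.5, 11610.5, 11889.25, 13017.0]
```

Because runs are chained neighbour by neighbour, the first and last value of a long run can be further apart than `close`; whether that is acceptable depends on how "close" should be defined.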

  • Please see [How to create good pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) and include some information about input, desired output, and what you've tried so far (if anything). For example, how are you defining "close" for these purposes? – G. Anderson Jul 12 '19 at 16:17

3 Answers

Try something like this. I added a few extra steps just to show the flow. The idea is to sort the data, split it into groups of adjacent values, and decide whether to average each group based on how spread out it is.

So, as you describe, you can combine your data into sets of 3 numbers; if the difference between the max and min of a set is less than your threshold (50 in your case), you average them, otherwise you leave them as they are. The example below uses a threshold of 5 to suit its small sample values.

import pandas as pd
import numpy as np
arr = np.ravel([1,24,5.3, 12, 8, 45, 14, 18, 33, 15, 19, 22])
arr.sort()

def reshape_arr(a, n):  # n is the number of consecutive sorted items to compare for averaging
    hold = len(a) % n
    if hold != 0:
        container = a[-hold:]  # leftover items that do not fill a group of n; excluded from averaging
        a = a[:-hold].reshape(-1, n)
    else:
        a = a.reshape(-1, n)
        container = None
    return a, container


def get_mean(a, close):  # close = maximum spread (max - min) for a group to be averaged together
    my_list = []
    for i in range(len(a)):
        if a[i].max() - a[i].min() > close:
            # group is too spread out: keep its items unchanged
            for j in range(len(a[i])):
                my_list.append(a[i][j])
        else:
            my_list.append(a[i].mean())
    return my_list


def final_list(a, c):  # add any elements held in the container to the final list
    if c is not None:
        c = c.tolist()
        for i in range(len(c)):
            a.append(c[i])
    return a


arr, container = reshape_arr(arr, 3)
arr = get_mean(arr, 5)
print(final_list(arr, container))
  • I am currently trying to implement this, I had other issues break – Keven Scharaswak Jul 19 '19 at 21:51
  • When implementing this, I had to make a change. Apparently I wasn't using a NumPy array, but a pandas Series, so within reshape_arr I had to change `a.reshape(-1,n)` to `a.values.reshape(-1,n)`. I then get an error on the return saying `Length of values does not match length of index`. I'm still really new to Python; can you see what is going on? – Keven Scharaswak Jul 30 '19 at 16:30

You could use fuzzywuzzy here to gauge the closeness ratio between two data sets.

See details here: http://jonathansoma.com/lede/algorithms-2017/classes/fuzziness-matplotlib/fuzzing-matching-in-pandas-with-fuzzywuzzy/

alanindublin

Taking Gustavo's answer and tweaking it to my needs:

def reshape_arr(a, close):
    flag = True
    while flag is not False:
        array = a.sort_values().unique()
        l = len(array)
        flag = False
        for i in range(l):
            previous_item = next_item = None
            if i > 0:
                previous_item = array[i - 1]
            if i < (l - 1):
                next_item = array[i + 1]
            if previous_item is not None:
                if abs(array[i] - previous_item) < close:
                    average = (array[i] + previous_item) / 2
                    flag = True
                    #find matching values in a, and replace with the average
                    a.replace(previous_item, value=average, inplace=True)
                    a.replace(array[i], value=average, inplace=True)

            if next_item is not None:
                if abs(next_item - array[i]) < close:
                    flag = True
                    average = (array[i] + next_item) / 2
                    # find matching values in a, and replace with the average
                    a.replace(array[i], value=average, inplace=True)
                    a.replace(next_item, value=average, inplace=True)
    return a

This will do it if I call it like this:

 candlesticks['support'] = reshape_arr(supres_df['support'], 150)

where candlesticks is the main DataFrame that I am using and supres_df is another DataFrame that I am massaging before I apply it to the main one.

It works, but it is extremely slow. I am trying to optimize it now.

I added a while loop because after averaging, the averages can become close enough to average out again, so I will loop again, until it doesn't need to average anymore. This is total newbie work, so if you see something silly, please comment.
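One possible direction for the optimization, sketched under some assumptions: instead of replacing pairs repeatedly, sort once, start a new cluster wherever the gap to the previous sorted value is at least `close`, and replace every value with the mean of its cluster. This is a different technique from the pairwise loop above and will not give identical numbers. The name `snap_to_cluster_mean` is hypothetical, and the sketch assumes a numeric Series without missing values.

```python
import numpy as np
import pandas as pd


def snap_to_cluster_mean(series, close):
    """Replace each value with the mean of its cluster of close neighbours.

    Hypothetical helper for illustration; returns a Series with the same
    index and length as the input.
    """
    values = series.to_numpy(dtype=float)
    if values.size == 0:
        return series.astype(float)

    order = np.argsort(values, kind="stable")  # positions that sort the values
    sorted_vals = values[order]

    # A new cluster starts wherever the gap to the previous sorted value is >= close.
    cluster_id = np.concatenate(([0], np.cumsum(np.diff(sorted_vals) >= close)))

    # Mean of each cluster, broadcast back to every member of that cluster.
    means = pd.Series(sorted_vals).groupby(cluster_id).transform("mean").to_numpy()

    out = np.empty_like(values)
    out[order] = means  # undo the sort so results line up with the input rows
    return pd.Series(out, index=series.index, name=series.name)


levels = pd.Series([100.0, 500.0, 120.0, 900.0, 510.0])
snapped = snap_to_cluster_mean(levels, close=50)
print(snapped.tolist())  # [110.0, 505.0, 110.0, 900.0, 505.0]
```

A single pass should be enough here: neighbouring clusters are separated by a gap of at least `close`, and each cluster mean lies inside its own cluster, so two cluster means cannot end up closer than `close` to each other. Because the result keeps the original index and length, it can be assigned back to a DataFrame column without a length mismatch.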