
I want to create a DataFrame out of arrays of different sizes, and fill the missing values based on which values are similar to each other.

I've tried sticking the arrays together and doing a sort and a split with NumPy. I then calculate the mean of each split and decide whether a value is close enough to a mean or whether it is better to fill with NaN.
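
To make the grouping step concrete, here is the idea on a few made-up numbers (purely an illustration, not my real data):

import numpy as np

vals = np.array([0.98, 1.02, 1.95, 2.05, 2.98])              # already sorted
jumps = np.diff(vals) / vals[1:] * 100                        # % change between neighbours
groups = np.split(vals, np.argwhere(jumps > 12)[:, 0] + 1)    # a new group starts after a big jump
print([g.tolist() for g in groups])       # [[0.98, 1.02], [1.95, 2.05], [2.98]]
print([float(g.mean()) for g in groups])  # group means: [1.0, 2.0, 2.98]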

import numpy as np
import pandas as pd

def find_nearest(array, value):
    # index of the element of `array` that is closest to `value`
    array = np.asarray(array)
    idx = (np.abs(array - value)).argmin()
    return idx

# generate sample data: a few lists of different lengths, each scaled by a random factor around 1
loa = [((np.arange(np.random.randint(1,3),np.random.randint(3,6)))*val).tolist() 
            for val in np.random.uniform(0.9,1.1,5)]

# flatten the list of lists
flat_list = sum(loa,[])

# add one random attribute per value (same order as the flattened values)
attributes = [np.random.randint(-3,-1) for x in range(len(flat_list))]

# sort, then split wherever the percentage change between neighbours exceeds 12%
flat_list.sort()
arr = np.array(flat_list)
# +1 so that the element *after* the jump starts the new group
arr_splits = np.split(arr, np.argwhere(np.diff(arr) / arr[1:] * 100 > 12)[:, 0] + 1)

#means of the splits
means = [np.mean(arr) for arr in arr_splits]

# create dataframe: the first len(means) columns hold the values, the rest hold the attributes
i = 0
res = np.full((len(loa), len(means) * 2), np.nan)
for row, l in enumerate(loa):
    for val in l:
        col = find_nearest(means, val)          # column of the closest group mean
        res[row, col] = val
        res[row, col + len(means)] = attributes[i]
        i += 1

df = pd.DataFrame(res)

Is there another way to do this more directly with pandas, or something more elegant?
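
For example, something along these lines is roughly the direction I was imagining, reusing find_nearest and means from above (the names long and wide are just placeholders, and pivot_table with aggfunc='first' is only there in case two values of one row land in the same group):

# long format: one row per value, keeping its source row and attribute
long = pd.DataFrame({
    'row':   np.repeat(np.arange(len(loa)), [len(l) for l in loa]),
    'value': np.concatenate(loa),
    'attr':  attributes,
})
long['col'] = [find_nearest(means, v) for v in long['value']]

# wide format: one column per group mean, values and attributes side by side
wide = long.pivot_table(index='row', columns='col', values=['value', 'attr'], aggfunc='first')

But I am not sure this is actually more elegant, so I am open to completely different approaches.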

incognito
  • This might be useful https://stackoverflow.com/questions/22491628/extrapolate-values-in-pandas-dataframe – Unni Jun 11 '19 at 22:17
  • Unfortunately, I do not see how that helps. I don't want to extrapolate or interpolate anything... I just want to sort the right values together and fill the rest with NaN. – incognito Jun 12 '19 at 07:45
