Found a solution using .fillna

As you can guess, my title is already confusing, and so am I! I have a dataframe like this:

Index         Values
 0             NaN
 1             NaN
...................
230            350.21
231            350.71
...................
1605           922.24

Between index 230 and 1605 I have values, but the first 230 entries (index 0 to 229) are missing. So I calculated the slope to approximate the missing data and stored it in 'slope'.

Y1 = df['Values'].min()
X1ID = df['Values'].idxmin()
Y2 = df['Values'].max()
X2ID = df['Values'].idxmax()
slope = (Y2 - Y1)/(X2ID - X1ID)

In essence, I want to take the .min of Values, subtract the slope, and insert the result at the index just before that minimum, repeating this back to index 0. However, I am completely lost. I tried something like this:

 df['Values2'] = df['Values'].min().apply(lambda x: x.min() - slope) 

But that is obviously rubbish. I would greatly appreciate some advice.

EDIT

So after trying multiple ways I found a crude solution that at least works for me.

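loopcounter = 0
missingValue = []
missingindex = []
missingindex.append(loopcounter)
missingValue.append(Y1)  # seed the list with the known minimum
for minValue in missingValue:
    # step one index back: subtract the slope from the previous value
    minValue = minValue - slope
    missingValue.append(minValue)
    loopcounter += 1
    missingindex.append(loopcounter)
    if loopcounter == 230:  # number of missing values
        break
del missingValue[0]     # drop the seed (Y1 is already in the dataframe)
missingValue.reverse()  # smallest value first, matching index 0
del missingindex[-1]    # keep indices 0..229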

First I created two lists, one for the missing values and one for the index. Then I added my minimum value (Y1) to the values list and started the loop, which stops after 230 iterations (the number of missing values). Each iteration subtracts the slope from the most recently added value, starting with the minimum, and appends the counter to the missingindex list.

Deleting the first value and reversing the order transformed the list into the correct order.

missValue = dict(zip(missingindex,missingValue))

I then combined the two lists into a dictionary.

df['Values'] = df['Values'].fillna(missValue)

Afterwards I used the .fillna function to fill up my dataframe with the dictionary.

This worked for me, though I know it's probably not the most elegant solution.
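For comparison, the same back-extrapolation can probably be written without an explicit loop. The following is only a minimal sketch on made-up toy data, assuming a default integer index starting at 0 and that the minimum sits on the first non-missing row; `missing_index` and `filled` are names introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real data: three missing rows, then a linear ramp.
df = pd.DataFrame({"Values": [np.nan, np.nan, np.nan, 10.0, 12.0, 14.0]})

Y1 = df["Values"].min()
X1ID = df["Values"].idxmin()
Y2 = df["Values"].max()
X2ID = df["Values"].idxmax()
slope = (Y2 - Y1) / (X2ID - X1ID)

# Index labels of the rows to fill: everything before the minimum.
missing_index = np.arange(X1ID)

# Row i lies (X1ID - i) steps before the minimum, so subtract that many slopes.
filled = pd.Series(Y1 - slope * (X1ID - missing_index), index=missing_index)

# fillna aligns the Series on the index, just like the dictionary version.
df["Values"] = df["Values"].fillna(filled)
print(df)
```

Passing a Series to .fillna works the same way as passing the dictionary: only the NaN rows whose index appears in it are filled.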

I would like to thank everyone who invested their time in trying to help me. Thanks a lot.

Clemens

3 Answers

Check this. However, I think you would have to put it in a loop, since the minimum has to be recalculated after each insertion.

import pandas as pd
import numpy as np

df = pd.DataFrame(columns=('Values',),data=
                    [
                        np.nan,
                        np.nan,
                        350.21,
                        350.71,
                        922.24
                    ])

Y1 = df['Values'].min()
X1ID = df['Values'].idxmin()
Y2 = df['Values'].max()
X2ID = df['Values'].idxmax()
slope = (Y2 - Y1)/(X2ID - X1ID)

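# Build the new row, then splice it in just before the current minimum
# (.loc replaces the removed .ix indexer; both slices are label-based)
line = pd.DataFrame(data=[Y1 - slope], columns=('Values',), index=[X1ID])
df2 = pd.concat([df.loc[:X1ID-1], line, df.loc[X1ID:]]).reset_index(drop=True)
print(df2)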

The insert logic is taken from the question "Is it possible to insert a row at an arbitrary position in a dataframe using pandas?"

pmaniyan

I think you can use loc with interpolate:

print(df)
       Values
Index        
0         NaN
1         NaN
2         NaN
3         NaN
4         NaN
5         NaN
6         NaN
229       NaN
230    350.21
231    350.71
1605   922.24

# set value 0 at index 0
df.at[0, 'Values'] = 0
# set Y1 - slope (349.793978) at the last NaN index
df.at[X1ID-1, 'Values'] = Y1 - slope
print(df)
           Values
Index            
0        0.000000
1             NaN
2             NaN
3             NaN
4             NaN
5             NaN
6             NaN
229    349.793978
230    350.210000
231    350.710000
1605   922.240000
print(df.loc[0:X1ID-1, 'Values'])
Index
0        0.000000
1             NaN
2             NaN
3             NaN
4             NaN
5             NaN
6             NaN
229    349.793978
Name: Values, dtype: float64

# select the rows up to the last NaN index and interpolate them
df.loc[0:X1ID-1, 'Values'] = df.loc[0:X1ID-1, 'Values'].interpolate(method='linear')
print(df)
           Values
Index            
0        0.000000
1       49.970568
2       99.941137
3      149.911705
4      199.882273
5      249.852842
6      299.823410
229    349.793978
230    350.210000
231    350.710000
1605   922.240000
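One thing that may matter here: method='linear' ignores the index and treats the rows as equally spaced, which is why the gap between index 6 and 229 is filled as a single step. As a hedged sketch, method='index' uses the numeric index values instead. The data below is a shortened toy version of the frame above, and the 0 at index 0 is still just an assumed anchor value.

```python
import numpy as np
import pandas as pd

# Shortened version of the example: the index jumps from 2 to 229.
s = pd.Series(
    [0.0, np.nan, np.nan, 349.793978, 350.21],
    index=[0, 1, 2, 229, 230],
)

# 'linear' spaces the filled values evenly by row position.
by_position = s.interpolate(method="linear")

# 'index' spaces them according to the numeric index labels.
by_index = s.interpolate(method="index")

print(pd.DataFrame({"linear": by_position, "index": by_index}))
```

With method='index', the value at label 1 is 1/229 of the way from the anchor to the value at 229, rather than 1/3 of the way.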
jezrael
  • This technically works, but the values are not what I expected: they decrease very fast. Sadly, the slope can only be calculated over longer ranges, because the data looks more like stairs than a strictly linear trend. – Clemens Apr 27 '16 at 12:16
  • Maybe try changing the `interpolate` method. – jezrael Apr 27 '16 at 12:17

I will revise this a little bit:

df['Values2'] = df['Values']
df.loc[df['Values2'].isnull(), 'Values2'] = Y1 - slope

EDIT

Or try putting this in a recursive function like the one below. It fills in one value per call until it reaches the start of the series:

def fix_rec(series):
    # the current minimum and maximum define the slope
    Y1 = series.min()
    X1ID = series.idxmin()
    Y2 = series.max()
    X2ID = series.idxmax()
    slope = (Y2 - Y1) / (X2ID - X1ID)

    if X1ID == 0:  # termination condition: the minimum is at the first index
        return series

    # fill the entry just before the current minimum
    series.loc[X1ID-1] = Y1 - slope

    return fix_rec(series)

Call it like this:

df['Values2'] = fix_rec(df['Values'].copy())
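If the gap is long, one call per missing row could run into CPython's default recursion limit (1000 frames), so a loop may be safer. The following is only a sketch of an iterative variant on toy data, assuming an integer index starting at 0 and a positive slope; `fill_backward` is a hypothetical name, and unlike the recursive version it computes the slope once up front.

```python
import numpy as np
import pandas as pd

def fill_backward(series):
    """Extrapolate backwards from the minimum down to label 0 (hypothetical helper)."""
    series = series.copy()  # leave the caller's data untouched
    y1, x1 = series.min(), series.idxmin()
    y2, x2 = series.max(), series.idxmax()
    slope = (y2 - y1) / (x2 - x1)

    # Walk from the row just before the minimum down to label 0,
    # subtracting one slope per step.
    for label in range(x1 - 1, -1, -1):
        y1 -= slope
        series.loc[label] = y1
    return series

# Toy data: two missing rows followed by a linear ramp.
df = pd.DataFrame({"Values": [np.nan, np.nan, 10.0, 12.0, 14.0]})
df["Values2"] = fill_backward(df["Values"])
print(df)
```

Since each new point lies on the same line, recomputing the slope on every step should give the same value up to floating-point rounding.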

I hope that helps!

Thanos
  • Thanks for your suggestion. Now we get the same value for all 229 entries, but they should decrease further and further the closer we get to 0. Maybe we need a loop, because technically the .min of Values (Y1) should be a new one after each subtraction. – Clemens Apr 27 '16 at 11:44