After some help on the forum I managed to do what I was looking for, and now I need to take it to the next level. (The long explanation is here: Python Data Frame: cumulative sum of column until condition is reached and return the index.)
I have a data frame:
In [3]: df
Out[3]:
   index  Num_Albums  Num_authors
0      0          10            4
1      1           1            5
2      2           4            4
3      3           7         1000
4      4           1           44
5      5           3            8
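For anyone who wants to reproduce this, the frame above can be rebuilt with something like the following. This is only a sketch: it assumes `index` is an ordinary column next to the default integer index, which is what the printed output suggests.

```python
import pandas as pd

# Rebuild the example frame; "index" is assumed to be a regular column,
# separate from the default RangeIndex shown on the left of the output.
df = pd.DataFrame({
    "index": [0, 1, 2, 3, 4, 5],
    "Num_Albums": [10, 1, 4, 7, 1, 3],
    "Num_authors": [4, 5, 4, 1000, 44, 8],
})

print(df)
```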
I add a column with the cumulative sum of another column.
In [4]: df['cumsum'] = df['Num_Albums'].cumsum()
In [5]: df
Out[5]:
   index  Num_Albums  Num_authors  cumsum
0      0          10            4      10
1      1           1            5      11
2      2           4            4      15
3      3           7         1000      22
4      4           1           44      23
5      5           3            8      26
Then I apply a condition to the cumsum column and extract the row where the condition is met within a given tolerance:
In [18]: tol = 2
In [19]: cond = df.where((df['cumsum']>=15-tol)&(df['cumsum']<=15+tol)).dropna()
In [20]: cond
Out[20]:
   index  Num_Albums  Num_authors  cumsum
2    2.0         4.0          4.0    15.0
Now, instead of the single condition (15 in the example), I want to use several conditions stored in an array. For each one, I want to check where the condition is met and retrieve not the entire row, but only the value of the column Num_Albums. Finally, all these retrieved values (one per condition) should be stored in an array or list.
Coming from matlab, I would do something like this (I apologize for this mixed matlab/python syntax):
conditions = np.array([10, 15, 23])
for i=0:len(conditions)
retrieved_values(i) = df.where((df['cumsum']>=conditions(i)-tol)&(df['cumsum']<=conditions(i)+tol)).dropna()
So for the data frame above I would get (for tol=0):
retrieved_values = [10, 4, 1]
I would like a solution that lets me keep the .where function if possible.
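To make the intent concrete, here is a rough sketch in plain Python of the loop I have in mind. It is not necessarily the best way to do it. It applies `.where` to the Num_Albums column only, so each match gives a single value rather than a whole row. The names `mask` and `matched` are just placeholders I made up for the illustration.

```python
import numpy as np
import pandas as pd

# Same example frame as above, with the cumulative sum added.
df = pd.DataFrame({
    "index": [0, 1, 2, 3, 4, 5],
    "Num_Albums": [10, 1, 4, 7, 1, 3],
    "Num_authors": [4, 5, 4, 1000, 44, 8],
})
df["cumsum"] = df["Num_Albums"].cumsum()

tol = 0
conditions = np.array([10, 15, 23])

retrieved_values = []
for c in conditions:
    # Boolean mask: rows whose cumsum lies within +/- tol of the condition.
    mask = (df["cumsum"] >= c - tol) & (df["cumsum"] <= c + tol)
    # .where turns non-matching entries into NaN; dropna keeps the matches.
    matched = df["Num_Albums"].where(mask).dropna()
    # .where upcasts to float because of the NaNs, so cast back to int.
    retrieved_values.extend(matched.astype(int).tolist())

print(retrieved_values)  # [10, 4, 1]
```

Note that with a larger tol more than one row can fall inside the window for a single condition, and this sketch would then append all of them, not just one value per condition.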