Given a list of values, I am trying to identify every pair of consecutive values whose sum exceeds 10:

a = [1,9,3,4,5]

...so I wrote a for loop...

values = []
for i in range(len(a)-2):
    if sum(a[i:i+2]) >10:
        values += [a[i:i+2]]

...which I rewrote as a list comprehension...

values = [a[i:i+2] for i in range(len(a)-2) if sum(a[i:i+2]) >10]

Both produce the same output:

values = [[1,9], [9,3]]

My question is: how can I best apply the above list comprehension to a DataFrame?

Here is a sample five-row DataFrame:

import pandas as pd
df = pd.DataFrame({'A': [1,1,1,1,0], 
                   'B': [9,8,3,2,2],
                   'C': [3,3,3,10,3],
                   'E': [4,4,4,4,4],
                   'F': [5,5,5,5,5]})
df['X'] = df.values.tolist()

where:

  • a is one row of df['X'], which holds the values of columns A - F as a list:

df['X'] = [[1,9,3,4,5],[1,8,3,4,5],[1,3,3,4,5],[1,2,10,4,5],[0,2,3,4,5]]

  • and the result of the list comprehension is to be stored in a new column, df['X1'].

Desired output is:

df['X1'] = [[[1,9], [9,3]],[[8,3]],[[NaN]],[[2,10],[10,4]],[[NaN]]]

Thank you.

denpy
  • Please set up a small sample dataframe for the input and one for the desired output. – timgeb May 31 '20 at 15:38
  • You can check out [this reference post](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) if you have trouble setting up sample dataframes. – timgeb May 31 '20 at 15:49
  • Thanks. Added sample dataframe – denpy May 31 '20 at 15:50
  • That sample dataframe is a bit too large to work with. Can you limit it to six rows with deterministic (not randomly generated) values, please? Also, please post the **result** you want for that sample. Thanks. – timgeb May 31 '20 at 15:50
  • By the way I need to see the desired result here because it is not clear how you want the new column to fit into the original dataframe. Because your code can produce a list shorter than the original column. – timgeb May 31 '20 at 15:55
  • Just to store the results in a new column – denpy May 31 '20 at 16:01
  • What if the result is shorter than the other columns? – timgeb May 31 '20 at 16:09
  • NaN. df['X1'] is a list of lists. – denpy May 31 '20 at 16:11
  • Let me re-create the sample dataframe..... – denpy May 31 '20 at 16:13
  • @denpy In your desired output the `values >= 10` are included but in your list comprehension you specify `values > 10` should be included? – Shubham Sharma May 31 '20 at 17:03

1 Answer

You could use the pandas apply function and put your list comprehension inside it.

import pandas as pd

df = pd.DataFrame({'A': [1,1,1,1,0], 
                   'B': [9,8,3,2,2],
                   'C': [3,3,3,10,3],
                   'E': [4,4,4,4,4],
                   'F': [5,5,5,5,5]})

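# range(len(a)-1) visits every adjacent pair, including the last one;
# .tolist() turns each two-element slice of the row into a plain list.
df['x'] = df.apply(lambda a: [a.iloc[i:i+2].tolist() for i in range(len(a)-1) if a.iloc[i:i+2].sum() >= 10], axis=1)

# Note: the axis parameter controls whether the function is applied by rows or by columns; axis=1 applies it to each row.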

This yields the same pairs as the desired df['X1'], except that rows with no qualifying pair get an empty list instead of NaN.
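
If plain Python lists and an explicit NaN for rows without a match are wanted, one possible variant is sketched below. It assumes the `>=` comparison implied by the desired output (the pair [1, 9] sums to exactly 10), and the helper name `pairs_at_least` and its `threshold` parameter are hypothetical, not part of pandas.

```python
import numpy as np
import pandas as pd


def pairs_at_least(values, threshold=10):
    """Return every adjacent pair whose sum is >= threshold, or NaN if there is none.

    Hypothetical helper for illustration; not a pandas function.
    """
    vals = list(values)
    # zip(vals, vals[1:]) walks all adjacent pairs, including the last one.
    pairs = [[x, y] for x, y in zip(vals, vals[1:]) if x + y >= threshold]
    return pairs if pairs else np.nan


df = pd.DataFrame({'A': [1, 1, 1, 1, 0],
                   'B': [9, 8, 3, 2, 2],
                   'C': [3, 3, 3, 10, 3],
                   'E': [4, 4, 4, 4, 4],
                   'F': [5, 5, 5, 5, 5]})

# Build the new column row by row; to_numpy().tolist() yields plain Python ints.
rows = df.to_numpy().tolist()
df['X1'] = pd.Series([pairs_at_least(row) for row in rows], index=df.index)

print(df['X1'].tolist())
# [[[1, 9], [9, 3]], [[8, 3]], nan, [[2, 10], [10, 4]], nan]
```

Returning a bare NaN (rather than [[NaN]]) keeps the missing rows detectable with df['X1'].isna(); that choice is an assumption about how the column will be used.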

monte