1
df = pd.DataFrame(np.array([[[1740, 6920, 10120, 14300, 18220, 24500, 41300], 10000, 20000], [[1620, 5840, 12100, 15000, 25260, 26020], 5900, 15200]]),
                   columns=['long_list', 'min', 'max'])

For this dataframe, I'm hoping to create a new column df['part'] that is the part of df['long_list'] that meets the condition of df['min']<df['part']<df['max']. I tried to use a lambda function but struggled with how to use all three columns. So the output would be

df2=pd.DataFrame(np.array([[[1740, 6920, 10120, 14300, 18220, 24500, 41300], 10000, 20000, [10120, 14300, 18220]], [[1620, 5840, 12100, 15000, 25260, 26020], 5900, 15200,[12100, 15000]]]),
                   columns=['long_list', 'min', 'max','part'])
zabop
  • 6,750
  • 3
  • 39
  • 84
lwu29
  • 111
  • 1
  • 6

5 Answers5

4

You can explode the long_list, query on the condition, and group back:

df['part'] = (df.explode('long_list')
                .query('min<long_list<max')
                .groupby(level=0)['long_list'].agg(list)
             )

Output:

    long_list                                          min    max  part
--  -----------------------------------------------  -----  -----  ---------------------
 0  [1740, 6920, 10120, 14300, 18220, 24500, 41300]  10000  20000  [10120, 14300, 18220]
 1  [1620, 5840, 12100, 15000, 25260, 26020]          5900  15200  [12100, 15000]
Quang Hoang
  • 146,074
  • 10
  • 56
  • 74
3

You could keep everything in pandas by using explode + between and then groupby:

# explode
exploded = df2.explode('long_list')

# filter with  between
mask = exploded['long_list'].between(exploded['min'], exploded['max'])
filtered = exploded[mask]

# group filtered result
df3 = df2.assign(part= filtered.groupby(level=0)['long_list'].agg(list))
print(df3)

Output

                                         long_list  ...                   part
0  [1740, 6920, 10120, 14300, 18220, 24500, 41300]  ...  [10120, 14300, 18220]
1         [1620, 5840, 12100, 15000, 25260, 26020]  ...         [12100, 15000]

[2 rows x 4 columns]
Dani Mesejo
  • 61,499
  • 6
  • 49
  • 76
1
import pandas as pd

You can create this new column using apply() and a conditional list comprehension:

df2['part']=df2.apply(lambda row:
                      [each for each in row['long_list'] 
                       if each>row['min'] and each<row['max']],axis=1)

If you really want the result to be a different dataframe, then:

df2=df
df2['part']=df2.apply(lambda row:
                      [each for each in row['long_list'] 
                       if each>row['min'] and each<row['max']],axis=1)
zabop
  • 6,750
  • 3
  • 39
  • 84
0

here is a function to do what you want

df2['part'] = []
for index, row in df2.iterrow():
    part = []
    for num in row['long_list']:
        if num > row['min'] & num < row['max']:
            part = part.append(num)
    df2.loc[index,'part'] = part
Paul Brennan
  • 2,638
  • 4
  • 19
  • 26
0

Working with lists/sequence types within Pandas is not efficient; a list comprehension would do fine, similar to @zabop's answer:

zip_min_max = zip(df["min"], df["max"])
zipped = zip(df.long_list, zip_min_max)

df["part"] = [[val for val in left 
              if right_a < val < right_b]
              for left, (right_a, right_b) 
              in zipped]
sammywemmy
  • 27,093
  • 4
  • 17
  • 31