1

Suppose I have the following dataframe:

import pandas as pd

df = pd.DataFrame({'AB': [['ab', 'ef', 'bd'], ['abc', 'efg', 'cd'], ['bd', 'aaaa']],
                   'CD': [['xy', 'gh'], ['trs', 'abc'], ['ab', 'bcd', 'efg']],  
                   'EF': [['uxyz', 'abc'], ['peter', 'adam'], ['ab', 'zz', 'bd']]})

df

               AB              CD             EF
0    [ab, ef, bd]        [xy, gh]    [uxyz, abc]
1  [abc, efg, cd]      [trs, abc]  [peter, adam]
2      [bd, aaaa]  [ab, bcd, efg]   [ab, zz, bd]

I want to extract the column that contains a sorted list. In this case it is CD, since ['ab', 'bcd', 'efg'] is sorted in ascending order. It is guaranteed that no list is empty and that every list contains at least two elements. I am stuck on how to combine applymap with a sort function in pandas. I tried to come up with a solution from here but couldn't figure out a way to combine applymap and sort.

I am working in Python 2.7 and pandas

Zero
Gambit1614

3 Answers

3

Use applymap with sorted

In [2078]: df.applymap(sorted).eq(df).any()
Out[2078]:
AB    False
CD     True
EF    False
dtype: bool

Get result into a list

In [2081]: cond = df.applymap(sorted).eq(df).any()

In [2082]: cond[cond].index
Out[2082]: Index([u'CD'], dtype='object')

In [2083]: cond[cond].index.tolist()
Out[2083]: ['CD']

If you need specific columns with data

In [2086]: df.loc[:, cond]
Out[2086]:
               CD
0        [xy, gh]
1      [trs, abc]
2  [ab, bcd, efg]

And to get the first matching column name

In [2092]: cond[cond].index[0]
Out[2092]: 'CD'
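On recent pandas releases, applymap is deprecated (since pandas 2.1) in favour of DataFrame.map. The following is a minimal sketch of the same idea that should run on either; the name elementwise is only an illustrative helper, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'AB': [['ab', 'ef', 'bd'], ['abc', 'efg', 'cd'], ['bd', 'aaaa']],
                   'CD': [['xy', 'gh'], ['trs', 'abc'], ['ab', 'bcd', 'efg']],
                   'EF': [['uxyz', 'abc'], ['peter', 'adam'], ['ab', 'zz', 'bd']]})

# DataFrame.map exists from pandas 2.1 onwards; older versions only have applymap.
# `elementwise` is a hypothetical name used here for whichever one is available.
elementwise = df.map if hasattr(df, "map") else df.applymap

# True for every cell whose list is already in ascending order,
# then reduced per column: does the column hold at least one sorted list?
cond = elementwise(lambda x: sorted(x) == x).any()

sorted_cols = cond[cond].index.tolist()
print(sorted_cols)  # ['CD']
```

Comparing inside the lambda avoids building a second DataFrame of sorted lists just to compare it with the original.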
Zero
3

Use applymap, then loc to filter the columns:

df1 = df.loc[:, df.applymap(lambda x: sorted(x) == x).any()]
print (df1)
               CD
0        [xy, gh]
1      [trs, abc]
2  [ab, bcd, efg]

And for column names:

a = df.applymap(lambda x: sorted(x) == x).any()
print (a)
AB    False
CD     True
EF    False
dtype: bool

L = a.index[a].tolist()
print (L)
['CD']

Timings

Conclusion: df.applymap(lambda x: sorted(x) == x) takes approximately the same time as df.applymap(sorted) == df, while the NumPy-based comparison is roughly 8 times slower on this data:

#3k rows
df = pd.concat([df]*1000).reset_index(drop=True)

In [134]: %timeit df.applymap(lambda x: sorted(x) == x)
100 loops, best of 3: 8.08 ms per loop

In [135]: %timeit df.applymap(sorted).eq(df)
100 loops, best of 3: 9.96 ms per loop

In [136]: %timeit df.applymap(sorted) == df
100 loops, best of 3: 9.84 ms per loop

In [137]: %timeit df.applymap(lambda x: (np.asarray(x[:-1]) <= np.asarray(x[1:])))
10 loops, best of 3: 62 ms per loop

#30k rows (built from the original 3-row df)
df = pd.concat([df]*10000).reset_index(drop=True)

In [126]: %timeit df.applymap(lambda x: sorted(x) == x)
10 loops, best of 3: 77.5 ms per loop

In [127]: %timeit df.applymap(sorted).eq(df)
10 loops, best of 3: 81.1 ms per loop

In [128]: %timeit df.applymap(sorted) == df
10 loops, best of 3: 75.7 ms per loop

In [129]: %timeit df.applymap(lambda x: (np.asarray(x[:-1]) <= np.asarray(x[1:])))
1 loop, best of 3: 617 ms per loop

#300k rows (built from the original 3-row df)
df = pd.concat([df]*100000).reset_index(drop=True)

In [131]: %timeit df.applymap(lambda x: sorted(x) == x)
1 loop, best of 3: 750 ms per loop

In [132]: %timeit df.applymap(sorted).eq(df)
1 loop, best of 3: 801 ms per loop

In [133]: %timeit df.applymap(sorted) == df
1 loop, best of 3: 744 ms per loop

In [134]: %timeit df.applymap(lambda x: (np.asarray(x[:-1]) <= np.asarray(x[1:])))
1 loop, best of 3: 6.25 s per loop
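Outside IPython, a comparison like the one above could be reproduced with the standard timeit module. This is only a sketch under assumptions: the names base, big, elementwise, candidates and timings are illustrative, the NumPy variant here includes the .all() call (the timed expression above does not), and absolute numbers will differ by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

base = pd.DataFrame({'AB': [['ab', 'ef', 'bd'], ['abc', 'efg', 'cd'], ['bd', 'aaaa']],
                     'CD': [['xy', 'gh'], ['trs', 'abc'], ['ab', 'bcd', 'efg']],
                     'EF': [['uxyz', 'abc'], ['peter', 'adam'], ['ab', 'zz', 'bd']]})

# 3k rows, always built from the original 3-row frame
big = pd.concat([base] * 1000).reset_index(drop=True)

# DataFrame.map on pandas >= 2.1, applymap on older versions
elementwise = big.map if hasattr(big, "map") else big.applymap

candidates = {
    "sorted(x) == x": lambda: elementwise(lambda x: sorted(x) == x),
    "numpy pairwise": lambda: elementwise(
        lambda x: (np.asarray(x[:-1]) <= np.asarray(x[1:])).all()),
}

timings = {}
for name, fn in candidates.items():
    # best of 3 runs, one call per run, reported in milliseconds
    timings[name] = min(timeit.repeat(fn, number=1, repeat=3)) * 1000
    print("%-16s %.2f ms" % (name, timings[name]))
```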
jezrael
1

Checking for sortedness without sorting.

import numpy as np

is_sorted = lambda x: (np.asarray(x[:-1]) <= np.asarray(x[1:])).all()
df.applymap(is_sorted).any()

AB    False
CD     True
EF    False
dtype: bool
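For short Python lists like these, the same pairwise check can also be written without NumPy. The sketch below is an alternative, not the approach from the answer above: it compares neighbouring elements with zip and stops at the first out-of-order pair. The function name is_sorted_py and the helper elementwise are illustrative.

```python
import pandas as pd


def is_sorted_py(seq):
    """Return True if seq is in non-decreasing order (no sorting, no NumPy)."""
    # all() short-circuits on the first pair that is out of order
    return all(a <= b for a, b in zip(seq, seq[1:]))


df = pd.DataFrame({'AB': [['ab', 'ef', 'bd'], ['abc', 'efg', 'cd'], ['bd', 'aaaa']],
                   'CD': [['xy', 'gh'], ['trs', 'abc'], ['ab', 'bcd', 'efg']],
                   'EF': [['uxyz', 'abc'], ['peter', 'adam'], ['ab', 'zz', 'bd']]})

# DataFrame.map on pandas >= 2.1, applymap on older versions
elementwise = df.map if hasattr(df, "map") else df.applymap

result = elementwise(is_sorted_py).any()
print(result[result].index.tolist())  # ['CD']
```

This avoids creating two arrays per cell, which may matter when the lists are short and the per-call overhead dominates.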
piRSquared