Find the name of the column in a Pandas DF which contains the longest list

Question

Given a Pandas DataFrame with lists stored in several of the columns, is there a simple way to find the column name which contains the longest list for each row?

For example, with this data:

                          positive                 negative          neutral
1   [marvel, moral, bold, destiny]                       []   [view, should]
2                      [beautiful]      [complicated, need]               []
3                      [celebrate]   [crippling, addiction]            [big]

I want to identify "positive" as the column with the longest list for row 1 and "negative" for rows 2 and 3.

I thought I could use str.len() to calculate the list lengths and idmax() to get the column names, but can't figure out how to combine them.

score 15 · Accepted Answer · answered Aug 22 '17 at 21:34

15

IIUC:

In [227]: df.applymap(len).idxmax(axis=1)
Out[227]:
0    positive
1    negative
2    negative
dtype: object

answered Aug 22 '17 at 21:34

MaxU - stand with Ukraine

205,989
36
386
419

Many thanks for this! Quick follow-up: `idxmax()` returns the first index (ie 'positive') when 'positive', 'negative' and 'neutral' are all empty lists. Could this solution be modified to ignore such rows? I've tried replacing the empty lists with NaNs, but `len` then complains: `"object of type 'float' has no len()"`. – user2950747 Aug 23 '17 at 13:30
@user2950747, could you provide a reproducible data set? – MaxU - stand with Ukraine Aug 23 '17 at 16:46
[this sample data](http://www.sharecsv.com/s/9dcf95cd39fa09eec291085eafecbd0c/polarity.csv) has empty lists in the 3rd row from the end, and would be labelled 'positive' by `idxmax()`. – user2950747 Aug 23 '17 at 18:04
@user2950747, do you simply want to drop such rows? – MaxU - stand with Ukraine Aug 23 '17 at 18:06
ah... Yes! I wasn't thinking of it like that, but that would be the simplest approach. Thanks! – user2950747 Aug 23 '17 at 18:08

Alexander · Answer 2 · 2017-08-22T21:39:38.030

5

>>> df.apply(lambda row: row.apply(len).argmax(), axis=1)
0    positive
1    negative
2    negative
dtype: object

edited Aug 22 '17 at 21:39

answered Aug 22 '17 at 21:25

Alexander

105,104
32
201
196

BENY · Answer 3 · 2017-08-22T22:05:39.673

2

Or you can try this ...

df=df.reset_index()
DF=pd.melt(df,id_vars=['index'])
DF['Length']=DF['value'].apply(lambda x : len(x))
DF.sort_values(['index','Length']).drop_duplicates(subset=['index'],keep='last')

edited Aug 22 '17 at 22:05

answered Aug 22 '17 at 21:41

BENY

317,841
20
164
234

Find the name of the column in a Pandas DF which contains the longest list

3 Answers3

Linked