
I’m trying to figure out if it’s possible to achieve the following without using apply and without using a for loop.

df1 = [df[c].map(lambda v: len(v) > 5) for c in df.columns]

I’m specifically trying to avoid apply and applymap and am looking for a vectorised solution. All values in the DataFrame are strings. I’m using the above as a mask later on.

The fastest I've found is:

df1 = [df[x].map(lambda x: len(x) > 5) for x in df.columns]
df2 = df[pd.concat(df1, axis=1, keys=[s.name for s in df1]).any(axis=1)]

It's faster than:

df[(df.applymap(len) > 5).any(axis=1)]
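
For illustration only (this toy data is not from the original post), here is a self-contained sketch of both masks on a small hypothetical frame of strings; the column names and values are made up:

import pandas as pd

# Hypothetical frame; every value is a string, as in the question.
df = pd.DataFrame({
    "a": ["short", "a longer string", "tiny"],
    "b": ["also short", "x", "no"],
})

# Per-column masks, then combine: keep rows where ANY cell has more than 5 characters.
df1 = [df[c].map(lambda v: len(v) > 5) for c in df.columns]
mask = pd.concat(df1, axis=1, keys=[s.name for s in df1]).any(axis=1)

# The applymap variant from above builds the same row mask.
mask_applymap = (df.applymap(len) > 5).any(axis=1)

print(mask.equals(mask_applymap))  # True
print(df[mask])                    # rows 0 and 1; row 2 has only short strings

Both variants produce one boolean per row, so either can be used directly as the row filter the question describes.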
zerohedge
  • What are you exactly trying to achieve? And can you add some example data for us so we can reproduce an answer? – Erfan Jun 01 '19 at 16:09
  • `df[(df.applymap(len) > 5).any(axis=1)]` is actually not a bad solution. Strings are inherently not vectorizable, so these solutions are all comparable. Another one is `df.apply(lambda x: x.str.len() > 5)`, which applies the comparison column-wise. – cs95 Jun 01 '19 at 16:14
  • @cs95 I’m getting significant speed improvements without applymap and apply, that’s why I asked. – zerohedge Jun 01 '19 at 16:19
  • @cs95 - I've added some examples that I've tested. – zerohedge Jun 01 '19 at 20:41

1 Answer


How about np.vectorize? At least it should be slightly faster than apply. As for the comparison with a for loop, it all depends on your data size and shape. Link, Link

np.vectorize(len)(df.values) > 5
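
Not part of the original answer, but to make the result's shape explicit: `np.vectorize(len)` over `df.values` yields a plain 2-D NumPy boolean array with one entry per cell, not a DataFrame. A minimal sketch, assuming the usual `np`/`pd` imports and a made-up frame:

import numpy as np
import pandas as pd

# Hypothetical frame of strings, for illustration only.
df = pd.DataFrame({"a": ["short", "a longer string"], "b": ["also short", "x"]})

mask2d = np.vectorize(len)(df.values) > 5
print(mask2d)        # element-wise booleans
print(mask2d.shape)  # (2, 2): one entry per cell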
BENY
  • @zerohedge that is as I mentioned, it depends on your real df size :-) – BENY Jun 01 '19 at 20:46
  • `df[np.vectorize(len)(df.values)>5]` is returning the first row for all rows in `df`, how can I use it for my purposes? – zerohedge Jun 02 '19 at 15:55
  • this seems to expand each row to multiple rows (per column); I need an `any` check that only returns the row (and only one per row) for every row where the condition is met. – zerohedge Jun 02 '19 at 16:13
  • @zerohedge you can add np.any():-) – BENY Jun 02 '19 at 16:17
  • where do I put that though? The problem is that it's expanding each row to multiple rows, per column. For now I'm using `df1 = a[np.vectorize(len)(a.values) > 5]; df2 = df1.groupby(df1.index).first()` – zerohedge Jun 02 '19 at 16:20
  • 1
    @zerohedge in your case `np.all(np.vectorize(len)(df.values)>5,1)` – BENY Jun 02 '19 at 16:24
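
To close out the comment thread (this sketch is mine, not from the thread): because the comparison produces a 2-D array, the row filter comes from reducing it along axis 1. `.any(axis=1)` matches the question's original `.any(1)` mask (keep a row if any cell is longer than 5 characters), while `np.all(..., 1)` from the last comment keeps a row only when every cell passes.

import numpy as np
import pandas as pd

# Hypothetical frame of strings, for illustration only.
df = pd.DataFrame({"a": ["short", "a longer string"], "b": ["also short", "x"]})

mask2d = np.vectorize(len)(df.values) > 5   # 2-D element-wise mask

# Keep rows where ANY cell is longer than 5 characters (the question's behaviour).
df_any = df[mask2d.any(axis=1)]

# Keep rows where ALL cells are longer than 5 characters (the last comment's suggestion).
df_all = df[np.all(mask2d, axis=1)]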