
Why is select_dtypes so slow?

%timeit [col for col in df.columns if np.issubdtype(df[col].dtype, np.number)]

453 microsecs per loop

%timeit df.select_dtypes(include=[np.number])

4.58 secs per loop

simon
  • 1
    I'd post an issue on [GitHub](https://github.com/pandas-dev/pandas/issues), as this seems really inefficient: I get 1.59 ms vs 45.7 µs when comparing `select_dtypes` with the list comprehension. – EdChum Nov 04 '16 at 16:49
  • 1
    AFAIK, the idea behind `select_dtypes()` is to select a subset of the DataFrame (not just a list of column names), so it returns data (all rows), which of course takes time... – MaxU - stand with Ukraine Nov 04 '16 at 17:25
  • 2
    Why does that have to take time? It does not have to actually move any data, just return pointers to the columns. – simon Nov 04 '16 at 20:01
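
To make the difference discussed in the comments concrete, here is a minimal sketch. It assumes a small made-up DataFrame (the column names `a`, `b`, `c` are hypothetical), and it only shows what each expression returns; it does not explain or reproduce the timings above, which will vary with the pandas version and the shape of the frame. The last variant, which walks `df.dtypes` instead of indexing `df[col]`, is just one possible way to get the labels without touching the column data.

```python
import numpy as np
import pandas as pd

# Hypothetical example frame: two numeric columns and one object column.
df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [0.5, 1.5, 2.5],
    "c": pd.Series(["x", "y", "z"], dtype=object),
})

# List comprehension from the question: produces column *labels* only.
numeric_cols = [col for col in df.columns
                if np.issubdtype(df[col].dtype, np.number)]

# select_dtypes: produces a new DataFrame containing the numeric columns.
numeric_df = df.select_dtypes(include=[np.number])

# Possible alternative: read the dtypes directly, without building a
# Series per column via df[col].
numeric_cols_from_dtypes = [col for col, dtype in df.dtypes.items()
                            if np.issubdtype(dtype, np.number)]

print(numeric_cols)
print(type(numeric_df).__name__, list(numeric_df.columns))
print(numeric_cols_from_dtypes)
```

If only the labels are needed, either list can be used later as `df[numeric_cols]` at the point where the data is actually required.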

0 Answers