
For years I've used Pandas on a daily basis, and I also use NumPy often (though not nearly as frequently). Most of the time I'll do something like:

```python
import pandas as pd
import numpy as np
```

But there is also the option of using NumPy directly through Pandas:

```python
df['value'] = pd.np.where(df['date'] > '2020-01-01', 1, 0)
```

Does anyone know if either one of these options is significantly more performant than the other?

elPastor
  • This might be relevant [Does python optimize modules when they are imported multiple times?](https://stackoverflow.com/q/296036/15497888) – Henry Ecker Jul 28 '21 at 21:20
  • `pd.np` is the numpy module. There aren't any performance differences; **it's the same exact module** (well, any time you have an extra attribute lookup there is some cost, but I suspect that isn't what you mean). But you really shouldn't use it, just for code clarity's sake. – juanpa.arrivillaga Jul 28 '21 at 21:21
  • @HenryEcker - yes, super relevant. Thanks. – elPastor Jul 28 '21 at 21:28
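
To put the "extra attribute lookup" point in concrete terms, here is a stdlib-only sketch. It is an illustration under assumptions, not a benchmark of pandas itself: `math` stands in for NumPy, and `fake_pd` is a hypothetical `types.SimpleNamespace` standing in for pandas exposing an `np` attribute. Both spellings reach the same function object; the alias path simply does one more attribute lookup per call.

```python
import math
import timeit
import types

# Hypothetical stand-in for a package that re-exports another module as an
# attribute, the way `pd.np` pointed at numpy. `math` plays numpy's role.
fake_pd = types.SimpleNamespace(np=math)

# Both spellings resolve to the very same function object.
print(fake_pd.np.sqrt is math.sqrt)  # True

N = 200_000
direct = timeit.timeit("math.sqrt(2.0)", globals={"math": math}, number=N)
via_alias = timeit.timeit("fake_pd.np.sqrt(2.0)", globals={"fake_pd": fake_pd}, number=N)

# The alias path pays for one extra attribute lookup on every call; the
# absolute numbers depend entirely on the machine and Python version.
print(f"direct:    {direct / N * 1e9:.1f} ns per call")
print(f"via alias: {via_alias / N * 1e9:.1f} ns per call")
```

With a plain attribute like this, whatever gap shows up is a fixed per-call overhead, small next to the work a vectorized call such as `np.where` does on a real column. So the practical reason to avoid `pd.np` is clarity (and the deprecation), not speed.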

2 Answers


Using `pd.np` is deprecated; accessing it emits this warning:

```
<ipython-input-631-4160e33c868a>:1: FutureWarning: The pandas.np module is deprecated and will be removed from pandas in a future version. Import numpy directly instead
```

You can confirm in the pandas source that it is the same module:

https://github.com/pandas-dev/pandas/blob/master/pandas/__init__.py#L205-L216
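
The linked lines are where pandas defined the `pd.np` alias. What follows is only a sketch of the general pattern, not pandas' actual code: a hypothetical module `fakepandas`, built in memory with `types.ModuleType`, uses a module-level `__getattr__` (PEP 562, Python 3.7+) to hand back the already-imported `math` module (standing in for numpy) while emitting a `FutureWarning`. Because the alias returns the existing module object rather than a copy, an `is` check holds.

```python
import math
import types
import warnings

# Hypothetical module standing in for pandas; `math` stands in for numpy.
fakepandas = types.ModuleType("fakepandas")


def _module_getattr(name):
    # Per PEP 562, this is only called when normal attribute lookup fails,
    # i.e. for names that are not in the module's __dict__.
    if name == "np":
        warnings.warn(
            "The fakepandas.np module is deprecated. Import math directly instead",
            FutureWarning,
            stacklevel=2,
        )
        return math  # the already-imported module object, not a copy
    raise AttributeError(f"module 'fakepandas' has no attribute {name!r}")


fakepandas.__getattr__ = _module_getattr

# Record warnings instead of printing them so the result can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    alias = fakepandas.np

print(alias is math)                # True: same module object
print(caught[0].category.__name__)  # FutureWarning
```

The warning is the only thing the alias adds; every function reached through it is the same object you would get from importing the module directly.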

Corralien

Both refer to the same library, so there should not be any performance difference; `pd.np` is most likely just an alias for the same module. However, it is possible that some Pandas-specific changes were introduced, which is why I would rather import NumPy directly. Furthermore, `np.array` is preferable to `pd.np.array` because it saves you three characters of typing.

Soerendip
  • OK. I guess I'd be concerned that it doubles the effort by pulling in the same library twice, but maybe that's not how it works. Also, based on @Corralien's answer, it sounds like it's kind of a moot point. Thanks for the quick response. – elPastor Jul 28 '21 at 21:15
  • @elPastor that *isn't* how it works at all. When you `import pandas`, the library itself does an `import numpy`. If you `import numpy` again somewhere else, Python simply looks for `"numpy"` in `sys.modules` and returns that module, i.e. imports are always cached in Python. – juanpa.arrivillaga Jul 28 '21 at 21:22
  • @juanpa.arrivillaga, excellent explanation. I'm a self-taught programmer and never learned much about the back-end nuts and bolts. Much appreciated. – elPastor Jul 28 '21 at 21:27
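
The caching described above can be checked directly. This stdlib-only sketch uses `json` as an arbitrary example module (an assumption for illustration; any module behaves the same way): a repeated `import` binds the object already stored in `sys.modules`, and only removing that entry forces the module to be executed again.

```python
import importlib
import sys

# The first import in a process executes the module and stores it in
# sys.modules (it may already be there if something imported it earlier).
import json

cached = sys.modules["json"]

# A second import statement is just a sys.modules lookup: same object.
import json as json_again

print(json_again is cached)                       # True
print(importlib.import_module("json") is cached)  # True

# Removing the cache entry forces a genuine re-import, which creates a
# brand-new module object (done here only to show the cache is what's reused).
del sys.modules["json"]
import json as json_fresh

print(json_fresh is cached)  # False

sys.modules["json"] = cached  # restore the original entry
```

So `import pandas` followed by `import numpy` loads NumPy once; the second statement only looks up the name that pandas' own import already cached.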