I would say that pandas lets you index and slice by string labels and create data frames directly from dictionaries, whereas numpy arrays are homogeneous n-dimensional arrays indexed by integer position (think of typed, fixed-shape nested lists). Beyond that, the two overlap heavily, since pandas is built on top of numpy. So pandas "feels" more natural to use for database-like data (e.g. CSV files, Excel sheets, and SQL tables), whereas numpy "feels" more natural for numeric processing of data (e.g. signals, images, etc.). Granted, you can do many of the same things in both libraries; you can even create pandas data frames from numpy arrays and vice versa.
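As a rough illustration of the points above, here is a minimal sketch that builds a data frame from a dictionary, selects by string label, and converts to a numpy array and back. The column names, row labels, and values are made up for the example.

```python
import numpy as np
import pandas as pd

# Build a DataFrame directly from a dictionary (column name -> values).
# The names and numbers here are hypothetical sample data.
df = pd.DataFrame(
    {"height": [1.6, 1.8, 1.7], "weight": [60.0, 80.0, 70.0]},
    index=["ann", "bob", "cat"],
)

# Select a single value by string labels: row "bob", column "weight".
bob_weight = df.loc["bob", "weight"]

# DataFrame -> NumPy array: the values are kept, the labels are dropped.
arr = df.to_numpy()

# NumPy array -> DataFrame: the column labels have to be supplied again.
df2 = pd.DataFrame(arr, columns=["height", "weight"])
```

Note that the round trip loses the row labels unless you pass them back in yourself, which is one reason I tend to stay in pandas until the labels stop being useful.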
One major difference (something to watch out for) is that label-based slicing in pandas (with .loc) is inclusive whereas numpy slicing is exclusive (i.e. 0:10 with .loc in pandas is "0 up to and including 10" whereas it is "0 up to, but not including 10" in numpy). Position-based slicing in pandas (with .iloc) is exclusive, just like numpy. The intuition is that since pandas permits slicing on strings, it doesn't make much sense to slice, say, "up to but not including a column of name x" (shout out to Corey Schafer for that insight (see about 30 mins in): Python Pandas Tutorial (Part 2)).
Other than that, pandas uses much the same slicing, indexing, and fancy indexing notation as numpy (numpy just lacks the string labels) and has the same kinds of "gotchas" with respect to different operations creating views vs copies of data. (An excellent numpy tutorial is a Numpy lecture from SciPy 2019 by Alex Chabot-Leclerc.)
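To make the slicing and view-vs-copy points concrete, here is a small sketch. The arrays and column names are made up; it only shows the inclusive/exclusive difference and the numpy view/copy behaviour, not every edge case in pandas.

```python
import numpy as np
import pandas as pd

# --- Slicing: label-based (inclusive) vs position-based (exclusive) ---
arr = np.arange(20)
s = pd.Series(np.arange(20))       # default integer labels 0..19

numpy_slice = arr[0:10]            # positions 0..9 (10 elements)
label_slice = s.loc[0:10]          # labels 0..10, end label included (11 elements)
position_slice = s.iloc[0:10]      # positions 0..9, same as NumPy (10 elements)

df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "x": [5, 6]})
cols = df.loc[:, "a":"x"]          # column "x" is included in the result

# --- Views vs copies in NumPy ---
base = np.arange(6)
view = base[1:4]                   # basic slice: a view onto base
fancy = base[[1, 2, 3]]            # fancy indexing: an independent copy
view[0] = 100                      # also changes base[1]
fancy[1] = -1                      # leaves base untouched

# --- In pandas, an explicit .copy() makes the intent unambiguous ---
subset = df.loc[df["a"] > 1].copy()
subset["a"] = 0                    # df itself is not modified
```

When in doubt about whether two numpy arrays share data, np.shares_memory(base, view) is a quick check.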
Ultimately, I would say pandas is a database analyst's best friend while numpy is a data scientist's friend. Personally, I use pandas to pull data from the real world, sort it, and preprocess it. Then I convert this data into numpy arrays where necessary to do more serious/intensive numeric computing. PLEASE NOTE: This is purely opinion. There is no right answer.
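For what it's worth, that workflow looks roughly like the sketch below. The CSV text, column names, and the normalization step are all hypothetical stand-ins; in practice the data would come from a real file or database.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical raw data; normally this would be a file path or a SQL query.
csv_text = """sensor,reading
b,2.0
a,4.0
c,
a,6.0
"""

# 1. Pull the data in with pandas.
df = pd.read_csv(io.StringIO(csv_text))

# 2. Preprocess: drop rows with a missing reading, then sort by sensor name.
clean = df.dropna(subset=["reading"]).sort_values("sensor")

# 3. Hand the numeric column to NumPy for the heavier computation.
values = clean["reading"].to_numpy()
normalized = (values - values.mean()) / values.std()
```

The labels do the work in steps 1 and 2; by step 3 only the numbers matter, which is where numpy takes over.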
That being said, I highly recommend getting to know and understand numpy first (the Alex Chabot-Leclerc video is a great place to start). Afterwards, pandas will make a lot more sense.