I am a Python beginner and am confused by these different forms of storing data. When should one use which? Also, which of these is suitable for storing a matrix (and a vector)?
-
In case the current answers haven't hit the spot, this [question](http://stackoverflow.com/questions/176011/python-list-vs-array-when-to-use) on the difference between lists and arrays should suffice. As for pandas, it is geared more toward tabular (think Excel-like) data manipulation and analysis. Additional reading [here](http://pandas.pydata.org/index.html#what-problem-does-pandas-solve), [here](http://pandas.pydata.org/index.html#library-highlights) and [here](http://www.dyinglovegrape.com/data_analysis/part2/2da2.php). – Reti43 Mar 09 '16 at 05:54
3 Answers
- list - the original Python way of storing multiple values
- array - a little-used Python module (let's ignore it)
- numpy array - the closest thing in Python to the arrays, matrices and vectors used in mathematics and languages like MATLAB
- DataFrame, Series - pandas structures, generally built on numpy, better suited for the kind of data found in tables and databases.
To be more specific, you need to give us an idea of what kinds of problems you need to solve. What kind of data are you using, and what do you need to do with it?
Lists can change in size and can contain a wide mix of elements.
`numpy.array` is fixed in size and contains elements of a single, uniform type. It is multidimensional and supports many mathematical functions.
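As a rough illustration of the difference, here is a minimal sketch (the variable names are made up for this example). It shows a list growing and holding mixed types, and NumPy arrays used as a vector and a matrix:

```python
import numpy as np

# A list can grow and can mix element types.
items = [1, "two", 3.0]
items.append([4, 5])          # now 4 elements; the last one is itself a list

# A NumPy array has one dtype and a fixed shape once created.
vector = np.array([1.0, 2.0, 3.0])          # shape (3,)
matrix = np.array([[1.0, 0.0, 2.0],
                   [0.0, 1.0, 3.0]])        # shape (2, 3)

# Arithmetic is elementwise, and @ is the matrix product.
scaled = vector * 2           # array([2., 4., 6.])
product = matrix @ vector     # array([ 7., 11.])
```

So for a matrix or a vector that you want to do math on, a NumPy array is usually the natural choice.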

Here's a general overview (partial credit to online documentation and Mark Lutz and Wes McKinney O'Reilly books):
- `list`: General-purpose sequence object built into Python. Lists are positionally ordered collections of arbitrarily typed objects and have no fixed size. They are also mutable (strings, for example, are not).
- `numpy.ndarray`: Stores a collection of items of the same type. Every item takes up the same-sized block of memory (not necessarily the case in a `list`). How each item in the array is to be interpreted is specified by a separate data-type object (`dtype`, not to be confused with `type`). Also, unlike lists, `ndarray`s can't have items appended in place (`numpy.append` returns a new array with the appended items, unlike `list.append`). A one-dimensional `ndarray` is a vector, a two-dimensional `ndarray` (rows of same-sized vectors) is a 2-d array (a.k.a. matrix), and so on. You can build arbitrary n-dimensional arrays by nesting same-sized sequences when creating the array.
- `pandas.Series`: A one-dimensional array-like object containing an array of data (of any `dtype`) and an associated array of data labels, called its index. It's basically a glorified `numpy.ndarray` with labels (stored inside the `Series` as an `Index` object) for each item and some handy extra functionality. Also, a `Series` with `object` dtype can hold Python objects of different types (more like a `list`).
- `pandas.DataFrame`: A collection of multiple `Series`, forming a table-like object, with a lot of very handy functionality for data analysis.
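A small sketch of the points above, assuming NumPy and pandas are installed (the data and variable names are invented for illustration):

```python
import numpy as np
import pandas as pd

# An ndarray has a single dtype; mixing ints and floats upcasts to float64.
a = np.array([1, 2, 3.5])
print(a.dtype)                      # float64

# np.append does not modify `a`; it returns a new, longer array.
b = np.append(a, 4.0)
print(a.shape, b.shape)             # (3,) (4,)

# Nested sequences of equal length give a 2-d array (a matrix).
m = np.array([[1, 2], [3, 4]])
print(m.ndim, m.shape)              # 2 (2, 2)

# A Series is a 1-d array with labels (its index).
s = pd.Series([10.0, 20.0, 30.0], index=["x", "y", "z"])
print(s["y"])                       # 20.0

# A DataFrame is a table whose columns are Series sharing one index.
df = pd.DataFrame({"height": [1.6, 1.8], "name": ["Ann", "Bob"]})
print(type(df["height"]).__name__)  # Series
```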

Lists: lists are very flexible and can hold completely heterogeneous, arbitrary data, and they can be appended to very efficiently.
Array: The array.array type, on the other hand, is just a thin wrapper on C arrays. It can hold only homogeneous data, all of the same type, and so it uses only sizeof(one object) * length bytes of memory.
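For reference, a minimal sketch of `array.array` from the standard library (the variable names are just for this example):

```python
from array import array

# 'd' is the type code for C doubles; every element must be a float.
values = array("d", [1.0, 2.0, 3.0])
values.append(4.0)                  # arrays from this module can still grow

# Raw storage is itemsize bytes per element (typically 8 for 'd').
n_bytes = values.itemsize * len(values)

# Mixing in another type is rejected.
rejected = False
try:
    values.append("five")
except TypeError:
    rejected = True
```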
Numpy arrays: However, if you want to do math on a homogeneous array of numeric data, then you're much better off using NumPy, which can automatically vectorize operations on complex multi-dimensional arrays.
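To make "vectorize" concrete, here is a hedged sketch comparing an explicit Python loop with the equivalent NumPy expression (the input values are arbitrary):

```python
import math
import numpy as np

angles = [0.0, 0.5, 1.0, 1.5]

# Plain Python: an explicit loop (here a list comprehension) over the elements.
loop_result = [math.sin(x) ** 2 + math.cos(x) ** 2 for x in angles]

# NumPy: the same expression applied to the whole array at once.
arr = np.array(angles)
vec_result = np.sin(arr) ** 2 + np.cos(arr) ** 2

# The same code works unchanged on a 2-d array.
grid = arr.reshape(2, 2)
grid_result = np.sin(grid) ** 2 + np.cos(grid) ** 2
```

Both versions compute the same values; the NumPy one pushes the loop into compiled code and does not care how many dimensions the array has.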
Pandas: Pandas provides high level data manipulation tools built on top of NumPy. NumPy by itself is a fairly low-level tool.
Pandas provides a bunch of C- or Cython-optimized routines that can be faster than numpy "equivalents" (e.g. reading text). For something like a dot product, pandas DataFrames are generally going to be slower than a numpy array.
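One common pattern, sketched here with a made-up table, is to use pandas for the label-based work and hand the numeric block to NumPy for the linear algebra:

```python
import numpy as np
import pandas as pd

# Hypothetical table: two numeric columns plus a label column.
df = pd.DataFrame({
    "group": ["a", "a", "b"],
    "x": [1.0, 2.0, 3.0],
    "y": [4.0, 5.0, 6.0],
})

# High-level, label-based manipulation is where pandas is convenient.
means = df.groupby("group")["x"].mean()     # a -> 1.5, b -> 3.0

# For plain linear algebra, pull out the underlying NumPy array first.
data = df[["x", "y"]].to_numpy()            # shape (3, 2)
weights = np.array([0.5, 0.5])
row_avg = data @ weights                    # array([2.5, 3.5, 4.5])
```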
FYI: Taken from different web sources
