1

I am a Python beginner and am getting confused by these different forms of storing data. When should one use which? Also, which of these is suitable for storing a matrix (and a vector)?

Manoya
  • In case the current answers haven't hit the spot, this [question](http://stackoverflow.com/questions/176011/python-list-vs-array-when-to-use) on the difference between lists and arrays should suffice. As for pandas, it is geared more toward tabular (think Excel-like) data manipulation and analysis. Additional reading [here](http://pandas.pydata.org/index.html#what-problem-does-pandas-solve), [here](http://pandas.pydata.org/index.html#library-highlights) and [here](http://www.dyinglovegrape.com/data_analysis/part2/2da2.php). – Reti43 Mar 09 '16 at 05:54

3 Answers

0
  • list - the original Python way of storing multiple values
  • array - a little-used Python standard library module (let's ignore it)
  • numpy array - the closest thing in Python to the arrays, matrices and vectors used in mathematics and in languages like MATLAB
  • DataFrame, Series - pandas structures, generally built on numpy, better suited to the kind of data found in tables and databases

To be more specific, you need to give us an idea of what kinds of problems you need to solve. What kind of data are you using, and what do you need to do with it?

Lists can change in size, and can contain a wide mix of elements.

A numpy array (numpy.ndarray, usually created with numpy.array) is fixed in size and contains elements of a single uniform type. It can be multidimensional, and numpy provides many mathematical functions that operate on it.
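As a rough illustration of the contrast above, here is a minimal sketch that also shows one way to store a vector and a matrix. The variable names (`items`, `vector`, `matrix`) are made up for the example.

```python
import numpy as np

# A list can grow and can mix element types.
items = [1, "two", 3.0]
items.append([4, 5])               # now four elements, the last one a list

# A NumPy array has a fixed shape and a single element type (dtype).
vector = np.array([1.0, 2.0, 3.0])             # shape (3,)
matrix = np.array([[1.0, 0.0, 2.0],
                   [0.0, 1.0, 0.0]])           # shape (2, 3)

# Matrix-vector product of the (2, 3) matrix with the length-3 vector.
product = matrix @ vector

print(len(items), vector.shape, matrix.shape, matrix.dtype)
print(product)
```

For plain numerical matrices and vectors, a 2-d and a 1-d numpy array like these are probably the most natural fit.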

hpaulj
0

Here's a general overview (partial credit to online documentation and Mark Lutz and Wes McKinney O'Reilly books):

  • list: General-purpose collection object built into Python. Lists are positionally ordered collections of arbitrarily typed objects, and have no fixed size. They are also mutable (strings, for example, are not).

  • numpy.ndarray: Stores a collection of items of the same type. Every item takes up the same size block of memory (not necessarily the case in a list). How each item in the array is to be interpreted is specified by a separate data-type object (dtype, not to be confused with type). Also, unlike lists, ndarrays cannot have items appended in place: numpy.append returns a new array containing the appended items rather than modifying the original. A one-dimensional ndarray serves as a vector, a two-dimensional ndarray as a matrix, and so on; arrays of any number of dimensions can be created from nested sequences of equal length.

  • pandas.Series: A one-dimensional array-like object containing an array of data (of any dtype) and an associated array of data labels, called its index. It's basically a glorified numpy.ndarray, with a label for each item (stored inside the Series as an Index object) and some handy extra functionality. A Series can also hold objects of different types, in which case its dtype is object (more like a list).

  • pandas.DataFrame: A collection of multiple Series, forming a table-like object, with a lot of very handy functionality for data analysis.
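A small sketch of the structures in the list above may help. The data and names here (`prices`, `stock`, `table`) are invented for illustration only.

```python
import numpy as np
import pandas as pd

# np.append does not modify its input; it returns a new array.
a = np.array([1, 2, 3])
b = np.append(a, 4)

# A 2-d array built from equal-length nested lists.
m = np.array([[1, 2], [3, 4]])

# A Series pairs values with an index of labels.
prices = pd.Series([1.5, 2.0, 0.75], index=["apple", "pear", "plum"])
stock = pd.Series([10, 0, 25], index=["apple", "pear", "plum"])

# Mixed Python objects are allowed, but the dtype falls back to object.
mixed = pd.Series([1, "a", 2.5])

# A DataFrame holds several Series as columns that share one index.
table = pd.DataFrame({"price": prices, "stock": stock})
pear_price = table.loc["pear", "price"]

print(a.shape, b.shape, m.ndim, mixed.dtype)
print(table)
```

Lookups such as `table.loc["pear", "price"]` go by label rather than by position, which is the main thing the index adds on top of a bare array.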

Gustavo Bezerra
0

Lists: lists are very flexible and can hold completely heterogeneous, arbitrary data, and they can be appended to very efficiently.

Array: The array.array type, on the other hand, is just a thin wrapper on C arrays. It can hold only homogeneous data, all of the same type, and so it uses only sizeof(one object) * length bytes of memory.
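To make the memory point concrete, here is a minimal sketch using only the standard library. The type code "d" (C double) and the variable names are choices made for this example.

```python
from array import array

# "d" means every element is stored as a C double.
numbers = array("d", [1.0, 2.0, 3.0])
numbers.append(4.0)                 # same-typed values can be appended

# Raw storage is item size times length, as described above.
payload_bytes = numbers.itemsize * len(numbers)

# A value of another type is rejected.
try:
    numbers.append("five")
    rejected = False
except TypeError:
    rejected = True

print(numbers.itemsize, len(numbers), payload_bytes, rejected)
```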

NumPy arrays: If you want to do math on a homogeneous array of numeric data, you're much better off using NumPy, which can automatically vectorize operations on complex multi-dimensional arrays.
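A short sketch of what vectorized operations look like in practice, next to the equivalent list code. The temperature values and names are illustrative only.

```python
import numpy as np

celsius_list = [0.0, 25.0, 100.0]

# With a list, the arithmetic is written element by element.
fahrenheit_list = [c * 9 / 5 + 32 for c in celsius_list]

# With a NumPy array, the same expression applies to every element at once.
celsius = np.array(celsius_list)
fahrenheit = celsius * 9 / 5 + 32

# Reductions can work along an axis of a multi-dimensional array.
grid = np.array([[1.0, 2.0],
                 [3.0, 4.0]])
column_means = grid.mean(axis=0)

print(fahrenheit_list)
print(fahrenheit)
print(column_means)
```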

Pandas: Pandas provides high level data manipulation tools built on top of NumPy. NumPy by itself is a fairly low-level tool.

Pandas provides a bunch of C- or Cython-optimized routines that can be faster than NumPy "equivalents" (e.g. reading text). For something like a dot product, however, pandas DataFrames are generally going to be slower than a NumPy array.
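For numeric work like the dot product mentioned above, one common pattern is to pull the underlying array out of the DataFrame first. The sketch below only shows that both routes give the same numbers; it does not measure speed, and the column names and weights are made up.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1.0, 2.0], "y": [3.0, 4.0]})
weights = np.array([0.5, 2.0])

# Dot product through pandas: the result is a Series that keeps the row index.
via_pandas = df.dot(weights)

# Dot product on the underlying NumPy array: the result is a plain ndarray.
values = df.to_numpy()
via_numpy = values @ weights

print(via_pandas.tolist())
print(via_numpy.tolist())
```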

FYI: Taken from different web sources

Henin RK