Generating Pivot Tables in Python - Pandas? Numpy? Xlrd? from csv

Question

I have been searching for hours, literally the entire day on how to generate a pivot table in Python. I am very new to python so please bear with me.

What I want is to take a csv file, extract the first column and generate a pivot table using the count or frequency of the numbers in that column, and sort descending

import pandas

import numpy 


from numpy import recfromtxt
a = recfromtxt('1.csv', skiprows=1, usecols=0, delimiter=',')


print a

^ what i get here is a list of the first column [2 2 2 6 7]

What i need is an export of 2 columns

2--3

6--1

7--1

see these: http://pandas.pydata.org/pandas-docs/dev/10min.html, http://pandas.pydata.org/pandas-docs/dev/io.html, http://pandas.pydata.org/pandas-docs/dev/reshaping.html — Jeff, Oct 16 '13 at 22:24
The Python `collections.Counter` class makes this very easy if you have access to it (2.7 or later) and don't specifically need your count array to be a numpy array. You can generate one from `collections.Counter(np.nditer(a))`. If you need `numpy` output and your data is nonnegative integers, it looks like `bincount` would be a start: http://docs.scipy.org/doc/numpy/reference/generated/numpy.bincount.html#numpy.bincount — Peter DeGlopper, Oct 16 '13 at 22:30
@PeterDeGlopper In numpy, if your items are not nonnegative integers, you would do something like `unq, _ = np.unique(a, reverse_index=True); cnts = np.bincount(_)`, and now `unq` and `cnts` are your two columns. — Jaime, Oct 16 '13 at 22:47
@Jaime - `return_inverse` rather than `reverse_index`, right? And you might need `return_index` as well to make it easier to match the counts up against the array entries. Still, that's clever. — Peter DeGlopper, Oct 16 '13 at 23:03
@PeterDeGlopper Yes, exactly, `return_inverse`... I have written somewhere around here that that will eventually become a standard feature of `np.bincount`, because I find myself writing those do lines of code much too often. — Jaime, Oct 16 '13 at 23:08
This is duplicate, see answer using pandas in: http://stackoverflow.com/questions/12860421/python-pandas-pivot-table-with-aggfunc-count-unique-distinct — denfromufa, Oct 17 '13 at 03:43

cedbeu · Answer 1 · 2013-10-16T22:49:03.723

1

Have you had a look here?

https://pypi.python.org/pypi/pivottable

Otherwise, from you example, you might just use list comprehensions:

>>> l = [2,2,2,6,7]
>>> [(i, l.count(i)) for i in set(l)]

[
    (2,3),
    (6,1),
    (7,1)
]

Or even dictionary comprehensions, depending on what you need:

>>> l = [2,2,2,6,7]
>>> {i:l.count(i) for i in set(l)}

{
    2: 3,
    6: 1,
    7: 1
}

edit (suggestions from @Peter DeGlopper)

Another more efficient way using collections.Counter (read comments below):

>>> from collections import Counter
>>> l = [2,2,2,6,7]
>>> Counter(l)

Counter({2: 3, 6: 1, 7: 1})

edited Oct 16 '13 at 22:49

answered Oct 16 '13 at 22:26

cedbeu

1,919
14
24

That's going to have quadratic performance with the list length, since `count` has to traverse the whole list each time - you can do much better using `collections.Counter` or a `defaultdict` that accumulates the running total in a single pass of the list. – Peter DeGlopper Oct 16 '13 at 22:30
It's true, though for the sake of simplicity (for a python beginner), if the list is not really huge, I think most of modern machines can handle that without any troubles. If the perf is a critical issue, then indeed, you're perfectly right, there are many way of optimizing this. – cedbeu Oct 16 '13 at 22:36

Generating Pivot Tables in Python - Pandas? Numpy? Xlrd? from csv

1 Answers1