1

I have been searching for hours, literally the entire day on how to generate a pivot table in Python. I am very new to python so please bear with me.

What I want is to take a csv file, extract the first column and generate a pivot table using the count or frequency of the numbers in that column, and sort descending

import pandas

import numpy 


from numpy import recfromtxt
a = recfromtxt('1.csv', skiprows=1, usecols=0, delimiter=',')


print a

^ what i get here is a list of the first column [2 2 2 6 7]

What i need is an export of 2 columns

2--3

6--1

7--1

neil4real
  • 51
  • 7
  • see these: http://pandas.pydata.org/pandas-docs/dev/10min.html, http://pandas.pydata.org/pandas-docs/dev/io.html, http://pandas.pydata.org/pandas-docs/dev/reshaping.html – Jeff Oct 16 '13 at 22:24
  • The Python `collections.Counter` class makes this very easy if you have access to it (2.7 or later) and don't specifically need your count array to be a numpy array. You can generate one from `collections.Counter(np.nditer(a))`. If you need `numpy` output and your data is nonnegative integers, it looks like `bincount` would be a start: http://docs.scipy.org/doc/numpy/reference/generated/numpy.bincount.html#numpy.bincount – Peter DeGlopper Oct 16 '13 at 22:30
  • 1
    @PeterDeGlopper In numpy, if your items are not nonnegative integers, you would do something like `unq, _ = np.unique(a, reverse_index=True); cnts = np.bincount(_)`, and now `unq` and `cnts` are your two columns. – Jaime Oct 16 '13 at 22:47
  • @Jaime - `return_inverse` rather than `reverse_index`, right? And you might need `return_index` as well to make it easier to match the counts up against the array entries. Still, that's clever. – Peter DeGlopper Oct 16 '13 at 23:03
  • @PeterDeGlopper Yes, exactly, `return_inverse`... I have written somewhere around here that that will eventually become a standard feature of `np.bincount`, because I find myself writing those do lines of code much too often. – Jaime Oct 16 '13 at 23:08
  • This is duplicate, see answer using pandas in: http://stackoverflow.com/questions/12860421/python-pandas-pivot-table-with-aggfunc-count-unique-distinct – denfromufa Oct 17 '13 at 03:43

1 Answers1

1

Have you had a look here?

https://pypi.python.org/pypi/pivottable

Otherwise, from you example, you might just use list comprehensions:

>>> l = [2,2,2,6,7]
>>> [(i, l.count(i)) for i in set(l)]

[
    (2,3),
    (6,1),
    (7,1)
]

Or even dictionary comprehensions, depending on what you need:

>>> l = [2,2,2,6,7]
>>> {i:l.count(i) for i in set(l)}

{
    2: 3,
    6: 1,
    7: 1
}

edit (suggestions from @Peter DeGlopper)

Another more efficient way using collections.Counter (read comments below):

>>> from collections import Counter
>>> l = [2,2,2,6,7]
>>> Counter(l)

Counter({2: 3, 6: 1, 7: 1})
cedbeu
  • 1,919
  • 14
  • 24
  • That's going to have quadratic performance with the list length, since `count` has to traverse the whole list each time - you can do much better using `collections.Counter` or a `defaultdict` that accumulates the running total in a single pass of the list. – Peter DeGlopper Oct 16 '13 at 22:30
  • It's true, though for the sake of simplicity (for a python beginner), if the list is not really huge, I think most of modern machines can handle that without any troubles. If the perf is a critical issue, then indeed, you're perfectly right, there are many way of optimizing this. – cedbeu Oct 16 '13 at 22:36