
I'm quite new to pandas. I have a CSV file that I load into a DataFrame, and I am trying to group by one attribute and count the rows in each group, as in the code below. Afterwards I want to show the frequency of the grouped items in a bar plot (frequency on the y axis, item on the x axis):

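import pandas as pd

# Note: with names= given, header=1 discards the first two lines of the file
# (the original header and the first data row); header=0 discards only the header.
red_whine=pd.read_csv('winequality-red.csv',header=1,sep=';',names=['fixed_acidity','volatile_acidity',...])
frequency=red_whine.groupby('quality')['quality'].count()
pdf=pd.DataFrame(frequency)
print(pdf[pdf.columns[0]])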

However, this code prints the result below as if it were a single column:

quality
3     10
4     53
5    680
6    638
7    199
8     18

How can I keep the two columns separate?

Seba92
  • Could you post the desired output? It's not clear what you want to achieve. – MaxU - stand with Ukraine May 03 '16 at 21:48
  • I would like to get only the first column with pdf.columns[0] and the second one with pdf.columns[1], whereas here I get both just by writing pdf.columns[0]. – Seba92 May 03 '16 at 21:52
  • Always try to provide a [Minimal, Complete, and Verifiable example](http://stackoverflow.com/help/mcve) when asking questions. In case of _pandas_ questions please provide sample _input_ and _output_ data sets (5-7 rows in CSV/dict/JSON/Python code format _as text_, so one could use it when coding an answer for you). This will help to avoid _situations_ like: `your code isn't working for me` or `it doesn't work with my data`, etc. – MaxU - stand with Ukraine May 03 '16 at 21:56
  • The input can be taken from "http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv", the output is the one that I posted. – Seba92 May 04 '16 at 08:05

1 Answer

import pandas as pd

# pandas can read directly from a URL, so urllib2 (Python 2 only) is not needed
target_url = "http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv"
wine = pd.read_csv(target_url, sep=';')

# Count the rows for each quality score (sorted by count, descending)
vc = wine.quality.value_counts()

>>> vc
5    681
6    638
7    199
4     53
8     18
3     10
Name: quality, dtype: int64

# The labels live in the index...
>>> vc.index
Int64Index([5, 6, 7, 4, 8, 3], dtype='int64')

# ...and the counts live in the values
>>> vc.values
array([681, 638, 199,  53,  18,  10])

For plotting, please refer to this: Plotting categorical data with pandas and matplotlib
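
As a rough sketch of the bar plot asked about in the question, something along these lines should work. The tiny `wine` frame here is made-up stand-in data rather than the real file, and the `Agg` backend line is only an assumption so the snippet runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for the wine data (not the real file)
wine = pd.DataFrame({"quality": [5, 5, 6, 6, 6, 7, 4, 5]})

# Count rows per quality score, then order the bars by score rather than by count
counts = wine["quality"].value_counts().sort_index()

# Series.plot uses the index for the x axis and the values for the bar heights
ax = counts.plot(kind="bar")
ax.set_xlabel("quality")
ax.set_ylabel("frequency")
```

In an interactive session, `plt.show()` displays the figure; `ax.get_figure().savefig(...)` writes it to a file instead.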

Alexander
  • Yes, it works this way, but I wanted to understand why it didn't work the other way, because I will also need other aggregate operations such as sum and avg. – Seba92 May 04 '16 at 08:03
  • You've created a Series, which has both an index and values. They are separate, but Series and DataFrames are always displayed with their index. – Alexander May 05 '16 at 05:58
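
To illustrate the index/values point with the original groupby approach, here is a minimal sketch. The data is made up, and the `alcohol` column and the `count` column name are assumptions chosen for illustration. It shows one way to turn the group labels into an ordinary column, and that other aggregates follow the same pattern:

```python
import pandas as pd

# Made-up stand-in for the wine data (not the real file)
wine = pd.DataFrame({
    "quality": [5, 5, 6, 6, 7],
    "alcohol": [9.4, 9.8, 10.0, 11.0, 12.0],
})

# groupby(...).count() returns a Series: group labels in the index, counts in the values
frequency = wine.groupby("quality")["quality"].count()
labels = frequency.index.tolist()   # the group labels
counts = frequency.tolist()         # the per-group counts

# Rename the Series, then move the index into a regular column
# to get a DataFrame with two real columns
pdf = frequency.rename("count").reset_index()

# Other aggregates (sum, mean, ...) work the same way
stats = wine.groupby("quality")["alcohol"].agg(["sum", "mean"]).reset_index()
```

After this, `pdf[pdf.columns[0]]` holds the quality scores and `pdf[pdf.columns[1]]` holds the counts, although both are still printed with the default row index alongside them.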