This code counts the words in a column.

df['businesstype'].value_counts()    #value count

My question: how can I now plot the 5 or 10 most frequent words in the businesstype column?

df.head(10)['businesstype'].value_counts().plot.bar()

That works, but it goes by the order in which my CSV data is sorted, not by the value count.

The question is probably easy, but I am learning and haven't found anything on SO that answers it.

The dataframe has these columns:

Index(['Rang 2014', 'Unnamed: 1', 'Rang 2013', 'unternehmen', 'Sitz',
       'Umsatz (Mrd. €)', 'Gewinn/Verlust (Mio. €)', 'Mitarbeiter weltweit',
       'businestype'],
      dtype='object')

I also tried the pd max rows option, but nothing changed: it just plotted the top and bottom halves when I set max rows.

marc_s

1 Answer

You could simply plot entries 1-5 of your value_counts series, but this would distort the output if there are ties with the following entries. A better strategy would be:

import pandas as pd
from matplotlib import pyplot as plt

#number of top entries
nmax = 5

#fake data generation
import numpy as np
np.random.seed(1234)
n = 30
df = pd.DataFrame({"A": np.random.choice(list("XYZUVWKLM"), n), "B": np.random.randint(1, 10, n)})

#create value count series from A
plot_df = df["A"].value_counts()

#plot the two strategies into different panels for better comparison
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 5))

#strategy 1: simply plot the first nmax rows
plot_df[:nmax].plot.bar(ax=ax1, rot=0)
ax1.set_title("First nmax entries")

#better approach with strategy 2:
#find value for top nmax entry in case there is a tie with the following entries
val_for_nmax = plot_df.iloc[nmax-1]  #positional lookup; plot_df[nmax-1] is deprecated on a label index
#plot columns that have no less than this value
plot_df[plot_df>=val_for_nmax].plot.bar(ax=ax2, rot=45)
ax2.set_title("Take care of tie values")

plt.show()

Sample output: [figure: two bar-chart panels, "First nmax entries" (left) and "Take care of tie values" (right)]
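As a possible shortcut for the tie handling, pandas also offers `Series.nlargest` with `keep="all"`, which keeps every entry tied with the last selected one. The following is only a minimal sketch on a small hand-made toy column (the values and the name `A` are assumptions for illustration, not the asker's data):

```python
import pandas as pd

# hand-made toy data (an assumption, not the real dataset):
# X occurs 4 times, Y and Z 3 times, U, V and W twice, K once
df = pd.DataFrame({"A": list("XXXXYYYZZZUUVVWWK")})

# number of top entries requested
nmax = 4

# count how often each value occurs in column A
counts = df["A"].value_counts()

# keep="all" also returns the entries tied with the nmax-th largest count,
# so the result can hold more than nmax rows
top = counts.nlargest(nmax, keep="all")

print(top)
```

Here the fourth-largest count is shared by U, V and W, so `top` holds six entries instead of four. Calling `top.plot.bar(rot=45)` should then give the same kind of chart as strategy 2.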

Mr. T
  • Thanks, that works and was a really good exercise for me to understand :) I wish you a happy Christmas. – paterson not_Gates Dec 26 '20 at 17:35
  • Glad this solved the issue. We are all here to learn. – Mr. T Dec 26 '20 at 17:36
  • How can I use strategy 1 to plot the unternehmen (company name) counted by business type value? Can I just change the x axis? – paterson not_Gates Dec 26 '20 at 18:18
  • You can't. The question is now different; you asked for a series to plot, so the rest of the dataframe information cannot be found in this plotted series (no information on `B` is contained in it). I suggest you ask a new question with your redefined problem and provide a [reproducible pandas dataframe](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) that people can copy-paste to test their strategy. It does not have to be the real dataset, just a toy dataset like in my answer that reflects the characteristics of the real thing. – Mr. T Dec 26 '20 at 18:33
  • OK, thanks, I will remember that for my new question post. – paterson not_Gates Dec 26 '20 at 18:42