
I have a CSV file that has a bunch of different columns. The columns that I am interested in are 'Items', 'OrderDate' and 'Units'.

In my IDE I am trying to generate a bar chart of the number of pencils sold on each individual 'OrderDate'. What I am trying to do is look down through the 'Items' column using pandas and check whether each item is a pencil. If it is, I add it to the graph; if it is not, I don't do anything.

I think I have made the code a bit long-winded. I have the code going down through the 'Items' column and checking whether each entry is a pencil, but I can't figure out what to do next.

import pandas as pd
import matplotlib.pyplot as plt

# Sample data. 'item' has two fewer entries than the other columns,
# so pandas fills the last two rows of 'item' with NaN.
d = {'item': pd.Series(['Pencil', 'Marker', 'Pencil', 'Headphones', 'Pencil',
                        'The moon', 'Wish you were here album']),
     'OrderDate': pd.Series(['5/15/2020', '5/16/2020', '5/16/2020', '5/15/2020',
                             '5/16/2020', '5/17/2020', '5/16/2020', '5/16/2020', '5/17/2020']),
     'Units': pd.Series([4, 3, 2, 1, 3, 2, 4, 2, 3])}

df = pd.DataFrame.from_dict(d)

# Bar chart of Units for every row, not just the pencils
df.plot(kind='bar', x='OrderDate', y='Units')

item_col = df['item']  # the sample data names this column 'item' (lowercase)
pencil_binary = item_col.str.count('Pencil')

# Checks each entry, but the result is not yet used for the chart
for entry in item_col:
    if entry == 'Pencil':
        print("i am a pencil")
    else:
        print("i am not a pencil")

print(df)
plt.show()
Guillermo Mosse
kitchen800
  • I added a reproducible example of a dataframe, please tell me if that's correct. Usually, one wants to add such a thing - it makes life easier for the people who answer :-). See this: https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples – Guillermo Mosse May 17 '20 at 19:43

1 Answer


If I understood correctly, you want to plot the number of pencils sold per day. For that, you can filter the dataframe so that it keeps only the rows about pencils, and then draw a bar chart.
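A minimal sketch of the filtering step on its own, reusing the sample data from the question: comparing the column with 'Pencil' produces a boolean Series (a mask), and indexing the dataframe with that mask keeps only the rows where it is True. The name is_pencil is just for illustration.

```python
import pandas as pd

# Same sample data as in the question; the two missing 'item' values become NaN.
df = pd.DataFrame({
    'item': pd.Series(['Pencil', 'Marker', 'Pencil', 'Headphones', 'Pencil',
                       'The moon', 'Wish you were here album']),
    'OrderDate': pd.Series(['5/15/2020', '5/16/2020', '5/16/2020', '5/15/2020',
                            '5/16/2020', '5/17/2020', '5/16/2020', '5/16/2020',
                            '5/17/2020']),
    'Units': pd.Series([4, 3, 2, 1, 3, 2, 4, 2, 3]),
})

# Comparing a column with a scalar gives one True/False per row.
# NaN compared with 'Pencil' is False, so rows without an item are excluded too.
is_pencil = df['item'] == 'Pencil'

# Indexing the dataframe with the boolean Series keeps only the True rows.
df_pencils = df[is_pencil]
print(df_pencils)
```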

Here's reproducible code:

import pandas as pd
import matplotlib.pyplot as plt

# Sample data. 'item' has two fewer entries than the other columns,
# so pandas fills the last two rows of 'item' with NaN.
d = {'item': pd.Series(['Pencil', 'Marker', 'Pencil', 'Headphones', 'Pencil',
                        'The moon', 'Wish you were here album']),
     'OrderDate': pd.Series(['5/15/2020', '5/16/2020', '5/16/2020', '5/15/2020',
                             '5/16/2020', '5/17/2020', '5/16/2020', '5/16/2020', '5/17/2020']),
     'Units': pd.Series([4, 3, 2, 1, 3, 2, 4, 2, 3])}

df = pd.DataFrame.from_dict(d)

# This dataframe only has pencils
df_pencils = df[df.item == 'Pencil']

# One bar per date: the total pencil Units sold on that date.
# The result is a Series indexed by OrderDate, so the index becomes the x axis.
df_pencils.groupby('OrderDate')['Units'].sum().plot(kind='bar')

plt.show()

groupby groups all rows that share the same date, and sum() then adds up the Units sold in each group.

For example, when you run this:

df_pencils.groupby('OrderDate')['Units'].sum()

this is the output:

OrderDate
5/15/2020    4
5/16/2020    5
Name: Units, dtype: int64
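To relate this to the loop in the question, the sketch below computes the same per-date totals by hand: it walks the rows, skips anything that is not a pencil, and keeps a running sum in a dict keyed by date. It is only an illustration (the names totals and by_date are hypothetical); the groupby version is shorter and usually faster on large dataframes.

```python
import pandas as pd

# Same sample data as in the question; the two missing 'item' values become NaN.
df = pd.DataFrame({
    'item': pd.Series(['Pencil', 'Marker', 'Pencil', 'Headphones', 'Pencil',
                       'The moon', 'Wish you were here album']),
    'OrderDate': pd.Series(['5/15/2020', '5/16/2020', '5/16/2020', '5/15/2020',
                            '5/16/2020', '5/17/2020', '5/16/2020', '5/16/2020',
                            '5/17/2020']),
    'Units': pd.Series([4, 3, 2, 1, 3, 2, 4, 2, 3]),
})

# Manual equivalent of filter + groupby + sum:
# only pencil rows contribute, and Units are added up per OrderDate.
totals = {}
for item, date, units in zip(df['item'], df['OrderDate'], df['Units']):
    if item == 'Pencil':
        totals[date] = totals.get(date, 0) + units

print(totals)

# The pandas version gives the same numbers, as a Series indexed by date.
by_date = df[df['item'] == 'Pencil'].groupby('OrderDate')['Units'].sum()
print(by_date.to_dict())
```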

If you want a one-liner, it's:

df[df.item == 'Pencil'].groupby('OrderDate')['Units'].sum().plot(kind='bar')
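Two details the one-liner leaves as they are: the dates are strings, so the bars are ordered as text (which stops matching calendar order once a date like 10/1/2020 appears), and a day with no pencil sales (5/17/2020 in the sample) gets no bar at all. A sketch that handles both, assuming the dates are always month/day/year, parses them with pd.to_datetime and then reindexes over every date in the data with fill_value=0. The names all_dates and pencil_units are hypothetical.

```python
import pandas as pd

# Same sample data as in the question; the two missing 'item' values become NaN.
df = pd.DataFrame({
    'item': pd.Series(['Pencil', 'Marker', 'Pencil', 'Headphones', 'Pencil',
                       'The moon', 'Wish you were here album']),
    'OrderDate': pd.Series(['5/15/2020', '5/16/2020', '5/16/2020', '5/15/2020',
                            '5/16/2020', '5/17/2020', '5/16/2020', '5/16/2020',
                            '5/17/2020']),
    'Units': pd.Series([4, 3, 2, 1, 3, 2, 4, 2, 3]),
})

# Parse the strings into real dates so they sort chronologically.
# Assumption: every date is written month/day/year.
df['OrderDate'] = pd.to_datetime(df['OrderDate'], format='%m/%d/%Y')

# Every distinct date in the whole file, in calendar order.
all_dates = pd.Index(df['OrderDate'].unique(), name='OrderDate').sort_values()

pencil_units = (
    df[df['item'] == 'Pencil']
    .groupby('OrderDate')['Units']
    .sum()
    # Add the dates with no pencil rows, with 0 units sold.
    .reindex(all_dates, fill_value=0)
)
print(pencil_units)
```

Calling pencil_units.plot(kind='bar') followed by plt.show() would then draw one bar per day, including a zero-height bar for 5/17/2020.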
Guillermo Mosse