I want to plot a histogram of my data, which looks something like this:
id   date   purchase  visit
id1  date1  $10       0
id1  date1  $50       0
id1  date2  $30       1
id2  date1  $10       0
id2  date2  $10       1
id2  date3  $10       2
Each row is one transaction by a customer on a given day. The visit column counts the customer's previous visit days cumulatively: it starts at 0 and increases by 1 each time the same customer visits on a new day.
How can I create a histogram that shows the number of distinct visits per customer? Visits are distinct when they fall on separate days, so several transactions on the same day count as one visit.
Total unique customers = 1215, total rows = 1135067.
I tried to run the following:
import random
import numpy
from matplotlib import pyplot
bins = df['visit'].unique()
uniq_id = df['id'].unique()
pyplot.hist(df['date'], bins, alpha=0.5, label=df['id'])
pyplot.legend(loc='upper right')
pyplot.show()
I took the idea from this question on plotting multiple histograms: Plot two histograms at the same time with matplotlib
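To make the target concrete, here is a minimal sketch of one possible way to get one number per customer and then plot a histogram of those numbers. It assumes df is a pandas DataFrame with the columns shown above. The toy values, the name visits_per_customer, and the helper plot_visit_histogram are made up for illustration and are not part of the original attempt.

```python
import pandas as pd
from matplotlib import pyplot

# Toy frame mirroring the sample rows above (illustrative values only).
df = pd.DataFrame({
    "id": ["id1", "id1", "id1", "id2", "id2", "id2"],
    "date": ["date1", "date1", "date2", "date1", "date2", "date3"],
    "purchase": [10, 50, 30, 10, 10, 10],
    "visit": [0, 0, 1, 0, 1, 2],
})

# One value per customer: the number of distinct days on which they appear.
# Assumption: if visit is a zero-based cumulative counter as described,
# this equals df.groupby("id")["visit"].max() + 1.
visits_per_customer = df.groupby("id")["date"].nunique()


def plot_visit_histogram(counts):
    """Plot how many customers have 1, 2, 3, ... distinct visit days."""
    # Bin edges at 0.5, 1.5, ..., max + 0.5 so each integer gets its own bar.
    edges = [x + 0.5 for x in range(0, int(counts.max()) + 1)]
    fig, ax = pyplot.subplots()
    ax.hist(counts, bins=edges, alpha=0.5)
    ax.set_xlabel("distinct visit days per customer")
    ax.set_ylabel("number of customers")
    return ax
```

Calling plot_visit_histogram(visits_per_customer) followed by pyplot.show() should display the chart. With 1215 customers the histogram is built from 1215 values rather than from all 1135067 rows, because the grouping collapses the transactions first.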