
I want to plot a histogram for my data, which looks something like this:

id   date   purchase  visit
id1  date1  $10       0
id1  date1  $50       0
id1  date2  $30       1
id2  date1  $10       0
id2  date2  $10       1
id2  date3  $10       2

Each row is one transaction by a customer on a given day. The visit column is a cumulative count of that customer's earlier visit days: it is 0 for every transaction on the customer's first day and increases by 1 each time the same customer comes back on a new day.
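To make the meaning of the visit column concrete, here is a minimal sketch, assuming pandas and assuming the rows are already ordered by date within each customer. The sample frame only mirrors the rows shown above, and `pd.factorize` is one possible way to number each customer's days, not necessarily how the column was originally built.

```python
import pandas as pd

# Toy data copied from the sample rows above (purchase stored as plain numbers).
df = pd.DataFrame({
    "id": ["id1", "id1", "id1", "id2", "id2", "id2"],
    "date": ["date1", "date1", "date2", "date1", "date2", "date3"],
    "purchase": [10, 50, 30, 10, 10, 10],
})

# Within each customer, number the dates in order of first appearance:
# 0 for the first day seen, 1 for the next new day, and so on.
# Assumption: rows are sorted by date within each customer.
df["visit"] = df.groupby("id")["date"].transform(lambda s: pd.factorize(s)[0])

print(df)
```

Repeated transactions on the same day get the same number, which matches the two `id1 date1` rows both showing 0.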

How can I create a histogram that shows the distinct visits per customer? A distinct visit is defined as a customer visiting the store on two separate days.

Total unique customers = 1215
Total rows = 1135067
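One way to read the goal is "one number per customer, then a histogram of those numbers". The sketch below is a rough illustration under that reading, assuming pandas and matplotlib and using the sample rows rather than the real data. It takes each customer's largest visit value as their count of distinct return visits, so `hist` receives one value per customer (about 1215 here) instead of one per row. The output file name is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display; drop for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Toy data copied from the sample rows above.
df = pd.DataFrame({
    "id": ["id1", "id1", "id1", "id2", "id2", "id2"],
    "date": ["date1", "date1", "date2", "date1", "date2", "date3"],
    "visit": [0, 0, 1, 0, 1, 2],
})

# Collapse to one value per customer: the highest cumulative visit number reached.
visits_per_customer = df.groupby("id")["visit"].max()

# One integer-wide bin per possible visit count: edges 0, 1, ..., max + 1.
bins = np.arange(int(visits_per_customer.max()) + 2)

fig, ax = plt.subplots()
counts, edges, _ = ax.hist(visits_per_customer, bins=bins, align="left", rwidth=0.8)
ax.set_xlabel("distinct return visits per customer")
ax.set_ylabel("number of customers")
ax.set_xticks(bins[:-1])
fig.savefig("visits_hist.png")  # hypothetical file name
```

If the visit column were not available, `df.groupby("id")["date"].nunique() - 1` should give the same per-customer values under the definition above.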

I tried running the following:

import random
import numpy
from matplotlib import pyplot

bins = df['visit'].unique() 
uniq_id = df['id'].unique()

pyplot.hist(df['date'], bins, alpha=0.5, label=df['id']) 

pyplot.legend(loc='upper right')
pyplot.show()

I took the idea from this question on plotting multiple histograms: Plot two histograms at the same time with matplotlib

Saleem Ahmed
  • So what exactly is your problem? Do you get an error or does your plot not look right? – BenT Apr 14 '19 at 01:07
  • The process never produced any output. It just kept running for a couple of hours. – Saleem Ahmed Apr 14 '19 at 02:38
  • What is the length of bins? Try removing your label to see if that helps with speed. I can't reproduce your error without an example to run. – BenT Apr 15 '19 at 19:49

0 Answers