I have these two datasets to plot:

lwm:

Client name,CTR,mean_diff
customeronewithverylongname,0.08355714285714286,-0.02935714285714286
customertwowithverylongname,0.028471428571428568,-0.001942857142857142
customerthree,0.014371428571428571,0.000700000000000001
customerfourwithverylongname,0.09971428571428573,0.0014285714285714457
customerfive,0.006799999999999999,0.0014999999999999987
customersixQuickSale,0.0396,0.005075000000000003
customerseven,0.16254285714285713,0.0052428571428571324

pwm:

Client name,CTR,mean_diff
customeronewithverylongname,0.11291428571428572,-0.02935714285714286
customertwowithverylongname,0.03041428571428571,-0.001942857142857142
customerthree,0.01367142857142857,0.000700000000000001
customerfourwithverylongname,0.09828571428571428,0.0014285714285714457
customerfive,0.0053,0.0014999999999999987
customersixQuickSale,0.034525,0.005075000000000003
customerseven,0.1573,0.0052428571428571324

I want to plot a series of histograms with the name of the customers on the x axis and CTR on the y, without the xlabels cut off.

I plotted the data and noticed that the xlabels were cut off, so I read this question and solved it this way:

plt.subplots_adjust(left=None, bottom=0.15, right=None, top=None, wspace=None, hspace=None)

I tried different values of bottom (0.10, 0.15, 0.17, 0.25, 0.30 and 0.35), and each time the xlabels changed position: I never got the same order for the xlabels.

The histograms, instead, are always in the same position.

[Image: plot with bottom=0.15]

[Image: plot with bottom=0.25]

This is a snippet of my code:

#lwm and pwm are the last & penultimate week dataframes
#with the weekly mean CTR for each customer

#defining the labels of the histograms
customer_list=set(lwm['Client name'])

x_pos=list(range(len(customer_list)))
x_lab=customer_list
width=0.4

#defining the y max height
max_y=max(zip(lwm['CTR'],pwm['CTR']))

#defining the histograms 
fig,ax =plt.subplots(figsize=(8,6))

plt.bar(x_pos, pwm['CTR'], width, alpha=0.5, color='b',label=x_lab)

plt.bar([p + width for p in x_pos], lwm['CTR'], width, alpha=0.5, color='r', label=x_lab)

#defining the y max height
plt.ylim([0,max(max_y[0],max_y[1])*1.1])

plt.xticks(x_pos,x_lab,rotation=45, rotation_mode="anchor", ha="right") 
plt.title('CTR Bar plot of the last week') 

# Adding the legend and showing the plot
plt.legend(['Penultimate Week CTR','Last Week CTR', ], loc='best')
plt.subplots_adjust(left=None, bottom=0.15, right=None, top=None, wspace=None, hspace=None)

plt.show()

I don't know if I have to add more information about the dataset or if it is fine as it is.

I am self-taught. I read the documentation here, this question, and this one too, but I still have not come up with a solution.

Andrea Ciufo
  • It would be a lot easier if you provided the dataframe in form of actual code, `df = pd.DataFrame()` (there is no `Customer Name` in the dataframe you show) and make it consistent with the images (what is the black area?). – ImportanceOfBeingErnest Nov 06 '17 at 18:40
  • @ImportanceOfBeingErnest sure, I will modify it as quickly as I can. The black area covers the real names of the customers and the real CTR values, which I can't share. Moreover, I made a mistake adapting the question for Stack Overflow from the original code: it is not "Customer Name" but "Client name". – Andrea Ciufo Nov 06 '17 at 18:46
  • This is the reason to use [mcve]s. Always. regardless of the issue. They help understanding the issue or replicating it, instead of causing confusion. What you want is that someone runs your code, sees the same undesired behaviour that you see, finds the cause of it and files an answer. The best way to achieve this is to provide a [mcve] from the start. – ImportanceOfBeingErnest Nov 06 '17 at 18:49
  • To make it short, don't use a `set`, because a set [is unordered](https://stackoverflow.com/questions/9792664/set-changes-element-order). As in the state this question currently is, it is not useful for any future readers, I would vote to close it. – ImportanceOfBeingErnest Nov 06 '17 at 21:44
  • @ImportanceOfBeingErnest I edited the question again, thanks for the feedback. I changed the csv, adapting it for Stack Overflow, and simplified the dataframe; I hope it is suitable now :) I still get the same problem, but I think it is now reproducible for the community. – Andrea Ciufo Nov 07 '17 at 12:57
  • Ok, but I already said that using a `set` will not be a good idea. Did you change that at all? – ImportanceOfBeingErnest Nov 07 '17 at 13:00
  • @ImportanceOfBeingErnest I will test it soon. I also read the question you linked, but I spent all my time editing the question. – Andrea Ciufo Nov 07 '17 at 13:02
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/158432/discussion-between-andrea-ciufo-and-importanceofbeingernest). – Andrea Ciufo Nov 07 '17 at 16:24

1 Answer

The problem

The xlabels change position because of set.

A set is, by definition, an unordered collection of distinct hashable objects, but this does not mean that it is randomly ordered (see here for details: https://stackoverflow.com/questions/2860339/can-pythons-set-absence-of-ordering-be-considered-random-order).

So the output you get is the expected behaviour, not a plotting error.
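A minimal sketch of the difference, using three of the customer names from the question. The `dict.fromkeys` line is an addition of mine, not part of the original code: it is one way to remove duplicates while keeping first-seen order (Python 3.7+), in case deduplication was the reason for using `set`.

```python
# Three customer names, in the order they appear in the dataframe
names = ["customerthree", "customerfive", "customerseven"]

# A set keeps the same elements but makes no promise about iteration order.
# For strings the order can also differ between interpreter runs, because
# str hashes are randomized per process by default.
as_set = set(names)

# A list keeps the original order, so labels stay paired with their bars.
as_list = list(names)

# Assumption (not in the original code): if duplicates must be removed,
# dict.fromkeys keeps the first-seen order of the unique names.
deduped = list(dict.fromkeys(names + ["customerthree"]))

print(list(as_set))  # same elements, order not guaranteed
print(as_list)
print(deduped)
```

The bars are drawn from `lwm['CTR']` and `pwm['CTR']` in dataframe order, so only the labels move when the set is iterated in a different order.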

The Solution

What you need is to extract the labels in the order you want, and then plot them.

For example using:

customer_list=lwm['Client name'] 

Instead of

customer_list=set(lwm['Client name'])

This way the labels are defined in the same order as the x and y values.
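Put together, the corrected plot could look roughly like the sketch below. It is not the exact code from the question: it uses only three rows of the question's data, switches to the `Agg` backend so it runs without a display, and uses the `ax` methods instead of the `plt` functions. It also assumes that `pwm` lists the customers in the same row order as `lwm`, as in the data shown.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop it to get a window
import matplotlib.pyplot as plt
import pandas as pd

# Subset of the question's data (three customers, CTR column only)
LWM_CSV = """Client name,CTR
customeronewithverylongname,0.08355714285714286
customerthree,0.014371428571428571
customerseven,0.16254285714285713
"""
PWM_CSV = """Client name,CTR
customeronewithverylongname,0.11291428571428572
customerthree,0.01367142857142857
customerseven,0.1573
"""
lwm = pd.read_csv(io.StringIO(LWM_CSV))
pwm = pd.read_csv(io.StringIO(PWM_CSV))

# Labels come straight from the column, so they keep the row order of the bars
customer_list = lwm["Client name"].tolist()
x_pos = list(range(len(customer_list)))
width = 0.4

fig, ax = plt.subplots(figsize=(8, 6))
ax.bar(x_pos, pwm["CTR"], width, alpha=0.5, color="b",
       label="Penultimate Week CTR")
ax.bar([p + width for p in x_pos], lwm["CTR"], width, alpha=0.5, color="r",
       label="Last Week CTR")

# Leave 10% headroom above the tallest bar of either week
ax.set_ylim(0, max(lwm["CTR"].max(), pwm["CTR"].max()) * 1.1)

ax.set_xticks(x_pos)
ax.set_xticklabels(customer_list, rotation=45, rotation_mode="anchor",
                   ha="right")
ax.set_title("CTR Bar plot of the last week")
ax.legend(loc="best")
fig.subplots_adjust(bottom=0.25)  # room for the rotated labels

# Read the labels back to confirm they follow the dataframe order
tick_labels = [t.get_text() for t in ax.get_xticklabels()]
print(tick_labels)
```

Call `fig.savefig(...)` or, with an interactive backend, `plt.show()` to see the result.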

As a test, with bottom values of 0.15 and 0.25 you get the following plots:

[Image: plot with bottom=0.15]

[Image: plot with bottom=0.25]

Note that if you want a particular order, you first have to sort your data set and then extract the labels for the plot, for example:

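#sort the last week by CTR, highest first
test1=lwm.sort_values(by=['CTR'], ascending=False)
#reorder pwm to follow the same customer order as test1, so the bars stay paired
#(sorting pwm by its own CTR could put the customers in a different order)
test2=pwm.set_index('Client name').loc[test1['Client name'].tolist()].reset_index()
customer_list2=test1['Client name']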
Andrea Ciufo