I've developed a Perl script that manipulates data and produces a final CSV file. Unfortunately, the Perl packages for graphs and charts are not supported on my system, and I'm not able to install them due to work restrictions. So I want to take the CSV file and put together something in Python to generate a mixed graph. I want the first column to be the labels on the x-axis, the next three columns to be bar graphs, and the last column (Target) to be a line across the x-axis.

Here is sample data:

Name      PreviousWeekProg     CurrentWeekProg     ExpectedProg     Target
Dan              94                   92                 95           94
Jarrod           34                   56                 60           94
Chris            45                   43                 50           94
Sam              89                   90                 90           94
Aaron            12                   10                 40           94
Jenna            56                   79                 80           94
Eric             90                   45                 90           94

I am looking for a graph like this: [image]

I did some research, but being as clueless as I am in Python, I wanted to ask for guidance on good modules for mixed charts and graphs. Sorry if my post is vague. Besides looking at other references online, I'm not sure how to go about this. Also, my version of Python is 3.8 and I DO have matplotlib installed (which is what I was previously recommended to use).

  • Hi, did my answer help with your question? – Shaun Lowis Dec 12 '19 at 02:01
  • @ShaunLowis It was very helpful but I'm still trying to figure out some basics with it :( For example, I'm getting errors when trying to read the csv to begin with. Seems to be something fundamental but I haven't figured it out yet. – programminglearner Dec 12 '19 at 19:28
  • That's fair, you can mark my answer as correct and then ask another question related to your errors and tag me in a comment and I could try and help? – Shaun Lowis Dec 12 '19 at 20:01

3 Answers

Since the answer by @ShaunLowis doesn't include a complete example, I thought I'd add one. As far as reading the .csv file goes, the best way to do it in this case is probably to use pandas.read_csv(), as the other answer points out. In this example I have named the file test.csv and placed it in the directory from which I run the script:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

df = pd.read_csv("./test.csv")
names = df['Name'].values
x = np.arange(len(names))
w = 0.3
plt.bar(x-w, df['PreviousWeekProg'].values, width=w, label='PreviousWeekProg')
plt.bar(x, df['CurrentWeekProg'].values, width=w, label='CurrentWeekProg')
plt.bar(x+w, df['ExpectedProg'].values, width=w, label='ExpectedProg')
plt.plot(x, df['Target'].values, lw=2, label='Target')
plt.xticks(x, names)
plt.ylim([0,100])
plt.tight_layout()
plt.xlabel('X label')
plt.legend(loc='upper center', bbox_to_anchor=(0.5, -0.1), fancybox=True, ncol=5)
plt.savefig("CSVBarplots.png", bbox_inches="tight")
plt.show()

Explanation

From the pandas docs for read_csv() (arguments extraneous to the example excluded),

pandas.read_csv(filepath_or_buffer)

Read a comma-separated values (csv) file into DataFrame.

filepath_or_buffer: str, path object or file-like object

Any valid string path is acceptable. The string could be a URL. [...] If you want to pass in a path object, pandas accepts any os.PathLike.

By file-like object, we refer to objects with a read() method, such as a file handler (e.g. via builtin open function) or StringIO.

In this example I am specifying the path to the file, not a file object.
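
The sample data in the question is shown aligned with spaces rather than separated by commas. If the real file looks like that, the default comma separator will not split it into columns. As a minimal sketch, assuming that whitespace-delimited layout (the question does not show the actual file contents), the same function can read from a file-like object and split on runs of whitespace with sep=r"\s+":

```python
import io
import pandas as pd

# Sample data copied from the question. Whether the real file uses commas
# or whitespace is not shown there, so the whitespace layout is an assumption.
sample = """\
Name PreviousWeekProg CurrentWeekProg ExpectedProg Target
Dan 94 92 95 94
Jarrod 34 56 60 94
Chris 45 43 50 94
Sam 89 90 90 94
Aaron 12 10 40 94
Jenna 56 79 80 94
Eric 90 45 90 94
"""

# A StringIO is a file-like object, so read_csv accepts it in place of a path.
# sep=r"\s+" splits each row on one or more whitespace characters.
df = pd.read_csv(io.StringIO(sample), sep=r"\s+")
print(df.dtypes)
```

For a file that really is comma-separated, the plain pd.read_csv("./test.csv") call from the example is enough.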

names = df['Name'].values

This extracts the values in the 'Name' column as a numpy.ndarray. To plot multiple bars under one name I follow the approach in this answer. That approach needs a numeric x array of the same length as the names array, hence

x = np.arange(len(names))

then set a width for the bars and offset the first and third bars accordingly, as outlined in the referenced answer

w = 0.3
plt.bar(x-w, df['PreviousWeekProg'].values, width=w, label='PreviousWeekProg')
plt.bar(x, df['CurrentWeekProg'].values, width=w, label='CurrentWeekProg')
plt.bar(x+w, df['ExpectedProg'].values, width=w, label='ExpectedProg')

from the matplotlib.pyplot.bar page (unused non-positional arguments excluded),

matplotlib.pyplot.bar(x, height, width=0.8)

The bars are positioned at x [...] their dimensions are given by width and height.

Each of x, height, and width may either be a scalar applying to all bars, or it may be a sequence of length N providing a separate value for each bar.

In this case, x and height are sequences of values (different for each bar) and width is a scalar (the same for each bar).
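
The offsets x-w, x, x+w are a special case of a general pattern. A rough sketch of that pattern for any number of series, where the helper name grouped_bar_positions and its group_width parameter are made up for illustration and are not part of matplotlib:

```python
import numpy as np

def grouped_bar_positions(n_groups, n_series, group_width=0.9):
    """Return (centers, bar_width) for n_series bars side by side at each tick.

    centers has shape (n_series, n_groups); row i holds the x positions
    of the i-th series. Ticks are assumed to sit at 0, 1, 2, ...
    """
    x = np.arange(n_groups)
    bar_width = group_width / n_series
    # Offsets are symmetric about zero, e.g. (-w, 0, +w) for three series.
    offsets = (np.arange(n_series) - (n_series - 1) / 2) * bar_width
    centers = x[np.newaxis, :] + offsets[:, np.newaxis]
    return centers, bar_width

centers, w = grouped_bar_positions(n_groups=7, n_series=3)
```

With three series and group_width=0.9 this gives w = 0.3 and the same positions as above, so each row of centers could be passed as the first argument of plt.bar.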

Next is the line for target which is pretty straightforward, simply plotting the x values created earlier against the values from the 'Target' column

plt.plot(x, df['Target'].values, lw=2, label='Target')

where lw specifies the linewidth. Disclaimer: if the target value isn't the same for each row of the .csv this will still work but may not look exactly how you want it to as is.
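
One possible way to handle both cases, sketched under the assumption that a constant target should span the full width of the axes: use axhline when every value is the same and fall back to an ordinary line otherwise. The helper draw_target is hypothetical, and the Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window is opened
import matplotlib.pyplot as plt
import numpy as np

def draw_target(ax, x, target):
    """Draw one full-width line if target is constant, else a per-tick line."""
    target = np.asarray(target)
    if np.all(target == target[0]):
        # axhline spans the whole axes regardless of the x limits.
        return ax.axhline(target[0], color="k", lw=2, label="Target")
    (line,) = ax.plot(x, target, color="k", lw=2, label="Target")
    return line

fig, ax = plt.subplots()
x = np.arange(7)
constant_line = draw_target(ax, x, [94] * 7)
varying_line = draw_target(ax, x, [90, 91, 92, 93, 94, 95, 96])
```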

The next two lines,

plt.xticks(x, names)
plt.ylim([0,100])

just add the names below the bars at the appropriate x positions and then set the y limits to span the interval [0, 100].

The final touch here is to add the legend below the plot,

plt.legend(loc='upper center', bbox_to_anchor=(0.5, -0.1), fancybox=True, ncol=5)

see this answer for more on how to adjust this as desired.
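
A small self-contained sketch of the legend placement and saving step, using placeholder data and writing to an in-memory buffer instead of a file (both are assumptions for illustration; pass a file name to savefig in practice):

```python
import io
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window is opened
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
x = np.arange(3)
ax.bar(x, [3, 5, 4], width=0.3, label="bars")   # placeholder data
ax.plot(x, [4, 4, 4], color="k", label="line")  # placeholder data
ax.set_xlabel("X label")

# loc names the point of the legend box that gets anchored;
# bbox_to_anchor is in axes coordinates, so (0.5, -0.15) means
# horizontally centered and 15% of the axes height below the axes.
legend = ax.legend(loc="upper center", bbox_to_anchor=(0.5, -0.15), ncol=2)

# bbox_inches="tight" makes the saved area include the legend below the axes.
buffer = io.BytesIO()  # stands in for a file name such as "plot.png"
fig.savefig(buffer, format="png", bbox_inches="tight")
```

Making the second value of bbox_to_anchor more negative moves the legend further down.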

William Miller
  • This was perfect. Your explanation was very detailed and on-point! Will take an hour for me to award the bounty. – programminglearner Dec 18 '19 at 22:14
  • @sfr Thanks, I’m glad it helped – William Miller Dec 18 '19 at 22:15
  • I have an additional question. Instead of using plt.show is there any way to have it save to an image (jpeg or png) locally? – programminglearner Dec 18 '19 at 22:46
  • @sfr You want to use `plt.savefig("filename.png")`. Additionally I recommend using `bbox_inches='tight'` to remove the generous whitespace added around the output. (So in full, `plt.savefig("filename.png", bbox_inches='tight')`.) – William Miller Dec 18 '19 at 22:48
  • Worked perfectly! For the part of the code that places the legend, for some reason it places it, overlapping the chart itself. I tried to change the location to lower center and I read online that you can use "best" for the script to determine where it would best fit, however neither worked. The legend is still overlapping the chart or the x-axis. – programminglearner Dec 18 '19 at 23:08
  • @sfr You can adjust the `bbox_to_anchor` (bounding box to anchor) argument in `plt.legend()`, the second value in the tuple controls the relative `y` position. Change to something like `bbox_to_anchor=(0.5, -0.1)` and you should see the legend shift down. You can play around with it until you get the legend where you want it – William Miller Dec 18 '19 at 23:12
  • This does not work, I believe because I am using plt.tight_layout(). I had to add this to fit the entire chart in the frame. Otherwise, it cuts part of the x-axis label out of the picture. – programminglearner Dec 18 '19 at 23:17
  • @sfr I suspected that you might be using that. I need to consult my reference on how to adjust legend location with `plt.tight_layout()`... give me a few moments – William Miller Dec 18 '19 at 23:19
  • @sfr I have modified my example to accommodate `plt.tight_layout()` with a legend and `xlabel` as well as include the correct `plt.savefig()` method. It turns out that `bbox_inches='tight'` is required to save the figure with the proper extent in this case. – William Miller Dec 18 '19 at 23:43
  • I was still having issues with it, but I added ncol=5 for the legend and I was able to get a horizontal legend to fit the plot. – programminglearner Dec 19 '19 at 00:05
  • @sfr Not sure how that got deleted when I copied my code over, thanks for pointing that out – William Miller Dec 19 '19 at 00:09
  • I do have two last questions. (Sorry, trying to get the formatting right.) I know the width controls the space between the sets of bars, but is there any way to make micro spaces between each of the three bars in each set? I tried a few things but I'm only able to adjust the space between bar sets. Also, how can I expand the x-axis so the final chart isn't so compact? I want it a bit spread out, however the figure seems limited due to tight_layout (yet that is the only way I can get it to show the entire x-axis in the frame). – programminglearner Dec 19 '19 at 00:39
  • @sfr That’s pretty straightforward, to create a small gap between bars you just need to set the `width` parameter to something smaller than the offset (`w`) so instead of `plt.bar(x-w, width=w)` do `plt.bar(x-w, width=w*0.9)`, that will make the bar 90% as wide as the offset, leaving a gap of 10% between bars (do that for `plt.bar(x,...` and `plt.bar(x+w,...` as well). You can adjust the `x` axis limits in the same fashion as the `y` axis by doing `plt.xlim([x[0]-1, x[-1]+1])`. This will 'pad' each end of the axis by about the width of a bar triplet. – William Miller Dec 19 '19 at 00:51
  • so the first part worked perfectly, thanks! as for the second part, what I'm trying to do is space out each of the ticks on the x-axis. By doing plt.xlim([x[0]-1, x[-1]+1]), it just adds extra space to the start of the plot and the end. – programminglearner Dec 19 '19 at 00:57
  • @sfr My mistake, you can space them out more by multiplying the `x` array by some amount when you create it, i.e. `x =np.arange(len(names))*2` would position the bars at `[0, 2, 4, 6, 8, 10, 12]` instead of `[0, 1, 2, 3, 4, 5, 6]` but keep their width the same. Should give the plot a more spaced out appearance – William Miller Dec 19 '19 at 01:03
  • so it is creating that appearance by thinning the bars, i believe. because no matter what, the image version of the chart is staying the exact same size (a square, instead of a rectangle image with a long x-axis). is this the only way python can print out images? – programminglearner Dec 19 '19 at 01:05
  • @sfr If you’re trying to change the size of the figure itself, just put `plt.figure(figsize=(8,4))` before all the other plotting code and change `(8,4)` as needed (it is width, height in inches). There are many other ways to do this but that’s probably easiest in your case – William Miller Dec 19 '19 at 01:10
  • I just gave that a shot right before reading your comment and it worked great. thanks again for all your help. – programminglearner Dec 19 '19 at 01:11
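
The two tweaks discussed in these comments, a small gap between the bars of each group and a wider figure, can be sketched together as follows. This uses the sample values from the question and the axes-level calls instead of plt.bar; the 0.9 factor and the (8, 4) size are the values suggested above, and the Agg backend is an assumption so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window is opened
import matplotlib.pyplot as plt
import numpy as np

names = ["Dan", "Jarrod", "Chris", "Sam", "Aaron", "Jenna", "Eric"]
series = {
    "PreviousWeekProg": [94, 34, 45, 89, 12, 56, 90],
    "CurrentWeekProg": [92, 56, 43, 90, 10, 79, 45],
    "ExpectedProg": [95, 60, 50, 90, 40, 80, 90],
}

fig, ax = plt.subplots(figsize=(8, 4))  # (width, height) in inches
x = np.arange(len(names))
w = 0.3                                 # distance between bar centers
for offset, (label, heights) in zip((-w, 0, w), series.items()):
    # Bars 90% as wide as the offset leave a small gap inside each group.
    ax.bar(x + offset, heights, width=w * 0.9, label=label)
ax.set_xticks(x)
ax.set_xticklabels(names)
```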

I would recommend reading in your .csv file using the 'read_csv()' utility of the Pandas library like so:

import pandas as pd

df = pd.read_csv(filepath)

This stores the information in a DataFrame object. You can then access your columns by:

my_column = df['PreviousWeekProg']

After which you can call:

my_column.plot(kind='bar')

on whichever column you wish to plot. Configuring subplots is a different beast, for which I would recommend using matplotlib's pyplot.
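
As a rough sketch of how this could extend to the mixed chart in the question, DataFrame.plot can draw several columns as grouped bars on a given axes, and the target line can then be added to that same axes. The data is the sample from the question, built inline here instead of read from a file, and the Agg backend is an assumption so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window is opened
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "Name": ["Dan", "Jarrod", "Chris", "Sam", "Aaron", "Jenna", "Eric"],
    "PreviousWeekProg": [94, 34, 45, 89, 12, 56, 90],
    "CurrentWeekProg": [92, 56, 43, 90, 10, 79, 45],
    "ExpectedProg": [95, 60, 50, 90, 40, 80, 90],
    "Target": [94] * 7,
})

fig, ax = plt.subplots()
bar_columns = ["PreviousWeekProg", "CurrentWeekProg", "ExpectedProg"]
# pandas centers each group of bars on the integer positions 0, 1, 2, ...
df.plot(x="Name", y=bar_columns, kind="bar", ax=ax, rot=0)
# Draw the target on the same axes, using those integer positions.
ax.plot(range(len(df)), df["Target"], color="k", lw=2, label="Target")
ax.legend()
```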

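I would recommend starting with these figure and axes declarations, then going from there:

import matplotlib.pyplot as plt

fig = plt.figure()
# a 2x2 grid so that each call creates its own axes
ax1 = plt.subplot(2, 2, 1)
ax2 = plt.subplot(2, 2, 2)
ax3 = plt.subplot(2, 2, 3)
ax4 = plt.subplot(2, 2, 4)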

You can read more about adding data to the axes here.

Let me know if this helps!

Shaun Lowis
  • If you are struggling with the implementation, this post should help if I was unclear about anything: https://stackoverflow.com/questions/33631163/how-to-put-the-legend-on-first-subplot-of-seaborn-facetgrid?rq=1 – Shaun Lowis Dec 11 '19 at 02:31

You can use the parameter hue in the package seaborn. First, you need to reshape your data set with the function melt:

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("test.csv") # assumes the sample data is saved as test.csv
df1 = df.melt(id_vars=['Name', 'Target'])
print(df1.head(10))

Output:

     Name  Target          variable  value
0     Dan      94  PreviousWeekProg     94
1  Jarrod      94  PreviousWeekProg     34
2   Chris      94  PreviousWeekProg     45
3     Sam      94  PreviousWeekProg     89
4   Aaron      94  PreviousWeekProg     12
5   Jenna      94  PreviousWeekProg     56
6    Eric      94  PreviousWeekProg     90
7     Dan      94   CurrentWeekProg     92
8  Jarrod      94   CurrentWeekProg     56
9   Chris      94   CurrentWeekProg     43
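
To make the reshaping concrete, here is a small sketch on just the first two rows of the sample data (the variable names wide and tidy are arbitrary). Columns listed in id_vars are repeated on every row, and each remaining column is stacked into a pair of 'variable' and 'value' columns:

```python
import pandas as pd

wide = pd.DataFrame({
    "Name": ["Dan", "Jarrod"],
    "PreviousWeekProg": [94, 34],
    "CurrentWeekProg": [92, 56],
    "ExpectedProg": [95, 60],
    "Target": [94, 94],
})

# id_vars stay as columns; every other column becomes (variable, value) rows.
tidy = wide.melt(id_vars=["Name", "Target"])
print(tidy)
```

Two people and three progress columns give six rows, one per person and column.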

Now you can use the column 'variable' as your hue parameter in the function barplot:

sns.set_style("whitegrid") # use the whitegrid style (set before the axes are created so it takes effect)
fig, ax = plt.subplots(figsize=(10, 5)) # set the size of a figure
sns.barplot(x='Name', y='value', hue='variable', data=df1, ax=ax) # plot on the axes created above

xmin, xmax = ax.get_xlim() # get x-axis limits
ax.hlines(y=df1['Target'], xmin=xmin, xmax=xmax, color='red') # add one line per row (they overlap when Target is constant)
# or ax.axhline(y=df1['Target'].max()) to add a single line

ax.legend(loc='upper center', bbox_to_anchor=(0.5, -0.06), ncol=4, frameon=False) # move legend to the bottom
plt.title('Student Progress', loc='center') # add title
plt.yticks(np.arange(df1['value'].min(), df1['value'].max()+1, 10.0)) # change tick frequency
plt.xlabel('') # clear xlabel
plt.ylabel('') # clear ylabel

plt.show() # show plot

Mykola Zotko