I am new to python and have looked at a number of similar problems on SO, but cannot find anything quite like the problem that I have and am therefore putting it forward:
I have an .xlsx
dataset with data spread across eight worksheets and I want to do the following:
- sum the values in the 14th column in each worksheet (the format, layout and type of data (scores) is the same in column 14 across all worksheets)
- create a new worksheet with all summed values from column 14 in each worksheet
- sort the totaled scores from highest to lowest
- plot the summed values in a bar chart to compare
I cannot even begin this process because I am struggling at the first point. I am using pandas
and am having trouble reading the data from one specific worksheet - I only seem to be able to read the data from the first worksheet only (I print the outcome to see what my system is reading in).
My first attempt produces an `Empty DataFrame':
import pandas as pd
y7data = pd.read_excel('Documents\\y7_20161128.xlsx', sheetname='7X', header=0,index_col=0,parse_cols="Achievement Points",convert_float=True)
print y7data
I also tried this but it only exported the entire first worksheet's data as opposed to the whole document (I am trying to do this so that I can understand how to export all data). I chose to do this thinking that maybe if I exported the data to a .csv
, then it might give me a clearer view of what went wrong, but I am nonethewiser:
import pandas as pd
import numpy as np
y7data = pd.read_excel('Documents\\y7_20161128.xlsx')
y7data.to_csv("results.csv")
I have tried a number of different things to try and specify which column within each worksheet, but cannot get this to work; it only seems to produce the results for the first worksheet.
How can I, firstly, read the data from column 14 in every worksheet, and then carry out the rest of the steps?
Any guidance would be much appreciated.
UPDATE (for those using Enthought Canopy and struggling with openpyxl
):
I am using Enthought Canopy IDE and was constantly receiving an error message around openpyxl
not being installed no matter what I tried. For those of you having the same problem, save yourself lots of time and read this post. In short, register for an Enthought Canopy account (it's free), then run this code via the Canopy Command Prompt:
enpkg openpyxl 1.8.5