1

I am relatively new to python, and have tried a couple different methods I have seen throughout this website, but nothing seems to work with what I need.

I have an excel file, with two columns of information, and multiple sheets. I would like to add a column to each of these sheets with the header "System". Under the header I need to have all the cells in sheet 1 fill with "System 1" and for sheet 2 "System 2". Each sheet has a different amount of entries, and all are over 1000.

At the end I want to merge all the sheets together making one master file, which I was able to figure out, but I can seem to add a column to the individual sheets. Below is the code I have so far:

import pandas as pd 
import openpyxl
import pprint 

fileName= input("Enter file name here (Case Sensitve) > ")

df = pd.read_excel(fileName +'.xlsx', sheetname=None, ignore_index=True)
xl = pd.ExcelFile(fileName +'.xlsx')

df1 = pd.read_excel(xl, 'System 1')
df2 = pd.read_excel(xl, 'System 2')
df3 = pd.read_excel(xl, 'System 3')
df4 = pd.read_excel(xl, 'System 4')
df5 = pd.read_excel(xl, 'System 5')

The excel document looks along the lines of this:

employee_number email
7000            test1@test.com
11552           test2@test.com
21356           test3@test.com
21365           test4@test.com
22022           test5@test.com
24280           test6@test.com
26279           test7@test.com

there are 5 sheets within one document that resemble this.

BeMuRR187
  • 77
  • 1
  • 12
  • First, your code, as written, doesn't run. Please see https://stackoverflow.com/help/mcve for advice. You need to fix the `load_workbook` line. If you're opening this Excel file, what's in it? Does it have sheets `System 1` and `System 2`? Can you upload a few rows from it, or a sample of it? – Evan Jan 24 '18 at 17:10
  • Hello Evan, I apologize, the code I had earlier that ran I erased because the end result wasn't what I needed. I added more information to the original post, in hopes of providing extra clarification. I think I will try placing each sheet into its own dataframe, and add a column to each dataframe titled "System". Thank you for taking the time to read this. – BeMuRR187 Jan 24 '18 at 18:11
  • I also added df1['System'] = "System 1" , but is there a more efficient way in doing this? – BeMuRR187 Jan 24 '18 at 18:24

1 Answers1

0

There are a few ways to solve this problem. In my experience, importing and exporting (particularly exporting) Excel format files can have issues.

import pandas as pd
import numpy as np

df = pd.read_excel('Sample - Superstore.xls')
df['a_new_col'] = np.random.rand()
df.to_excel('New_Excel_File.xlsx')

That will import the first worksheet, create a df, and export it to a new Excel file.

If all you want to do is create a new column, the syntax is simply:

df['new_col'] = some_data

To import all of the sheets of an Excel file, see Using Pandas to pd.read_excel() for multiple worksheets of the same workbook

You may also want to try merging, joining, or concatenating the dataframes, then exporting them. https://pandas.pydata.org/pandas-docs/stable/generated/pandas.concat.html

Evan
  • 2,121
  • 14
  • 27