To obtain nice column names instead of defaults like 'Unnamed: 1'
use the names
parameter of pd.read_excel
. Mutatis mutandis, try replacing
with pd.ExcelFile(inputFile,
sheetname=['pnl1 Data ','pnl2 Data','pnl3 Data','pnl4 Data']) as xlsx:
df1 = pd.read_excel(xlsx, 'pnl1 Data ',skiprows=9, parse_cols="B:H", keep_default_na='FALSE', na_values=['NULL'])#assign column headers
df2 = pd.read_excel(xlsx, 'pnl2 Data', skiprows=9, parse_cols="B:H", keep_default_na='FALSE', na_values=['NULL'])
df3 = pd.read_excel(xlsx, 'pnl3 Data', skiprows=9, parse_cols="B:H", keep_default_na='FALSE', na_values=['NULL'])
df4 = pd.read_excel(xlsx, 'pnl4 Data', skiprows=9, parse_cols="B:H", keep_default_na='FALSE', na_values=['NULL'])
with
sheets = ['pnl1 Data','pnl2 Data','pnl3 Data','pnl4 Data']
df = pd.read_excel(inputFile, sheetname=sheets, skiprows=9, parse_cols="B:H",
names=list('BCDEFG'))
df = {i: df[sheet] for i, sheet in enumerate(sheets, 1)}
This will make df
a dict, whose keys are sheet numbers and whose values are
DataFrames. The DataFrames will have colum names B
through G
, roughly like
the original Excel file.
Thus, instead of referring to numbered variables df1
, ..., df4
(generally, a bad idea), you'll have all the DataFrames in the dict df
and will be able to access them by numeric indexing: df[1]
, ..., df[4]
. Sheet pnl3 Data
, for example, would be accessed as df[3]
.
To access the seventh row, B
column value of sheet 'pnl1 Data'
of you could then use:
g_int_c = str(df[1].loc[6, 'B'])
For example,
import pandas as pd
try: from cStringIO import StringIO # for Python2
except ImportError: from io import StringIO # for Python3
import textwrap
df1 = pd.read_csv(StringIO(textwrap.dedent("""
,,,
0,1,2,3
1,4,5,6
7,8,9,10""")))
df2 = pd.read_csv(StringIO(textwrap.dedent("""
,,,
0,NULL,2,3
1,4,NULL,NULL""")), converters={i:str for i in range(4)})
sheets = ['pnl1 Data','pnl2 Data']
writer = pd.ExcelWriter('/tmp/output.xlsx')
for df, sheet in zip([df1, df2], sheets):
print(df)
# Unnamed: 0 Unnamed: 1 Unnamed: 2 Unnamed: 3
# 0 0 NULL 2 3
# 1 1 4 NULL NULL
df.to_excel(writer, sheet)
writer.save()
df = pd.read_excel('/tmp/output.xlsx', sheetname=sheets, names=list('ABCD'), parse_cols="A:E")
df = {i: df[sheet] for i, sheet in enumerate(sheets, 1)}
for key, dfi in df.items():
print(dfi)
# A B C D
# 0 0 1 2 3
# 1 1 4 5 6
# 2 7 8 9 10
# A B C D
# 0 0 NaN 2.0 3.0
# 1 1 4.0 NaN NaN
print(df[1].loc[1, 'B'])
# 4