How to select multiple columns (but same rows) of xlsx file while looping using Openpyxl?

Question

I have an Excel file that looks like this (example): [Balance Sheet][1]

[1]: https://i.stack.imgur.com/O0WXP.jpg

I would like to extract all the items of this financial statement and write them to a new Excel sheet. The output I want has all the accounts in one column and all the corresponding numbers in another column: [Intended output][2]

[2]: https://i.stack.imgur.com/nbTtR.jpg

My code so far is:

import openpyxl
fwb=openpyxl.load_workbook('wb.xlsx')
sheet=fwb['Sheet1']
sheet['A9']

for i in range(9, sheet.max_row + 1):  # range() excludes its end value, so add 1 to reach the last row
    items=sheet.cell(row=i, column=1).value
    number1=sheet.cell(row=i, column=3).value
    number2=sheet.cell(row=i, column=4).value
    print(items, number1, number2)

My issue is that I want the list of items to be in one column, just like the intended output. Ideally I would write something like items=sheet.cell(row=i, column=1 AND 2).

score 2 · Answer 1 · answered Feb 19 '18 at 09:26

In openpyxl this is very straightforward:

Here ws1 is your source worksheet and ws2 is your target worksheet.

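# ws1['A':'B'] returns whole columns, so iterate over rows explicitly
for row in ws1.iter_rows(min_col=1, max_col=2):  # columns A and B
    ws2.append([c.value for c in row])

for row in ws1.iter_rows(min_col=3, max_col=4):  # columns C and D
    ws2.append([c.value for c in row])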

Adjust the columns as you need them
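
As one possible adjustment, here is a minimal self-contained sketch that puts one label and one number on each output row. It assumes the account labels sit in column A or B and the figures in column C or D, which is only a guess based on the question's code. The sample data, the sheet titles and the helper first_value are made up for illustration.

```python
from openpyxl import Workbook

# Build a small in-memory source sheet (made-up sample data).
wb = Workbook()
ws1 = wb.active
ws1.title = "Source"
ws1.append(["Assets", None, None, None])
ws1.append([None, "Cash", 100, None])
ws1.append([None, "Inventory", None, 250])

ws2 = wb.create_sheet("Target")


def first_value(cells):
    """Return the first non-empty value in a sequence of cells, or None."""
    for cell in cells:
        if cell.value is not None:
            return cell.value
    return None


# Each row is a tuple of four cells covering columns A to D.
for row in ws1.iter_rows(min_col=1, max_col=4):
    label = first_value(row[:2])   # column A or B
    number = first_value(row[2:])  # column C or D
    if label is None and number is None:
        continue  # skip completely empty rows
    ws2.append([label, number])

# Read the target sheet back as plain lists to inspect the outcome.
result = [[c.value for c in r] for r in ws2.iter_rows()]
```

To keep the output, call wb.save() with a file name of your choice afterwards.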

score 0 · Answer 2 · answered Feb 17 '18 at 08:27

I will guess the structure of your worksheet from the code, since you did not specify which ranges contain which data.

Something like this may work for you. You probably need to adjust some values with +/-1, depending on headers, etc.

row_base1=len(sheet['A'])
nrows2=len(sheet['C'])-9
for i in range(1,nrows2):
    row1=row_base1+i
    row2=8+i
    number1=sheet.cell(row=row2, column=3).value
    number2=sheet.cell(row=row2, column=4).value
    sheet.cell(row=row1, column=1).value=number1
    sheet.cell(row=row1, column=2).value=number2
    print(number1, number2)

nrows2 might be larger than what you actually need, because len(sheet['C']) spans every row up to the sheet's last used row, including empty cells. In that case, you will have to add some detection method inside the loop.
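
One simple form of such a detection step is to skip rows in which both number cells are empty. The sketch below is only an illustration: the data is made up, and FIRST_DATA_ROW = 9 is an assumption taken from the question's loop.

```python
from openpyxl import Workbook

wb = Workbook()
sheet = wb.active

# Made-up data: numbers in columns C and D, with gaps, plus a stray cell
# further down that pushes max_row beyond the last row holding a number.
sheet.cell(row=9, column=3, value=10)
sheet.cell(row=10, column=4, value=20)
sheet.cell(row=12, column=3, value=30)
sheet.cell(row=15, column=1, value="note")

FIRST_DATA_ROW = 9  # assumption: data starts on row 9, as in the question

collected = []
for row in range(FIRST_DATA_ROW, sheet.max_row + 1):
    number1 = sheet.cell(row=row, column=3).value
    number2 = sheet.cell(row=row, column=4).value
    if number1 is None and number2 is None:
        continue  # detection step: ignore rows without any number
    collected.append((row, number1, number2))
```

If the data is known to be contiguous, replacing continue with break would stop at the first empty row instead.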

ccrsxx · Answer 3 · 2021-10-05T00:45:03.137

0

Here's my approach using lambda.

Index using numbers

column = lambda x: sheet[chr(ord('@') + x) + str(i)].value

for i in range(1, sheet.max_row + 1):
    print(column(1), column(3), column(4))

Index using letters

column = lambda x: sheet[x + str(i)].value

for i in range(1, sheet.max_row + 1):
    print(column('A'), column('C'), column('D'))
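
Note that both lambdas read the loop variable i from the enclosing scope, and that the chr(ord('@') + x) trick only covers columns 1 to 26. A sketch of the same idea as a plain function, using openpyxl.utils.get_column_letter for the number-to-letter conversion, could look like this. The helper name cell_value and the sample rows are made up for illustration.

```python
from openpyxl import Workbook
from openpyxl.utils import get_column_letter

wb = Workbook()
sheet = wb.active
sheet.append(["Cash", None, 100, 5])
sheet.append(["Debt", None, 200, 7])


def cell_value(ws, row, col):
    """Return the value at (row, col); col is a 1-based number or a letter."""
    if isinstance(col, int):
        col = get_column_letter(col)  # e.g. 1 -> 'A', 27 -> 'AA'
    return ws[f"{col}{row}"].value


# Numbers and letters can be mixed freely because the row is passed explicitly.
rows = [
    (cell_value(sheet, r, 1), cell_value(sheet, r, "C"), cell_value(sheet, r, 4))
    for r in range(1, sheet.max_row + 1)
]
```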

edited Oct 05 '21 at 00:45

answered Oct 03 '21 at 10:31


alingo · Answer 4 · 2018-02-17T15:52:51.863

-1

You might try pandas, as follows. The result can be saved to an Excel file if you want. Run pip install xlrd first (recent pandas versions read .xlsx files with openpyxl instead, so install that if xlrd is rejected).

import pandas as pd
book1 = pd.ExcelFile('book1.xlsx')
df = pd.read_excel(book1, 'Sheet1')
cols = ['Item', 'Value']
x = df.drop(df.columns[2:], axis=1)
y = df.drop(df.columns[:2], axis=1)
x.columns = cols
y.columns = cols
df2 = pd.concat([x, y], ignore_index=True)
df2.dropna(how='all', inplace=True)
print(df2)

Result1

You can also do this:

df2['Index'] = df2.loc[df2['Value'].isnull(), 'Item']
df2['Index'] = df2['Index'].ffill()  # fillna(method='ffill') is deprecated in recent pandas
df3 = df2.set_index(['Index', 'Item']).dropna()
print(df3)

Result2
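
To see what this second step does without an Excel file, here is a small sketch on a hand-made DataFrame. The data is invented, and it assumes that rows without a value are section headings.

```python
import pandas as pd

# Made-up stacked data: heading rows have no value.
df2 = pd.DataFrame({
    "Item": ["Assets", "Cash", "Stock", "Liabilities", "Loans"],
    "Value": [None, 100.0, 250.0, None, 300.0],
})

# Copy the heading text into a new column, only on the heading rows.
df2["Index"] = df2.loc[df2["Value"].isnull(), "Item"]
# Carry each heading down to the rows that follow it.
df2["Index"] = df2["Index"].ffill()

# Index by (heading, item) and drop the heading rows, which have no value.
df3 = df2.set_index(["Index", "Item"]).dropna()
```

Each remaining row of df3 is then labelled by its section heading and its item name.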

edited Feb 17 '18 at 15:52

answered Feb 17 '18 at 08:42


Hi, thanks so much for this! I find your first code easiest to understand. However, I'm getting an error for y.columns=cols. Is it because there's a mistake in the code? – shenxiaoya Feb 18 '18 at 15:07
I guess your y DataFrame has more than 2 columns. Print it and see if you need to change the drop column index. – alingo Feb 18 '18 at 16:36
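
The length mismatch discussed in these comments can be avoided by selecting exactly two columns by position with iloc instead of dropping the rest. This is a sketch on made-up data, assuming the sheet holds two item/value pairs side by side plus an extra column on the right.

```python
import pandas as pd

# Made-up sheet: two item/value pairs and a stray fifth column.
df = pd.DataFrame({
    "a": ["Cash", "Stock"],
    "b": [100, 250],
    "c": ["Loans", "Equity"],
    "d": [300, 50],
    "e": [None, None],
})

cols = ["Item", "Value"]
x = df.iloc[:, 0:2].copy()  # first pair of columns
y = df.iloc[:, 2:4].copy()  # second pair only, ignoring anything further right
x.columns = cols
y.columns = cols

# Stack the two pairs and drop rows that are entirely empty.
df2 = pd.concat([x, y], ignore_index=True).dropna(how="all")
```

With df.drop(df.columns[:2], axis=1), the same data would leave three columns in y, and assigning two names to it raises a ValueError.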
