0

I am using Openpyxl to read the excel file and get my desired output in txt file (not all the code shown below as it is irrelevant). Below is my code for reading the excel file.The test file contains 3 sheets.As you might have noticed, I am skipping 1st sheet in my excel file.The other sheets has the data that I need. The Columns that I am interested in are "Field Name" and "Type". However, as shown in below snippets, the rows are located in row 5 in sheet 1 and row 8 in sheet 2. I was wondering if I can get both sheets to have "Field Name" and "Type" to start from 7 (instead of doing manually) ? Is there any search that I can perform to make sure that I have "Field Name" and "Type" on row 7, if not can I have it corrected in the same sheet instead of creating a copy the sheet ? I checked here, unfortunately couldn't find the solution. The reason to start from row 7 is because I am taking the data from row8 onwards form the sheet and adding it to txt file.

Note: Below snapshots are demo only. My original excel file contains 10+ sheets with same issue i.e. "Field Name" and "Type" not starting from row 7

Thanks in advance for your help!

Python code:

from openpyxl import load_workbook

data_file='test.xlsx'

# Load the entire workbook.
wb = load_workbook(data_file)
 
skip = True
for ws in wb.worksheets:
    if skip == True:
        skip = False
    else:
        for i in range(7, ws.max_row+1):  
            name = ws.cell(row=i, column=1).value
            print(i, name)
            name1=ws.cell(row=i, column=2).value
            print(name1)
            ....... my other code

Sheet 1

Sheet 1:

Sheet 2:

enter image description here

Sheet output after SO comments:

Sheet 1:

enter image description here

Sheet 2:

enter image description here

biggboss2019
  • 220
  • 3
  • 8
  • 30

1 Answers1

1

You can achieve this by using insert_rows() and delete_rows()...

Note that you need to save the file once you have added/deleted the rows.

from openpyxl import load_workbook

data_file='test.xlsx'

# Load the entire workbook.
wb = load_workbook(data_file)
 
skip = True
for ws in wb.worksheets:
    if skip == True:
        skip = False
    else:
        CurrentRow = 0
        for row in ws.iter_rows(max_col=2):
            if row[0].value == 'Field Name' and row[1].value == 'Type':
                CurrentRow = row[0].row
                break            
            else:
                pass
        if CurrentRow > 7:
            ws.delete_rows(7, CurrentRow - 7)
        elif CurrentRow < 7 and CurrentRow > 0:
            ws.insert_rows(CurrentRow, 7 - CurrentRow)
wb.save('test.xlsx')

Dealing with tables It looks like the input data in your sheet is a excel Table. You can check this by selecting the range and right-clicking (should have table option under Quick Analysis). If this is the case, you have two options.

  1. Select a cell in table >> right click >> Table >> Convert to Range. Then the original code will run. Don't know if that works.
  2. Written below is the code that will work if all your sheets have tables. Note that I am considering that there is only one table in each sheet. Also, the style is set to the blue format you have shared in your pics above. Borrowed code from here
from openpyxl import load_workbook
from openpyxl.worksheet.table import Table, TableStyleInfo
style = TableStyleInfo(name="TableStyleMedium9", showFirstColumn=False, showLastColumn=False, showRowStripes=True, showColumnStripes=False)

def colnum_string(n):
    string = ""
    while n > 0:
        n, remainder = divmod(n - 1, 26)
        string = chr(65 + remainder) + string
    return string

#data_file='test.xlsx'
data_file = input("Please provide the name of file you want to process: ")

# Load the entire workbook.
wb = load_workbook(data_file)

skip = True
for ws in wb.worksheets:
    if skip == True:
        skip = False
    else:
        CurrentRow = 0
        tablelen = 0
        for row in ws.iter_rows(max_col=2):
            if row[0].value == 'Field Name' and row[1].value == 'Type':
                CurrentRow = row[0].row
                tablelen = ws.max_row - CurrentRow
                break            
            else:
                pass
        if CurrentRow > 7:
            ws.delete_rows(7, CurrentRow - 7)
            resTable = Table(displayName=ws.tables.items()[0][0], ref="A7:{}{}".format("B", 7+tablelen))
            resTable.tableStyleInfo = style
            ws._tables[ws.tables.items()[0][0]] = resTable

        elif CurrentRow < 7 and CurrentRow > 0:
            ws.insert_rows(CurrentRow, 7 - CurrentRow)
            resTable = Table(displayName=ws.tables.items()[0][0], ref="A7:{}{}".format("B", 7+tablelen))
            resTable.tableStyleInfo = style
            ws._tables[ws.tables.items()[0][0]] = resTable
           
#wb.save('test.xlsx')
wb.save(data_file.split('.')[0] + "_updated." + data_file.split('.')[1])

New Req - Read all xlsx files

from openpyxl import load_workbook
from openpyxl.worksheet.table import Table, TableStyleInfo
style = TableStyleInfo(name="TableStyleMedium9", showFirstColumn=False, showLastColumn=False, showRowStripes=True, showColumnStripes=False)

def colnum_string(n):
    string = ""
    while n > 0:
        n, remainder = divmod(n - 1, 26)
        string = chr(65 + remainder) + string
    return string

import os
ALLOWED_EXTENSIONS = set(['xlsx'])

def allowed_file(filename):
    return '.' in filename and filename.rsplit('.', 1)[1].lower() in ALLOWED_EXTENSIONS

MyPWD = os.getcwd()
for filename in os.listdir(MyPWD):
    path = os.path.join(MyPWD, filename)
    if os.path.isfile(path) and allowed_file(filename):

        #data_file='test1.xlsx'
        #data_file = input("Please provide the name of file you want to process: ")
        # Load the entire workbook.
        wb = load_workbook(filename)

        skip = True
        for ws in wb.worksheets:
            if skip == True:
                skip = False
            else:
                CurrentRow = 0
                tablelen = 0
                for row in ws.iter_rows(max_col=2):
                    if row[0].value == 'Field Name' and row[1].value == 'Type':
                        CurrentRow = row[0].row
                        tablelen = ws.max_row - CurrentRow
                        break            
                    else:
                        pass

                if CurrentRow > 7:
                    ws.delete_rows(7, CurrentRow - 7)
                    resTable = Table(displayName=ws.tables.items()[0][0], ref="A7:{}{}".format("B", 7+tablelen))
                    resTable.tableStyleInfo = style
                    ws._tables[ws.tables.items()[0][0]] = resTable
                elif CurrentRow < 7 and CurrentRow > 0:
                    ws.insert_rows(CurrentRow, 7 - CurrentRow)
                    resTable = Table(displayName=ws.tables.items()[0][0], ref="A7:{}{}".format("B", 7+tablelen))
                    resTable.tableStyleInfo = style
                    ws._tables[ws.tables.items()[0][0]] = resTable

        #wb.save('test2.xlsx')
        wb.save(filename.split('.')[0] + "_updated." + filename.split('.')[1])
Redox
  • 9,321
  • 5
  • 9
  • 26
  • What is the `skip` parameter for? – Charlie Clark May 25 '22 at 09:14
  • 1
    Hi @CharlieClark - that is part of the existing code - it is used for skipping the first sheet and probably used for something else later as this is only snippet of code. So, didn't touch it. – Redox May 25 '22 at 10:15
  • Fair enough. FWIW using iter_rows(max_col=2) allows you to unpack the row tuple and means fewer lookups. I'd also calculate the offset more explicitly. – Charlie Clark May 25 '22 at 12:36
  • 1
    Thanks.... Updated iter_rows. For offset, did not get that. Are you are talking about keeping an offset variable rather than have it hardcoded as 7? – Redox May 25 '22 at 15:20
  • @Redox..Thanks, it solve the issue. Now the issue is updated sheet look something like updated screenshots. Is there a way to improve that ? The tables are in Table format in Excel. – biggboss2019 May 25 '22 at 20:22
  • @CharlieClark..how would the offset work in the above code proposed by Redox ? – biggboss2019 May 25 '22 at 20:24
  • 1
    @biggboss2019 - Updated code added. Just be careful to use new code if all your sheets use tables and not just data range. Also, hopefully the style wont change – Redox May 26 '22 at 16:18
  • Thank you! Now the only issue is whnever I try to open WB in python for further processing I am receiving an error that says "ValueError: Table with name Data already exists". Any way to solve this ? I tried opening and resaving the file but didn't help – biggboss2019 May 26 '22 at 21:09
  • 1
    That is strange, as I am not facing the same issue. Anyway, based on error msg, I made a small tweak - changing the displayname to tablename in code above. See if that works. Else, let me know how to recreate the issue – Redox May 27 '22 at 07:28
  • @Redox..for some reason I had to create new file, and If I use the above code it works. Thanks! Question, While saving the file Is it possible to save the file with same name ? if it is not possible then can I save it something link "myfilename_edited_file" ? If yes, how to achieve it ? TIA – biggboss2019 May 28 '22 at 01:48
  • 1
    Yes, you can save it as it with the same file name. Need to change the filename in last line `wb.save()` to the same name as the file you are opening. In above code v2, I updated it so that the file opened (test.xlsx) is the same as the file closed (now test.xlsx) – Redox May 28 '22 at 04:29
  • This works pretty well. Last query, If I have multiple xlsx file, whose columns I need in specific position then how would I achieve that ? I think saving the file with different name will make more sense, then the new file can have name as "test_updated".Sorry for confusion. But let's say the second file in my directory is "Employee.xlsx", then using the above code I can get the updated file as "Employee_updated.xlsx". I would like to automate the above process for multiple xlsx file. i.e. opening multiple excel files and saving along with their name as file_updated. Is it possible ? – biggboss2019 May 28 '22 at 18:02
  • Updated code above. The input is now provided by user instead of hard-coded 'test.xlsx'. Remember user has to give full excel file name including extension. The output is read into the same name with `_updated` inserted - like `test.xlsx` would be read and updated workbook created with name `test_updated.xlsx` – Redox May 29 '22 at 04:39
  • Sorry..So I meant instead of asking for user to input file, Is there a way I can read all the excel files that are in current working directory and have it converted to "_updated.xlsx" ? – biggboss2019 May 29 '22 at 14:40
  • 1
    Updated... note that `ALLOWED_EXTENSIONS` can be modified if you have other types of files you want to read.... like .xls or .xlsm, etc. Right now, only .xlsx files will be read. ALL xlsx files will be read, updated if table exists and written to file_updated.xlsx – Redox May 29 '22 at 16:26