0

I am trying to convert over 600 files of .xml type into .xlsx type using this python script I found while searching around:

import sys

folder = '/FolderWithXMLFiles'
for filename in os.listdir(folder):
    infilename = os.path.join(folder, filename)
    if not os.path.isfile(infilename): continue
    oldbase = os.path.splitext(filename)
    newname = infilename.replace('.xml', '.xlsx')
    os.rename(infilename, newname)

Although the code runs fine, the output is not what I expected as I keep getting an error in Excel saying "File format or file extension is not valid. Verify that the file has not been corrupted and that the file extension matches the format of the file."

I have tried converting the .xml file into .xls too and it works, but I get an error message saying that the file type and file format are not the same, but it still allows me to open it. Converting .xls to .xlsx did not work either.

Also, I have a second batch of files, which are the same as the .xml ones, but those ones are saved as Strict Open XML files. I have the choice of converting either of the batches to plain .xlsx file type but neither of the batches are converting to how I want them.

I have thought of copying all of the data from the .xml files into a new plain .xlsx workbook for each .xml, however, I haven't been able to figure out how to write a python script for that.

Any help would be appreciated.

rknkhn
  • 21
  • 2

1 Answers1

0

If you are running on PC, you could install win32com library:

pip install pywin32

Then, having two folders in your main script, one containing all your XML files, and an empty folder that will contain your XLSX files:

project/
   xml/
      1.xml
      2.xml
      ...
   xlsx/
   main.py

You could iterate through each XML files and save it as XLSX:

import win32com.client as win32
import os

excel = win32.Dispatch("Excel.Application")
excel.DisplayAlerts = False

path_root_project = os.path.dirname(os.path.abspath(__file__))

# Iterate through XML files
for xml_filename in os.listdir(path_root_project + './xml'):

    # Define XLSX Path based on XML filename
    xlsx_path_file = path_root_project + "\\xlsx\\" + xml_filename.replace('.xml', '.xlsx')
    xml_path_file = path_root_project + "\\xml\\"  + xml_filename
    
    # Open XML and save into XLSX
    wb = excel.Workbooks.OpenXML(xml_path_file)
    wb.SaveAs(xlsx_path_file)
    wb.Close()

This is my XML:

<?xml version="1.0" encoding="UTF-8"?>  
<emails>  
    <email>  
        <to>Vimal</to>  
        <from>Sonoo</from>  
        <heading>Hello</heading>  
        <body>Hello brother, how are you!</body>  
    </email>  
    <email>  
        <to>Peter</to>  
        <from>Jack</from>  
        <heading>Birth day wish</heading>  
        <body>Happy birth day Tom!</body>  
    </email>  
    <email>  
        <to>James</to>  
        <from>Jaclin</from>  
        <heading>Morning walk</heading>  
        <body>Please start morning walk to stay fit!</body>  
    </email> 
</emails>

And this is what I get in the XLSX:

enter image description here

Alex Montano
  • 343
  • 1
  • 4
  • 11
  • 1
    I was hoping this would work but I am still getting an error in Excel saying that the file type and file format do not match. – rknkhn Oct 17 '21 at 18:29
  • Can I take a look of your project in Github? Or add your code (Python + XML examples) in your question? – Alex Montano Oct 17 '21 at 19:29