I'm parsing an XML file in a Jupyter notebook, and I use this code to open the file:

from lxml import etree as ET
tree = ET.parse(r'C:\Users\mysky\Documents\Decoded\F804187.xml')
root = tree.getroot()

After that I do some processing with XPath and pandas. For example:

CODE = [ ]
for errors in root.findall('.//Book/Message/Param/Buffer/Data/Field[11]'):
    error_code = errors.find('RawValue').text
    if error_code is not None:
        CODE.append(error_code)  

I have about 10 small code blocks like that for extracting my data and at the end I save my dataframe in a CSV file.

I have a lot of XML files, and I want to read every file in my Decoded directory one by one, process each of them the same way, and append each result to my CSV file.

Thanks!

M-M

1 Answer


To list all the XML files in your directory you can use, for example, the glob module.

It can look like this:

import glob
from lxml import etree as ET

# Raw string so the backslashes in the Windows path are not treated as escapes
files = glob.glob(r'C:\Users\mysky\Documents\Decoded\*.xml')

for file in files:
    tree = ET.parse(file)
    root = tree.getroot()
    CODE = []  # reset for each file
    for errors in root.findall('.//Book/Message/Param/Buffer/Data/Field[11]'):
        error_code = errors.find('RawValue').text
        if error_code is not None:
            CODE.append(error_code)
Qback
  • Thanks @Qback, it works. But when I run my other code to process the XML data, it only handles the first file. How can I write a loop that processes the files one by one? For example, if I have 5 code blocks like this: `CODE = [ ] for errors in root.findall('.//Book/Message/Param/Buffer/Data/Field[11]'): error_code = errors.find('RawValue').text if error_code is not None: CODE.append(error_code)` I want to run blocks 1, 2, 3, 4 and 5 on each file and then start again with the next file, and so on up to the end of my file list. – M-M Apr 25 '18 at 09:50
  • Why not define a function that combines these 5 code blocks? – Lambda Apr 25 '18 at 10:10
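
A rough sketch of what that suggestion could look like: each extraction block becomes a function that takes `root`, one function runs all the blocks on a single file, and a loop calls it for every file and appends the rows to a CSV. The names `extract_codes`, `process_file`, `process_directory` and the `file` / `error_code` column headers are made up for illustration, and the standard `csv` module is used here instead of pandas only to keep the example self-contained.

```python
import csv
import glob
import os

try:
    from lxml import etree as ET  # parser used in the question
except ImportError:
    import xml.etree.ElementTree as ET  # standard-library fallback


def extract_codes(root):
    """One extraction block from the question, wrapped in a function."""
    codes = []
    for field in root.findall('.//Book/Message/Param/Buffer/Data/Field[11]'):
        raw = field.find('RawValue')
        # Skip fields with no RawValue child or with an empty RawValue
        if raw is not None and raw.text is not None:
            codes.append(raw.text)
    return codes


def process_file(path):
    """Parse one XML file and return its rows as (file name, error code)."""
    root = ET.parse(path).getroot()
    # The other extraction blocks would be called here in the same way
    return [(os.path.basename(path), code) for code in extract_codes(root)]


def process_directory(pattern, csv_path):
    """Process every file matching pattern and append the rows to csv_path."""
    write_header = not os.path.exists(csv_path)
    with open(csv_path, 'a', newline='') as handle:
        writer = csv.writer(handle)
        if write_header:
            writer.writerow(['file', 'error_code'])
        for path in sorted(glob.glob(pattern)):
            writer.writerows(process_file(path))
```

It could then be called as `process_directory(r'C:\Users\mysky\Documents\Decoded\*.xml', 'errors.csv')`, where `errors.csv` is a placeholder output name. Because each file is parsed inside `process_file`, every block sees the `root` of the current file rather than the one left over from an earlier cell.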