
I have 50 folders, each containing one XML file. The problem is that the formatting is supposed to look like this:

<data>
    <items>
        <item name="item1_πα"></item>
        <item name="item2_πα"></item>
        <item name="item3_πα"></item>
        <item name="item4_πα"></item>
    </items>
</data>

but it actually is:

b'<data>\n  <items>51041<item name="item1_\xcf\x80\xce\xb1\xcf\x81\xce\xb1\xce\xb3\xcf\x89\xce\xb3\xce\xae"/>\n    <item name="item2"/>\n    <item name="item3"/>\n    <item name="item4"/>\n  </items>\n</data>\n\n'

Can I modify them all in a loop so that they are formatted as they should be?

Something like this:

for i in os.listdir(r"C:\Users\user\Desktop\testin"): # <- here are the 50 folders
    with open('bac.xml', 'r'): # open each xml
        with open('bac.xml','w'): # write each xml formatted now
            example.writexml(file, indent='\n', addindent=' ',encoding = 'utf-8')

Note: The XML files in all the folders have the same name.

  • If you do `print(name)` you will see that the format of the string inside `name` is a good match for the example you present. For example, the linebreaks and indentation are what you expect, and you will see the first item name shown as `item1_παραγωγή`. You have your data in a bytestring. If you just display a bytestring to the terminal, Python will show `\x` encodings for non-ascii characters. – BoarGules Aug 07 '18 at 08:28
  • So how do I make it appear as you said and not like `xcf\x80\xce\xb etc`? – user10190263 Aug 07 '18 at 08:31
  • This is an answer you've written that I can't get to work with my code: https://stackoverflow.com/a/44005629/9988562 What do you suggest to make it appear formatted when exported? – user10190263 Aug 07 '18 at 09:10
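
A minimal sketch of the point made in the first comment above, using the first item from the byte string in the question. It assumes the bytes are UTF-8 encoded, which the `\xcf\x80...` sequences suggest; the names `raw` and `text` are only illustrative.

```python
# The first item from the byte string in the question.
raw = b'<item name="item1_\xcf\x80\xce\xb1\xcf\x81\xce\xb1\xce\xb3\xcf\x89\xce\xb3\xce\xae"/>'

# Displaying the bytes object shows every non-ASCII byte as a \x escape.
print(repr(raw))

# Decoding the bytes (UTF-8 is an assumption here) gives readable text.
text = raw.decode("utf-8")
print(text)
```

The data is the same in both cases; only the way Python displays it differs.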

1 Answer


The problem you have is probably with the decoding of the bytes: This thread has a solution. Basically, you need to read the file as bytes (hence the 'rb', b for bytes) and then decode() the result:

import os

# Collect all the subdirectories of the top-level folder.
name = r"C:\Users\user\Desktop\testin"
folders_list = [os.path.join(name, directory)
                for directory in os.listdir(name)
                if os.path.isdir(os.path.join(name, directory))]

for folder in folders_list:
    path = os.path.join(folder, 'bac.xml')
    # Read the file as bytes and decode it to text (UTF-8 assumed):
    with open(path, 'rb') as f:
        your_file = f.read().decode('utf-8')
    # If you need to write it back (or anywhere else), encode it again:
    with open(path, 'wb') as f:
        f.write(your_file.encode('utf-8'))
Gozy4
  • How do I do it in a for loop over the folders I mentioned? – user10190263 Aug 07 '18 at 08:22
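  • I don't understand one thing: are the data stored in the xml files formatted like `b'<data>\n  <items>51041<item name="item1_\xcf\x80\xce\xb1\xcf\x81\xce\xb1\xce\xb3\xcf\x89\xce\xb3\xce\xae"/>\n    <item name="item2"/>\n    <item name="item3"/>\n    <item name="item4"/>\n  </items>\n</data>\n\n'` or are they already properly formatted but you get the format above while reading them? – Gozy4 Aug 07 '18 at 08:27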
  • The xml files are exported by my code, and if I open them with Notepad they appear on one line. The fix I am looking for is to format them in a for loop. – user10190263 Aug 07 '18 at 08:33
  • I don't know if I get what you need. I hope this helps. – Gozy4 Aug 07 '18 at 08:53
  • And what about the format fix? This just reads and writes. What do you suggest? Apparently it keeps the bad formatting. It needs to format the file before writing. But how? – user10190263 Aug 07 '18 at 09:05
  • Please provide us with an example of what the file looks like before you read it, because I think there is some confusion here. Thank you. – Gozy4 Aug 07 '18 at 09:42
  • OK, I will explain. If I open the xml files in Notepad, they appear on one line, like the second grey area in the question, and they should look like the first grey area in my question. It says so clearly. The problem is with the file format itself: not when you open it in Python, but when you open the xml file itself, for example in Notepad. Hope it is clear now. – user10190263 Aug 07 '18 at 09:56
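
Given that last clarification, one possible way to re-indent the files on disk is sketched below. It is not taken from the answer above: it swaps in the standard-library `xml.dom.minidom` parser and its `toprettyxml` method, and it assumes each subfolder holds a well-formed UTF-8 file named `bac.xml`. The helper names `strip_blank_text`, `pretty_print_xml` and `pretty_print_all` are hypothetical.

```python
import os
from xml.dom import minidom


def strip_blank_text(node):
    """Remove whitespace-only text nodes so old indentation is not kept."""
    for child in list(node.childNodes):
        if child.nodeType == child.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
        else:
            strip_blank_text(child)


def pretty_print_xml(path):
    """Re-indent the XML file at `path` in place and save it as UTF-8."""
    doc = minidom.parse(path)
    strip_blank_text(doc)
    # With an encoding given, toprettyxml returns bytes, so write in 'wb' mode.
    pretty = doc.toprettyxml(indent="    ", encoding="utf-8")
    with open(path, "wb") as f:
        f.write(pretty)


def pretty_print_all(root, filename="bac.xml"):
    """Re-indent `filename` in every direct subfolder of `root` that has it."""
    for entry in os.listdir(root):
        path = os.path.join(root, entry, filename)
        if os.path.isfile(path):
            pretty_print_xml(path)
```

Calling `pretty_print_all(r"C:\Users\user\Desktop\testin")` would then rewrite each file in place, so trying it on a copy of the folders first is advisable. The output starts with an XML declaration line, and non-whitespace text such as the stray `51041` inside `<items>` is kept rather than removed.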