This is my first time posting here, so I hope I'm doing everything right. I'm using Python 3.5 on Windows 10, and I'm trying to "sync" music from iTunes to my Android device. Basically, I read the iTunes Library XML file and extract every file location (so I can copy the files to my phone), but I have problems with songs whose names contain foreign characters.

import getpass
import re
import os
from urllib.parse import unquote

user = getpass.getuser()
ITUNES_LIB_PATH = "C:\\Users\\%s\\Music\\Itunes\\iTunes Music Library.xml" % user
ITUNES_SONGS_FILE = "ya.txt"


def write(file, what, newline=True):
    with open(file, 'a', encoding="utf8") as f:
        if not os.path.isfile(what):
            print("Issue locating file %s\n" % what)
        if newline:
            what += "\n"  # terminate each path with a newline
        f.write(what)


def get_songs(file=ITUNES_LIB_PATH):
    with open(file, 'r', encoding="utf8") as f:
        f = f.read()
        songs_location = re.findall("<key>Location</key><string>file://localhost/(.*?)</string>", f)
        for song in songs_location:
            song = unquote(song.replace("/", '\\'))
            write(ITUNES_SONGS_FILE, song)


get_songs()

Output:

Issue locating file C:\Users\Dymy\Desktop\Media\Norin &#38;amp; Rad - Bird Is The Word.mp3

How should I handle that "&amp;" in the file name?

jfs
mrclx
  • Maybe you could use `replace()` again to turn every `&amp;` into `&` – Will Nov 10 '15 at 19:23
  • or use the html lib like the answer [on here](http://stackoverflow.com/questions/2360598/how-do-i-unescape-html-entities-in-a-string-in-python-3-1) – R Nar Nov 10 '15 at 19:24
  • thanks @RNar, it solved the issue! I was wondering if there could be a way to avoid the utf8 encoding avoiding read() and write()... – mrclx Nov 10 '15 at 19:31
  • `&amp;` is an ampersand escaped twice. That has nothing to do with any character encoding. – roeland Nov 10 '15 at 22:08
  • I had to overcome the UnicodeError by setting "utf8" as encoding which changes the strings and causes file paths to be wrong, I was wondering if there was anyway to work with those paths without encoding to utf8. @roeland – mrclx Nov 10 '15 at 22:29
  • That doesn't make sense. “Setting the encoding to utf-8” doesn't “change strings”. Anyway, a valid XML file specifies the correct encoding on the first line of the file, e.g. `<?xml version="1.0" encoding="UTF-8"?>`. – roeland Nov 10 '15 at 23:36
  • Use an XML parser to read in the file. It will handle the XML escaping and decode the file correctly as well. You'll still need to declare an encoding to write the file. – Mark Tolonen Nov 10 '15 at 23:47
  • If you provide a small sample of the XML you are parsing, I'm sure better answers can be provided. – Mark Tolonen Nov 11 '15 at 05:50
  • Ok, sorry for the delay guys.. I've been busy. Anyways, I checked the XML file and the encoding is utf8, I didn't know that in XML files some characters are rewritten in different ways such as &, quotes and so on. I'm still new to programming... So basically all I had to do was unescape the strings to get the locations of the files.. – mrclx Nov 11 '15 at 21:18
  • unrelated: to avoid escaping backslashes, you could use raw-string literals for Windows paths e.g., `r'C:\Users\Dymy\...'` – jfs Nov 13 '15 at 21:33
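
The unescape approach suggested in the comments can be sketched roughly as follows. This is only an illustration, not the asker's final code: the helper name `location_to_path` is hypothetical, and the sample string assumes the library file stores the ampersand as the character reference `&#38;` (a name like `&amp;` is handled the same way by `html.unescape`).

```python
import html
from urllib.parse import unquote


def location_to_path(raw):
    """Convert a raw Location string, as captured from the XML text, to a Windows path.

    `location_to_path` is a hypothetical helper used only for this sketch.
    """
    prefix = "file://localhost/"
    if raw.startswith(prefix):
        raw = raw[len(prefix):]
    # XML escaping is the outer layer, so undo it first (&#38; or &amp; -> &) ...
    unescaped = html.unescape(raw)
    # ... then undo the URL percent-encoding (%20 -> space) and switch separators.
    return unquote(unescaped).replace("/", "\\")


# Assumed sample value, modelled on the file name from the question.
sample = "C:/Users/Dymy/Desktop/Media/Norin%20&#38;%20Rad%20-%20Bird%20Is%20The%20Word.mp3"
print(location_to_path(sample))
```

`html.unescape` is available from Python 3.4 onward, so it should work on the Python 3.5 setup mentioned in the question.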

1 Answer

There are a couple of related issues in your code, e.g., unescaped XML character references and a hardcoded character encoding, both caused by using regular expressions to parse XML. To fix them, use an XML parser such as xml.etree.ElementTree, or use a more specific library such as pyitunes (I haven't tried it).
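
A minimal sketch of the parser-based approach with `xml.etree.ElementTree` might look like this. It assumes the library file has the usual plist layout, where each `<key>Location</key>` is immediately followed by a `<string>` holding a `file://localhost/...` URL; the `SAMPLE` document and the helper names `iter_locations` and `to_windows_path` are made up for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import unquote

# Assumed, heavily trimmed stand-in for "iTunes Music Library.xml".
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<dict>
  <key>Tracks</key>
  <dict>
    <key>1</key>
    <dict>
      <key>Name</key><string>Bird Is The Word</string>
      <key>Location</key><string>file://localhost/C:/Users/Dymy/Desktop/Media/Norin%20&#38;%20Rad%20-%20Bird%20Is%20The%20Word.mp3</string>
    </dict>
  </dict>
</dict>
</plist>"""


def iter_locations(xml_bytes):
    """Yield the text of every <string> that directly follows <key>Location</key>."""
    root = ET.fromstring(xml_bytes)  # the parser honours the declared encoding
    for d in root.iter("dict"):
        children = list(d)
        # Walk adjacent sibling pairs and pick the value that follows a Location key.
        for key, value in zip(children, children[1:]):
            if key.tag == "key" and key.text == "Location" and value.tag == "string":
                yield value.text  # character references are already decoded here


def to_windows_path(location):
    """Strip the URL prefix, undo percent-encoding, and use backslashes."""
    prefix = "file://localhost/"
    if location.startswith(prefix):
        location = location[len(prefix):]
    return unquote(location).replace("/", "\\")


paths = [to_windows_path(loc) for loc in iter_locations(SAMPLE)]
print(paths[0])
```

For the real file, `ET.parse(ITUNES_LIB_PATH).getroot()` could replace `ET.fromstring(SAMPLE)`. Because the parser decodes both the file encoding and the character references, no manual unescaping step is needed; only the URL percent-encoding remains to be undone.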

jfs