
Is there any way to get youtube-dl's extract_info() function to keep Unicode characters when it creates the output file?

I have run into a problem: if I download something with a Unicode character such as | in the title, the output file name does not keep that character. It is replaced with _ instead.

Take this song title for example. If I download it with youtube-dl, I get the file name 【Nightcore】→ Pretty Girl _ Lyrics-dMAOnScOyGE. The same thing happens with other kinds of characters.

Is there any way to stop this? It's annoying if you want to do anything with the file afterwards.

To get the new file name, I would need to do something like os.listdir(dir) and look for the file. So it's not impossible to get the new file name, but I am interested in whether there is an easier way.
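The os.listdir workaround mentioned above could look roughly like the sketch below. It assumes the output template puts the video ID in the file name (as in the example, which ends in -dMAOnScOyGE); find_downloaded_file is a hypothetical helper name, not part of youtube-dl.

```python
import os


def find_downloaded_file(directory, video_id):
    """Return the path of the first file in `directory` whose name
    contains `video_id`, or None if there is no such file.

    Assumption: the youtube-dl output template includes the video ID,
    so the ID identifies the file even when the title was sanitized.
    """
    # sorted() only makes the result deterministic if several files match
    for name in sorted(os.listdir(directory)):
        if video_id in name:
            return os.path.join(directory, name)
    return None
```

Calling find_downloaded_file(".", "dMAOnScOyGE") after the download would then return the path of the sanitized file, whatever characters were replaced in the title.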

stego
  • Won't this answer help? https://stackoverflow.com/questions/40713268/download-youtube-video-using-python – Dmytro Chasovskyi Dec 20 '18 at 15:12
  • It would if I wanted to change the download path, but I also need to rename the file, so I don't think that would fully solve the problem. But I appreciate the quick help :) – stego Dec 20 '18 at 15:23
  • It’s nothing to do with “Unicode”, `|` is an ASCII character (as well as a Unicode character, like any other character). YTDL just ensures that paths won’t cause issues; a pipe is a special character in many contexts. You generally want a *predictable path*, or figure out exactly what file it created. – deceze Dec 20 '18 at 16:07
  • so there is no way to turn it off? – stego Dec 20 '18 at 17:00

1 Answer


The replacement of | with _ is hardcoded in sanitize_filename in youtube_dl/utils.py. You can turn it off programmatically by substituting youtube_dl.utils.sanitize_filename with your own implementation.

However, doing so is not recommended, and not supported out of the box. This is because | is an invalid character on Windows and can be used to execute arbitrary commands if expanded in a buggy script.

Insecure filenames were supported at one time, but I removed them from youtube-dl because too many people were shooting themselves in the foot, and often reported problems that clearly would have let any attacker execute arbitrary code on their machines.
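For illustration only, and with the warning above still in force, the substitution could be sketched like this. permissive_sanitize and install_permissive_sanitizer are hypothetical names. The sketch assumes that sanitize_filename is called as sanitize_filename(s, restricted=False, is_id=False), and that youtube_dl/YoutubeDL.py imports the function by name, so the reference held there has to be replaced as well. Both are assumptions about youtube-dl internals and may not hold for every version.

```python
import importlib

# Characters that are still replaced: path separators and NUL.
_STILL_REPLACED = {"/", "\\", "\0"}


def permissive_sanitize(s, restricted=False, is_id=False):
    """Hypothetical replacement for sanitize_filename.

    Keeps characters such as | and only replaces path separators and NUL
    with _. The `restricted` and `is_id` arguments are accepted for
    signature compatibility and ignored.
    """
    return "".join("_" if ch in _STILL_REPLACED else ch for ch in s)


def install_permissive_sanitizer():
    """Monkeypatch youtube-dl (unsupported, use at your own risk)."""
    # Assumption: both modules hold a reference to sanitize_filename.
    for module_name in ("youtube_dl.utils", "youtube_dl.YoutubeDL"):
        module = importlib.import_module(module_name)
        module.sanitize_filename = permissive_sanitize
```

install_permissive_sanitizer() would need to run before the YoutubeDL object is created. The resulting names can be invalid on Windows and unsafe in shell scripts that do not quote them.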