I am trying to print information about a YouTube video using yt-dlp so that I can process it further in PowerShell.

Command:

yt-dlp.exe --print "{'domain': '%(webpage_url_domain)s', 'uploader': '%(uploader_id)s', 'status': '%(live_status)s', 'title': '%(title)s'}" https://www.youtube.com/live/nxbSLFv2JAs --ignore-no-formats-error

Output:

{'domain': 'youtube.com', 'uploader': '@AiraniIofifteen', 'status': 'was_live', 
'title': '【 Minecraft 】イオそらとえーちゃん仲良く大作戦【 iofi / hololive 】'}

But when I convert it with ConvertFrom-Json, the Japanese characters and the 【】 brackets are gone.

Command:

yt-dlp.exe --print "{'domain': '%(webpage_url_domain)s', 'uploader': '%(uploader_id)s', 'status': '%(live_status)s', 'title': '%(title)s'}" https://www.youtube.com/live/nxbSLFv2JAs --ignore-no-formats-error | ConvertFrom-Json

Output:

domain      uploader         status   title
------      --------         ------   -----
youtube.com @AiraniIofifteen was_live  Minecraft  iofi / hololive

Because the output is fine when I don't use ConvertFrom-Json, I don't think it's a yt-dlp problem. I also tried some alternative approaches, but none of them worked.

Then I found out that simply wrapping the previous command in parentheses:

(yt-dlp.exe --print "{'domain': '%(webpage_url_domain)s', 'uploader': '%(uploader_id)s', 'status': '%(live_status)s', 'title': '%(title)s'}" https://www.youtube.com/live/nxbSLFv2JAs --ignore-no-formats-error)

already drops the Japanese characters and the 【】 brackets.

Output:

{'domain': 'youtube.com', 'uploader': '@AiraniIofifteen', 'status': 'was_live',
'title': ' Minecraft  iofi / hololive '}

I have searched for many solutions to this, but none of them solved the problem.

Things like chcp, $OutputEncoding, [System.Console]::OutputEncoding, [System.Console]::InputEncoding, and [System.Text.Encoding]::UTF8 didn't help. Or maybe I just don't understand these settings well enough to apply them correctly.

I'm almost certain this is a duplicate question, but I'm not familiar with this area, so I don't know what to search for to solve this specific problem.

A few things I've tried:

$OutputEncoding = [System.Console]::OutputEncoding = [System.Console]::InputEncoding = [System.Text.Encoding]::UTF8
$PSDefaultParameterValues['*:Encoding'] = 'utf8'
$OutputEncoding = [Console]::OutputEncoding = [Console]::InputEncoding = (new-object System.Text.UTF8Encoding $false)

I also tried some other solutions, but I'm not including them here because it's very unlikely that they would fix this problem.

Alimul

2 Answers

I found the solution.

On the yt-dlp side, I changed the title's format type to JSON (j) and removed the surrounding single quotes:

'title': %(title)j

Command:

yt-dlp.exe --print "{'domain': '%(webpage_url_domain)s', 'uploader': '%(uploader_id)s', 'status': '%(live_status)s', 'title': %(title)j}" https://www.youtube.com/live/nxbSLFv2JAs --ignore-no-formats-error | ConvertFrom-Json

Output:

domain      uploader         status   title
------      --------         ------   -----
youtube.com @AiraniIofifteen was_live 【 Minecraft 】イオそらとえーちゃん仲良く大作戦【 iofi / hololive 】

But I think the question is still unanswered: why isn't %(title)s (a string with Japanese characters and characters like 【】) handled properly when I pass it to anything in PowerShell?

Alimul

This isn't a complete answer either, but sheds some more light on the situation:

It is yt-dlp.exe itself that is the problem, which is why configuring PowerShell to use UTF-8 doesn't help (calling from cmd.exe after running chcp 65001 there doesn't help either):

  • When printing output to the console, yt-dlp.exe does print non-ASCII-range Unicode characters such as 【.

  • When yt-dlp.exe's output is redirected (such as to ConvertFrom-Json via a pipeline), seemingly all such characters are simply removed.

    • The docs describe Unicode-related options such as +, but I couldn't get them to work, at least in version 2023.03.04.
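
To make the symptom concrete, here is a minimal Python sketch that imitates it. It is not taken from yt-dlp's source and says nothing about why the characters are dropped; it only assumes that every character outside the ASCII range is removed, and checks that this assumption reproduces the output quoted in the question.

```python
# Title as shown in the question's console output.
title = "【 Minecraft 】イオそらとえーちゃん仲良く大作戦【 iofi / hololive 】"

# Drop every character outside the ASCII range.
# Assumption: this only mimics the observed result, not yt-dlp's actual code path.
stripped = title.encode("ascii", errors="ignore").decode("ascii")

# repr() makes the leading, trailing, and doubled spaces visible.
print(repr(stripped))  # ' Minecraft  iofi / hololive '
```

The result matches the title in the redirected output from the question, including the doubled space where the Japanese text used to be, which is consistent with the characters being removed rather than replaced.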

Because your --print argument happens to be JSON, using format type j is an effective solution: the resulting JSON representation of the title uses Unicode escape sequences such as \u3010 for non-ASCII-range characters. The string is therefore composed of ASCII-range characters only, which bypasses the original problem.
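
The escaping idea can be tried in isolation with Python's standard json module. This is only a sketch of the general technique, on the assumption that format type j produces output of this kind, as the answer describes; it is not a claim about how yt-dlp implements it.

```python
import json

title = "【 Minecraft 】イオそらとえーちゃん仲良く大作戦【 iofi / hololive 】"

# json.dumps escapes non-ASCII characters by default (ensure_ascii=True),
# so 【 becomes the six ASCII characters \u3010.
encoded = json.dumps(title)

print(encoded.isascii())   # True: nothing is left for a lossy pipe to drop

# Any JSON parser turns the escapes back into the original characters.
decoded = json.loads(encoded)
print(decoded == title)    # True
```

Because the encoded string survives an ASCII-only channel intact, the consumer (ConvertFrom-Json in the question) can restore the original title when it parses the escapes.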

mklement0