I am trying to get information from the Spotify database through their Web API. However, I'm facing issues with accented vowels (ä, ö, ü, etc.).

Let's take Tiësto as an example. Spotify's API Browser can display the information correctly: https://developer.spotify.com/web-api/console/get-artist/?id=2o5jDhtHVPhrJdv3cEQ99Z

If I make an API call with Invoke-WebRequest, I get

Ti??sto

as name:

function Get-Artist {
param($ArtistID = '2o5jDhtHVPhrJdv3cEQ99Z',
      $AccessToken = 'MyAccessToken')


$URI = "https://api.spotify.com/v1/artists/{0}" -f $ArtistID

$JSON = Invoke-WebRequest -Uri $URI -Headers @{"Authorization"= ('Bearer  ' + $AccessToken)} 
$JSON = $JSON | ConvertFrom-Json
return $JSON
}

How can I get the correct name?

mklement0
Solaflex
  • The problem is that Spotify is (unwisely) not returning the encoding it's using in its headers. PowerShell obeys the standard by assuming ISO-8859-1, but unfortunately the site is using UTF-8. (PowerShell ought to ignore standards here and assume UTF-8, but that's just like, my opinion, man.) More details [here](https://github.com/PowerShell/PowerShell/issues/3126), along with the follow-up ticket. Possible workarounds [here](https://stackoverflow.com/questions/17705968/encoding-of-the-response-of-the-invoke-webrequest) (but unfortunately they involve abandoning `Invoke-WebRequest`). – Jeroen Mostert Dec 23 '17 at 13:09
  • Thanks Jeroen. I was already expecting this. However, I was assuming that Invoke-WebRequest would have a parameter for this. I will try out your workaround. will report back soon. – Solaflex Dec 23 '17 at 13:41

3 Answers


Update: PowerShell (Core) 7.0+ now defaults to UTF-8 for JSON, and in 7.4+ to UTF-8 in general in the absence of a (valid) charset attribute in the HTTP response header, so the problem no longer arises there.


Jeroen Mostert, in a comment on the question, explains the problem well:

The problem is that Spotify is (unwisely) not returning the encoding it's using in its headers. PowerShell obeys the [now obsolete] standard by assuming ISO-8859-1, but unfortunately the site is using UTF-8. (PowerShell ought to ignore standards here and assume UTF-8, but that's just like, my opinion, man.) More details [here](https://github.com/PowerShell/PowerShell/issues/3126), along with the follow-up ticket.
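
The effect described in the quote can be reproduced without any web request. The following is only a minimal sketch with an illustrative sample string: it encodes the name as UTF-8, as the server does, and then decodes those bytes as ISO-8859-1, as PowerShell does when no charset is declared. A console may well render the two resulting characters as question marks, as in the question.

```powershell
# Build 'Tiësto' from a code point so the result does not depend on
# how this script file itself is encoded.
$name = "Ti$([char]0x00EB)sto"

# What the server sends: the UTF-8 bytes (the 'ë' becomes 0xC3 0xAB).
$utf8Bytes = [System.Text.Encoding]::UTF8.GetBytes($name)

# What happens when those bytes are read as ISO-8859-1:
# each byte turns into its own character.
$latin1 = [System.Text.Encoding]::GetEncoding('iso-8859-1')
$misdecoded = $latin1.GetString($utf8Bytes)

# Decoding the same bytes as UTF-8 gives back the original name.
$decoded = [System.Text.Encoding]::UTF8.GetString($utf8Bytes)

$misdecoded.Length    # 7: one character more than the 6 in $name
$decoded -ceq $name   # True
```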

A workaround that doesn't require the use of temporary files:

Manually decode the raw byte stream of the response as UTF-8:

$JSON = 
  [Text.Encoding]::UTF8.GetString(
    (Invoke-WebRequest -Uri $URI ...).RawContentStream.ToArray()
  )

Alternatively, use the convenience function ConvertTo-BodyWithEncoding; assuming it has been defined (see below), you can more simply use the following:

# ConvertTo-BodyWithEncoding defaults to UTF-8.
$JSON = Invoke-WebRequest -Uri $URI ... | ConvertTo-BodyWithEncoding
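
The decoding step can also be tried offline. This is a sketch under stated assumptions: the JSON payload is made up, and a MemoryStream created by hand stands in for the RawContentStream property of a real Invoke-WebRequest response.

```powershell
# Made-up sample payload (not real API output); the 'ë' is built from
# its code point so the script file can stay pure ASCII.
$sampleJson = '{"name":"Ti' + [char]0x00EB + 'sto","type":"artist"}'

# Stand-in for (Invoke-WebRequest ...).RawContentStream:
# a MemoryStream holding the UTF-8 bytes of the body.
$utf8Bytes = [System.Text.Encoding]::UTF8.GetBytes($sampleJson)
$rawContentStream = [System.IO.MemoryStream]::new($utf8Bytes)

# The workaround: decode the raw bytes explicitly as UTF-8, then parse.
$decodedBody = [System.Text.Encoding]::UTF8.GetString($rawContentStream.ToArray())
$artist = $decodedBody | ConvertFrom-Json

$artist.name
```

Because the bytes are decoded explicitly, the result does not depend on which default encoding the cmdlet would otherwise assume.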

Convenience function ConvertTo-BodyWithEncoding:

Note:

  • The function manually decodes the raw bytes that make up the given response's body, as UTF-8 by default, or with the given encoding, which can be specified as a [System.Text.Encoding] instance, a code-page number (e.g. 1251), or an encoding name (e.g., 'utf-16le').

  • The function is also available as an MIT-licensed Gist, and only the latter will be maintained going forward. Assuming you have looked at the linked code to ensure that it is safe (which I can personally assure you of, but you should always check), you can define it directly as follows (instructions for how to make the function available in future sessions or to convert it to a script will be displayed):

    irm https://gist.github.com/mklement0/209a9506b8ba32246f95d1cc238d564d/raw/ConvertTo-BodyWithEncoding.ps1 | iex
    
function ConvertTo-BodyWithEncoding {

  [CmdletBinding(PositionalBinding=$false)]
  param(
    [Parameter(Mandatory, ValueFromPipeline)]
    [Microsoft.PowerShell.Commands.WebResponseObject] $InputObject,
    # The encoding to use; defaults to UTF-8
    [Parameter(Position=0)]
    $Encoding = [System.Text.Encoding]::Utf8
  )

  begin {
    if ($Encoding -isnot [System.Text.Encoding]) {
      try {
        $Encoding = [System.Text.Encoding]::GetEncoding($Encoding)
      }
      catch { 
        throw
      }
    }
  }

  process {
    $Encoding.GetString(
       $InputObject.RawContentStream.ToArray()
    )
  }

}
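
As a rough illustration of the name and code-page forms mentioned in the note, the snippet below calls [System.Text.Encoding]::GetEncoding() directly, which is the call the function's begin block relies on. The specific values are only examples chosen for this sketch: 'utf-16le' and code page 1200 both denote UTF-16 little-endian.

```powershell
# An encoding name and a code-page number can refer to the same encoding.
$byName     = [System.Text.Encoding]::GetEncoding('utf-16le')
$byCodePage = [System.Text.Encoding]::GetEncoding(1200)

$byName.CodePage -eq $byCodePage.CodePage   # True

# An unknown name makes GetEncoding() throw; the function's try/catch
# simply passes that error on to the caller.
$rejected = $false
try {
  $null = [System.Text.Encoding]::GetEncoding('no-such-encoding')
}
catch {
  $rejected = $true
}
$rejected   # True
```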
mklement0

Issue solved with the workaround provided by Jeroen Mostert. You have to save the response to a file and explicitly tell PowerShell which encoding it should use when reading it back. This workaround works for me because my program can take whatever time it needs (regarding read/write I/O).

function Invoke-SpotifyAPICall {
param($URI,
      $Header = $null,
      $Body = $null
      )

# Pass only the arguments that were actually supplied, so that a call
# with both headers and a body also results in a request.
$Params = @{ Uri = $URI; OutFile = ".\SpotifyAPICallResult.txt" }
if ($null -ne $Header) { $Params.Headers = $Header }
if ($null -ne $Body)   { $Params.Body = $Body }

# Save the raw response to a file, then read it back explicitly as UTF-8.
Invoke-WebRequest @Params

$JSON = Get-Content ".\SpotifyAPICallResult.txt" -Encoding UTF8 -Raw | ConvertFrom-Json
Remove-Item ".\SpotifyAPICallResult.txt" -Force
return $JSON

}

function Get-Artist {
    param($ArtistID = '2o5jDhtHVPhrJdv3cEQ99Z',
          $AccessToken = 'MyAccessToken')


    $URI = "https://api.spotify.com/v1/artists/{0}" -f $ArtistID

    return (Invoke-SpotifyAPICall -URI $URI -Header @{"Authorization"= ('Bearer  ' + $AccessToken)})
}


Get-Artist
Solaflex

Have you tried something like this?

$output = [System.Text.Encoding]::UTF8.GetString([System.Text.Encoding]::GetEncoding("iso-8859-1").GetBytes($JSON.Name))

I use this line, which I found somewhere, to convert API return text to UTF-8. I'm not quite sure why this is needed, since JSON is supposed to be UTF-8, I believe.
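
A likely reason the line works: ISO-8859-1 maps every byte value to exactly one character, so re-encoding the garbled string as ISO-8859-1 recovers the original bytes, which can then be decoded as UTF-8. The sketch below simulates this with hand-written bytes rather than a real API response, and it only applies if the text really was mis-decoded as ISO-8859-1 in the first place.

```powershell
$latin1 = [System.Text.Encoding]::GetEncoding('iso-8859-1')

# Simulate the garbled value: the UTF-8 bytes of 'Tiësto'
# ('ë' = 0xC3 0xAB) read as ISO-8859-1.
$garbled = $latin1.GetString([byte[]](0x54, 0x69, 0xC3, 0xAB, 0x73, 0x74, 0x6F))

# The repair: back to the original bytes, then decode them as UTF-8.
$output = [System.Text.Encoding]::UTF8.GetString($latin1.GetBytes($garbled))

$output -ceq "Ti$([char]0x00EB)sto"   # True
```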