
I have a web site that serves JSON. The JSON is encoded in UTF-8 (as specified in the RFC).

The web server responds with: `Content-Type: application/json; charset=utf-8`

I need to convert it and send it to the SCOM agent. Everything works, except that PowerShell turns every Cyrillic character into '?'.

```powershell
# Create the SCOM discovery data object
$api = New-Object -comObject 'MOM.ScriptAPI'
$discoveryData = $api.CreateDiscoveryData(0, $sourceId, $managedEntityId)

# Use the current user's network credentials for the proxy
$browser = New-Object System.Net.WebClient
$browser.Proxy.Credentials = [System.Net.CredentialCache]::DefaultNetworkCredentials

# Download and parse the JSON
$rowdata = Invoke-WebRequest 'https://monitoring.net/monitoring.json' -UseBasicParsing
$jsondata = ConvertFrom-Json $rowdata

# Create one class instance per monitored URL
foreach ($urls in $jsondata.monitors)
{
    $instance = $discoveryData.CreateClassInstance("$MPElement[Name='58MCLibrary!F.058MC.Json.Url.Class']$")

    $instance.AddProperty("$MPElement[Name='058MCLibrary!058MC.Json.Url.Class']/Name$", $urls.name)
    $instance.AddProperty("$MPElement[Name='58MCLibrary!058MC.Json.Url.Class']/Resource$", $urls.resource)
    $instance.AddProperty("$MPElement[Name='058MCLibrary!058MC.Json.Url.Class']/Description$", $urls.description)

    $discoveryData.AddInstance($instance)
}

# Return the discovery data
$discoveryData
```

The thing is, `$urls.name` and `$urls.resource` come through fine; they contain no Cyrillic.

But `$urls.description` looks like this: Description ??????????????, ?????????????????? ?????????? ?????????????? ??????????. ????????????????????

Is there any way to fix this? I tried setting the default environment code page to UTF-8 through .NET, with no change...

It's very strange to see code page problems in PowerShell...

Igor Kuznetsov
  • If you just do `$urls.description | Add-Content test.txt -Encoding UTF8`, does it ruin the test file? – TessellatingHeckler Nov 01 '16 at 15:01
  • Yes, all the same. – Igor Kuznetsov Nov 01 '16 at 15:23
  • "I tried setting the default environment code page to UTF-8 through .NET, with no change..." What commands did you use for that? Could you provide a short listing, please? – Jaiden Snow Nov 01 '16 at 17:05
  • Side note: PowerShell is a creature of beauty... unfortunately, sometimes a creature of flawed beauty. It can be very harsh and difficult to work with if you handle strings encoded in anything but UCS-2 LE (LE = Little Endian). In Microsoft-speak, UCS-2 LE is called "Unicode" and UCS-2 BE is called "BigEndianUnicode". UCS-2 is very close to UTF-16. This non-standard use of "Unicode" is due to historical reasons: .NET adopted that encoding before Unicode matured. Microsoft's "Unicode" is older than UTF-8. – Jaiden Snow Nov 01 '16 at 17:09
  • To solve your problem you must identify the places where UTF-8 is wrongly interpreted as "Unicode" and put proper conversions there. I'd expect the common trouble spots to be: a) redirecting data to a file; b) the "?" character itself, which can be treacherous: perhaps it really is a "?", but perhaps the value is perfectly fine in the pipeline and only the console, using the wrong encoding, displays it as "?". Those places may need some attention. – Jaiden Snow Nov 01 '16 at 17:20
  • Hm... I tried creating a simple file with PowerShell (`echo hello > file.txt`) and Notepad++ reports it as "UCS-2 LE BOM" O_o. Also, `chcp 65001` doesn't help, and neither does `[Console]::OutputEncoding = [System.Text.Encoding]::UTF8`. – Igor Kuznetsov Nov 01 '16 at 17:39
  • Sample file: http://rgho.st/7dGqs2MK8 (it is definitely UTF-8). – Igor Kuznetsov Nov 01 '16 at 17:41
  • Don't worry: "UCS-2 LE BOM" is not that bad a result; generally it's fine. UCS-2 LE BOM is the same UCS-2 LE, just with 2 extra bytes, 0xFF 0xFE, prepended to the file/string. Nothing scarier than that. OK, I'll try to think about which tests could be useful here. Your "echo hello > file.txt" is saved as UCS-2 LE because "> file.txt" is hard-wired sugar for "| Out-File file.txt -Encoding Unicode". If you want to save a file in another encoding with the corresponding BOM, use "echo hello | Out-File file.txt -Encoding utf8" (this creates a UTF-8 file with a BOM). – Jaiden Snow Nov 01 '16 at 17:58
  • If the BOM is a problem for you, see http://stackoverflow.com/questions/5596982/using-powershell-to-write-a-file-in-utf-8-without-the-bom. OK, that's all irrelevant for now: it only concerns output to files, so don't bother with it yet. To the point: I'll try to think of proper tests for your case, but I need some time. – Jaiden Snow Nov 01 '16 at 17:59
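A minimal diagnostic sketch for two points raised in the comments: checking whether the "?" characters are real U+003F code points or only a console display problem, and writing a file as UTF-8 without a BOM. `$text` is a hypothetical stand-in for `$urls.description`, and the temp file name is made up for the example. In Windows PowerShell 5.1, `Out-File`/`Add-Content -Encoding UTF8` always write a BOM, which is why the sketch calls `System.Text.UTF8Encoding` directly.

```powershell
# Hypothetical stand-in for $urls.description (the word "Опис"), built from
# code points so the script itself does not depend on its own file encoding.
$text = -join ([char[]](0x041E, 0x043F, 0x0438, 0x0441))

# 1. Is the '?' real? Inspect code points instead of trusting the console.
#    A literal question mark is 0x3F; Cyrillic letters lie in 0x0400-0x04FF.
$codePoints = [int[]][char[]]$text
$hasLiteralQuestionMarks = $codePoints -contains 0x3F
$hasCyrillic = @($codePoints | Where-Object { $_ -ge 0x0400 -and $_ -le 0x04FF }).Count -gt 0
"Code points: " + (($codePoints | ForEach-Object { 'U+{0:X4}' -f $_ }) -join ' ')

# 2. Write the text as UTF-8 without a BOM ($false = do not emit a BOM).
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'description-test.txt'
[System.IO.File]::WriteAllText($path, $text, (New-Object System.Text.UTF8Encoding $false))

# Check the first bytes: a UTF-8 BOM would be EF BB BF.
$bytes = [System.IO.File]::ReadAllBytes($path)
$hasBom = $bytes.Length -ge 3 -and $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF
"Literal '?': $hasLiteralQuestionMarks; Cyrillic: $hasCyrillic; BOM: $hasBom"

# Clean up the temporary file
Remove-Item $path
```

If the real `$urls.description` already reports U+003F where the Cyrillic letters should be, the damage happened before the console, so changing `chcp` or `[Console]::OutputEncoding` cannot fix it.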

1 Answer


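```powershell
$rowdata = Invoke-WebRequest '123/monitoring.json' -UseBasicParsing

# Decode the raw response bytes as UTF-8 instead of relying on $rowdata.Content
$utf8_ready_data = [System.Text.Encoding]::UTF8.GetString($rowdata.RawContentStream.ToArray())

# Parse the correctly decoded text
$jsondata = ConvertFrom-Json $utf8_ready_data
```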

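It looks like Invoke-WebRequest still can't handle UTF-8 properly, even in 2016... Reading the raw bytes from `RawContentStream` and decoding them yourself sidesteps whatever encoding the cmdlet used to build `.Content`.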
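To see why decoding `RawContentStream` by hand helps, here is a minimal offline sketch. It assumes (this is not confirmed in the thread) that the cmdlet decoded the body with a single-byte code page such as ISO-8859-1, and it uses a hypothetical sample word instead of the real JSON. The same UTF-8 bytes turn into garbage under ISO-8859-1, decode correctly as UTF-8, and an already mis-decoded string can be repaired because ISO-8859-1 maps each byte to exactly one character.

```powershell
# Hypothetical sample text ("Описание"), built from code points so the
# script does not depend on its own file encoding.
$original = -join ([char[]](0x041E, 0x043F, 0x0438, 0x0441, 0x0430, 0x043D, 0x0438, 0x0435))

# What the server sends over the wire: the UTF-8 bytes of the text.
$utf8Bytes = [System.Text.Encoding]::UTF8.GetBytes($original)

# Assumed failure mode: the bytes are decoded with a single-byte code page.
$latin1 = [System.Text.Encoding]::GetEncoding('ISO-8859-1')
$misdecoded = $latin1.GetString($utf8Bytes)

# The answer's approach: decode the raw bytes explicitly as UTF-8.
$decoded = [System.Text.Encoding]::UTF8.GetString($utf8Bytes)

# Repair of an already mis-decoded string: ISO-8859-1 maps every char back
# to its original byte, so the UTF-8 bytes can be recovered and re-decoded.
$repaired = [System.Text.Encoding]::UTF8.GetString($latin1.GetBytes($misdecoded))

"misdecoded: $misdecoded"
"decoded:    $decoded"
"repaired:   $repaired"
```

This repair only works while the text is still mis-decoded rather than replaced: once characters have become literal "?", the original bytes are lost and the data has to be decoded correctly in the first place.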

Igor Kuznetsov
  • Yep, it's a bummer that Microsoft chose that path of development. If they had moved toward UTF-8 around 1995-2000, we would have a much nicer environment to work with. Alas. That "Unicode" can be bothersome, but it could be worse: at least they are consistent about it. – Jaiden Snow Nov 02 '16 at 17:57