I'm trying to retrieve JSON data from Log4jScanner.exe (Qualys software that detects vulnerable files or components on a PC or server), but after spending many hours on it, I think I have an issue with PowerShell.

If I store the result of the following command in PowerShell 5.1: `$a = .\Log4jScanner.exe /scan /report_pretty`

the result is displayed like this:

PS C:\temp> $a |where {$_ -ne ""}
    {
        "scanSummary": {
            "scanEngine": "2.0.2.7",
            "scanHostname": "XXXXXXXXX",
            "scanDate": "2022-01-20T18:02:26+0100",
            "scanDurationSeconds": 28,
            "scanErrorCount": 54,
            "scanStatus": "Partially Successful",
            "scannedFiles": 649020,
            "scannedDirectories": 209514,
            "scannedJARs": 31,
            "scannedWARs": 0,
            "scannedEARs": 0,
            "scannedPARs": 0,
            "scannedTARs": 5,
            "scannedCompressed": 43,
            "vulnerabilitiesFound": 1
        },
        "scanDetails": [
            {
                "file": "XXXXXX.jar",
                "manifestVendor": "Unknown",
                "manifestVersion": "Unknown",
                "detectedLog4j": true,
                "detectedLog4j1x": true,
                "detectedLog4j2x": false,
                "detectedJNDILookupClass": false,
                "detectedLog4jManifest": false,
                "log4jVendor": "log4j",
                "log4jVersion": "1.2.17",
                "cve20214104Mitigated": false,
                "cve202144228Mitigated": true,
                "cve202144832Mitigated": true,
                "cve202145046Mitigated": true,
                "cve202145105Mitigated": true,
                "cveStatus": "Potentially Vulnerable ( CVE-2021-4104: Found )"
            }
        ]
    }

After that, I want to work with a specific value in that data. First I try to convert the data from JSON, but the text turns red and the following error occurs:

    PS C:\temp> $a | convertfrom-json
    convertfrom-json : Objet non valide passé, ':' ou '}' attendu. (2): {
    
        "scanSummary": {
    
            "scanEngine": "2.0.2.7",
    
            "scanHostname": "FRBOURWXT013379.vcn.ds.volvo.net",
    
            "scanDate": "2022-01-20T18:02:26+0100",
 .... .... ....

Finally, if I copy/paste the content of $a into another variable, like this:

$b = '
{
            "scanSummary": {
                "scanEngine": "2.0.2.7",
                "scanHostname": "XXXXXXXXX",
                "scanDate": "2022-01-20T18:02:26+0100",
                "scanDurationSeconds": 28,

... ... ... 
'

I am then able to access the converted data:

PS C:\temp> $b | convertfrom-json

scanSummary
-----------
@{scanEngine=2.0.2.7; scanHostname=XXXXXXXXX; scanDate=2022-01-20T18:02:26+0100; scanDurationSeconds=28; scanErrorCount=54; scanStatus=Partially Successful; scann...

At this point, $a is of type Object[] and $b is of type String.

So I tried to convert $a to a string:

PS C:\temp> $a = [string] $a
PS C:\temp> $a
{      "scanSummary": {          "scanEngine": "2.0.2.7",          "scanHostname": "XXXXXXXXX",          "scanDate": "2022-01-20T18:02:26+0100",          "scanDurationSeconds": 28,          "scanErrorCount": 54,          "scanStatus": "Partially Successful",          "scannedFiles": 649020,          "scannedDirectories": 209514,          "scannedJARs": 31,          "scannedWARs": 0,          "scannedEARs": 0,          "scannedPARs": 0,          "scannedTARs": 5,          "scannedCompressed": 43,          "vulnerabilitiesFound": 1      },      "scanDetails": [          {              "file": "XXXXX.jar",              "manifestVendor": "Unknown",              "manifestVersion": "Unknown",              "detectedLog4j": true,              "detectedLog4j1x": true,              "detectedLog4j2x": false,              "detectedJNDILookupClass": false,              "detectedLog4jManifest": false,              "log4jVendor": "log4j",              "log4jVersion": "1.2.17",              "cve20214104Mitigated": false,              "cve202144228Mitigated": true,              "cve202144832Mitigated": true,              "cve202145046Mitigated": true,              "cve202145105Mitigated": true,              "cveStatus": "Potentially Vulnerable ( CVE-2021-4104: Found )"          }      ]  }

and then convert it from JSON, but the result is a total mess:

PS C:\temp> $a | convertfrom-json
convertfrom-json : Objet non valide passé, ':' ou '}' attendu. (2): {      "scanSummary": {          "scanEngine": "2.0.2.7",
       "scanHostname": "XXXXXX",          "scanDate":
"2022-01-20T18:02:26+0100",          "scanDurationSeconds": 28,          "scanErrorCount":
54,          "scanStatus": "Partially Successful",          "scannedFiles": 649020,
"scannedDirectories": 209514,          "scannedJARs": 31,          "scannedWARs": 0,

Finally, if I export any data to a .json file, I can't open it with Notepad or Codium (every character is followed by NUL NUL NUL), whereas I can read it with Get-Content within PowerShell.

It seems there are some hidden characters or something similar, but I can't figure out how to convert and access the JSON data in my case.

Is there anything missing?

Thanks a lot for your support, guys!

EDIT 1 - If I save the output, I can't open the .json file correctly, but PowerShell seems to understand it well: (screenshot)

js2010
motorbass
  • Does `$a |where {$_ -ne ""} |ConvertFrom-Json` work? – Mathias R. Jessen Jan 20 '22 at 17:28
  • Hi, unfortunately no, I got this error: "ConvertFrom-Json : Objet non valide passé, ':' ou '}' attendu. (2): {", which means "invalid object passed, ':' or '}' expected". – motorbass Jan 21 '22 at 07:11
  • Sfc outputs utf16 as well (odd bytes are null), but there's no BOM in a pipeline. See https://stackoverflow.com/questions/57749808/sfc-output-redirection-formatting-issue-powershell-batch – js2010 Jan 21 '22 at 15:09

2 Answers

Log4jScanner.exe writes its output as UTF-16 (what Windows calls "Unicode").

There is a bug in PowerShell that causes the output from programs that send Unicode bytes to their STDOUT/STDERR streams to be mangled.

It's easy to confirm - when you run the command

Log4jScanner.exe /scan_directory C:\something /report_pretty > output.json

in cmd.exe, then output.json will be neat UTF-16:

0d 00 0a 00 7b 00 0d 00 0a 00 20 00 20 00 20 00  .␀.␀{␀.␀.␀ ␀ ␀ ␀
20 00 22 00 73 00 63 00 61 00 6e 00 53 00 75 00   ␀"␀s␀c␀a␀n␀S␀u␀
6d 00 6d 00 61 00 72 00 79 00 22 00 3a 00 20 00  m␀m␀a␀r␀y␀"␀:␀ ␀

But PowerShell will blindly assume a single-byte encoding for the program's output stream, and encode that as UTF-16 again, including the NUL bytes which actually belong to UTF-16 characters:

ff fe 0d 00 0a 00 00 00 0d 00 0a 00 00 00 7b 00  ÿþ.␀.␀␀␀.␀.␀␀␀{␀
00 00 0d 00 0a 00 00 00 0d 00 0a 00 00 00 20 00  ␀␀.␀.␀␀␀.␀.␀␀␀ ␀
00 00 20 00 00 00 20 00 00 00 20 00 00 00 22 00  ␀␀ ␀␀␀ ␀␀␀ ␀␀␀"␀

Here we see the UTF-16 BOM (ff fe) and then a real NUL character 00 00 is inserted at every spot where there was a NUL in the original output, except for line breaks, which is why we still see the regular \r\n (0d 00 0a 00). For example, a space (20 00 in UTF-16) will become 20 00 00 00, and appear as a space plus a NUL in a text editor, as you have seen in Notepad++.
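
The double encoding can be reproduced without the scanner. This is only a minimal sketch: it assumes ISO-8859-1 (code page 28591) as a stand-in for whatever single-byte code page PowerShell actually applies, it uses a made-up one-line JSON sample, and it does not model the extra line splitting PowerShell performs. `ConvertTo-HexString` is a hypothetical helper name.

```powershell
# Hypothetical one-line sample standing in for the scanner's output.
$original = '{ "a": 1 }'

# What the program writes to STDOUT: UTF-16LE, two bytes per character.
$utf16 = [System.Text.Encoding]::Unicode.GetBytes($original)

# Assumption: ISO-8859-1 stands in for the single-byte code page used to
# decode the stream. Every byte becomes one character, so each 00 byte
# turns into a real NUL character.
$latin1 = [System.Text.Encoding]::GetEncoding(28591)
$misdecoded = $latin1.GetString($utf16)

# Writing that string out as UTF-16 again encodes each NUL as 00 00.
$reencoded = [System.Text.Encoding]::Unicode.GetBytes($misdecoded)

function ConvertTo-HexString {
    param([byte[]] $Bytes)
    ($Bytes | ForEach-Object { $_.ToString('x2') }) -join ' '
}

ConvertTo-HexString ($utf16[0..3])      # 7b 00 20 00
ConvertTo-HexString ($reencoded[0..7])  # 7b 00 00 00 20 00 00 00
```

The second line of output shows the same `20 00 00 00` pattern for a space as the hex dump above.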

This is of course horrible.

Your options are:

  • Run Log4jScanner.exe from cmd.exe
  • Remove the excess NUL characters from the output before parsing it

The latter would go like this:

$json = Log4jScanner.exe /scan_directory C:\something /report_pretty
$data = $json.Replace(([char]0).ToString(), "") | ConvertFrom-Json

.NET strings can legally contain NUL characters (C strings, for example, cannot), but there is no legal NUL character in the JSON output we expect from the program. That is why throwing them all out works. It's certainly not pretty, and it will only work for program output that stays within the ASCII range (which happens to be the case here: all the characters in the JSON are ASCII).
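
Because the captured output arrives as an array of lines rather than a single string, it may help to join the lines before stripping the NULs. The following is a hedged sketch, not a tested fix for the scanner itself: `Remove-NulChar` is a hypothetical helper name, and the input is a simulated sample with a NUL after every character rather than real scanner output.

```powershell
function Remove-NulChar {
    # Hypothetical helper: joins captured output lines and drops every NUL.
    param([string[]] $Line)
    ($Line -join "`n").Replace([string][char]0, '')
}

# Simulated mangled output: a NUL after every character, as described above.
# The JSON content is made up for illustration.
$nul = [string][char]0
$mangled = '{"scanSummary":{', '"vulnerabilitiesFound":1}}' | ForEach-Object {
    ($_.ToCharArray() | ForEach-Object { "$_$nul" }) -join ''
}

$data = Remove-NulChar $mangled | ConvertFrom-Json
$data.scanSummary.vulnerabilitiesFound  # 1
```

With the real program, the call would presumably look like `Remove-NulChar (.\Log4jScanner.exe /scan /report_pretty) | ConvertFrom-Json`, under the same ASCII-only caveat.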

Tomalak
  • I got the same error as before: "ConvertFrom-Json : Objet non valide passé, ':' ou '}' attendu. (2): {", which means "invalid object passed, ':' or '}' expected". – motorbass Jan 21 '22 at 07:09
  • I also gave a try within powershell core, error is more explicit " ConvertFrom-Json: Conversion from JSON failed with error: After parsing a value an unexpected character was encountered: 8. Path 'scanSummary.scanDurationSeconds', line 11, position 65." – motorbass Jan 21 '22 at 07:16
  • @inframan When the JSON is invalid then the JSON is invalid. There is nothing Powershell-specific about that error, every other JSON parser on Earth would tell you the same thing. Use https://jsonlint.com/ to check the output and draw conclusions. (maybe output from the STDERR stream is mixed in?) – Tomalak Jan 21 '22 at 08:12
  • I agree, but when i copy/paste the result into a json validator (i tried it first before creating this post) everything is ok. And as i tried, if i copy paste manually the result from the original variable to another one it works.. (the same when i export data in a json file, it's only readable using powershell) that's why i thought it was powershell specific – motorbass Jan 21 '22 at 08:18
  • @inframan Hardly, at least not in the JSON parsing part. What happens when you redirect the program's output to file (`.\Log4jScanner.exe /scan /report_pretty > output.json`) and then try to parse that (`Get-Content output.json -Raw | ConvertFrom-Json`)? – Tomalak Jan 21 '22 at 08:21
  • I tried that yesterday too. In PowerShell 5.1, the file is unreadable to humans (every character is followed by null null null...) if I open it with Notepad++, for instance. On the other hand, I can open it properly within PowerShell but still can't access anything. I added a screenshot to my original post to show you :) – motorbass Jan 21 '22 at 08:47
  • @inframan Yes, the file seems to be UTF-16 encoded. What does it look like when you read it with `Get-Content -Encoding Unicode`? – Tomalak Jan 21 '22 at 09:07
  • I've downloaded the scanner and am trying it now. This is really weird. – Tomalak Jan 21 '22 at 09:11
  • same result if i get-content -encoding unicode output.json or get-content output.json. – motorbass Jan 21 '22 at 09:34
  • @inframan see updated answer – Tomalak Jan 21 '22 at 10:18
  • You're the king of PowerShell :) Thanks a lot!!!! Just one last question: how do you check the current PowerShell encoding and/or the encoding of a file or a command's output? I'd like to write it down for the future in case I have doubts about another program. – motorbass Jan 21 '22 at 10:28
  • @inframan I'm using a hex editor to check the actual bytes a program writes. – Tomalak Jan 21 '22 at 10:29
  • This is a very unusual situation. The only other command like this I know is sfc. – js2010 Jan 21 '22 at 16:47
  • @js2010 Definitely unusual, and I would not be surprised if `Log4jScanner.exe` was in violation of some rule that applies to console programs. But it is what it is. – Tomalak Jan 21 '22 at 17:00

Here's some odd code I found that can identify a UTF-16LE ("Unicode") file without a BOM. Such files come up on Windows occasionally. Notepad can identify them as well (and UTF-8 without a BOM).

# istextunicode.ps1

param([Parameter(ValueFromPipeline=$True)] $filename)

# https://devblogs.microsoft.com/scripting/use-powershell-to-interact-with-the-windows-api-part-1/

begin {

$MethodDefinition = @'
[DllImport("Advapi32",SetLastError=false)]
public static extern bool IsTextUnicode(byte[] buf, int len, 
  ref IsTextUnicodeFlags opt);

[Flags]
public enum IsTextUnicodeFlags:int 
{
  IS_TEXT_UNICODE_ASCII16            = 0x0001,
  IS_TEXT_UNICODE_REVERSE_ASCII16    = 0x0010,
  IS_TEXT_UNICODE_STATISTICS         = 0x0002,
  IS_TEXT_UNICODE_REVERSE_STATISTICS = 0x0020,
  IS_TEXT_UNICODE_CONTROLS           = 0x0004,
  IS_TEXT_UNICODE_REVERSE_CONTROLS   = 0x0040,
  IS_TEXT_UNICODE_SIGNATURE          = 0x0008,
  IS_TEXT_UNICODE_REVERSE_SIGNATURE  = 0x0080,
  IS_TEXT_UNICODE_ILLEGAL_CHARS      = 0x0100,
  IS_TEXT_UNICODE_ODD_LENGTH         = 0x0200,
  IS_TEXT_UNICODE_DBCS_LEADBYTE      = 0x0400,
  IS_TEXT_UNICODE_NULL_BYTES         = 0x1000,
  IS_TEXT_UNICODE_UNICODE_MASK       = 0x000F,
  IS_TEXT_UNICODE_REVERSE_MASK       = 0x00F0,
  IS_TEXT_UNICODE_NOT_UNICODE_MASK   = 0x0F00,
  IS_TEXT_UNICODE_NOT_ASCII_MASK     = 0xF000
}
'@

  Add-Type Advapi32 $MethodDefinition -Namespace Win32
  $totalcount = 8
}

process {
  if ( (get-item $filename).length -lt 1mb ) {
  
    #$bytes = [io.file]::ReadAllBytes($filename)
    # -encoding byte is Windows PowerShell 5.1 syntax; PowerShell 6+ uses -AsByteStream instead
    $bytes = get-content $filename -encoding byte -totalcount $totalcount
    
    # reset every time
    [Win32.Advapi32+IsTextUnicodeFlags]$opt = 0xffff
    
    $result = [win32.advapi32]::IsTextUnicode($bytes, $bytes.length, [ref]$opt)
    #$result = [win32.advapi32]::IsTextUnicode($bytes, $totalcount, [ref]$opt)
      
    [pscustomobject]@{
      Filename = $filename
      Result = $result
      Flags = $opt
    }
    #if($result) { write-host $filename }
  }
}

# error.log
# icacls (no bom), task scheduler, regedit
# gpreport.html

# $a = ls -force -r -file -exclude *.dll,*.exe,*.mui,*.jpg,*.jar,*.zip,*.msb,*.dat | get-item | istextunicode | where result
# -filter *.ini *.txt *.log

# $a = ls  -force -Recurse -Filter *.ini | get-item | istextunicode | where {$_.result -and $_.flags -notmatch 'signature' }

In cmd as admin, run the following (in PowerShell, the redirect adds the wrong encoding BOM signature):

sfc > file

Then in powershell:

.\istextunicode file

Filename Result                                                                                                     Flags
-------- ------                                                                                                     -----
file       True IS_TEXT_UNICODE_ASCII16, IS_TEXT_UNICODE_STATISTICS, IS_TEXT_UNICODE_CONTROLS, IS_TEXT_UNICODE_NULL_BYTES

format-hex file | select -first 1


           Path: C:\users\admin\foo\file

           00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F

00000000   0D 00 0D 00 0A 00 4D 00 69 00 63 00 72 00 6F 00  ......M.i.c.r.o.
js2010
  • It's "Unicode without BOM" because `Log4jScanner.exe` is unaware that its output is being redirected to a file, and `cmd.exe` is unaware that `Log4jScanner.exe` is writing UTF-16 to its STDOUT. That also explains why there is a BOM when doing the same redirect in PowerShell. – Tomalak Jan 21 '22 at 17:20