3

Consider these two JSON files:

{
  "tables": [
    {
      "columns": [
        {
          "name": "action",
          "type": "string"
        },
        {
          "name": "region",
          "type": "string"
        },
        {
          "name": "count_",
          "type": "long"
        },
        {
          "name": "min_timestamp",
          "type": "datetime"
        },
        {
          "name": "max_timestamp",
          "type": "datetime"
        }
      ],
      "name": "PrimaryResult",
      "rows": [
        [
          "ChangePassword",
          "CN",
          1,
          "2022-07-19T11:51:52.8430729Z",
          "2022-07-19T11:51:52.8430729Z"
        ]
      ]
    }
  ]
}
{
  "tables": [
    {
      "columns": [
        {
          "name": "action",
          "type": "string"
        },
        {
          "name": "region",
          "type": "string"
        },
        {
          "name": "count_",
          "type": "long"
        },
        {
          "name": "min_timestamp",
          "type": "datetime"
        },
        {
          "name": "max_timestamp",
          "type": "datetime"
        }
      ],
      "name": "PrimaryResult",
      "rows": [
        [
          "ChangePassword",
          "CN",
          1,
          "2022-07-19T11:51:52.8430729Z",
          "2022-07-19T11:51:52.8430729Z"
        ],
        [
          "Register",
          "CN",
          1,
          "2022-07-19T11:51:52.8430729Z",
          "2022-07-19T11:51:52.8430729Z"
        ]
      ]
    }
  ]
}

They have the exact same schema and are almost the same. The difference is that first file has 1 row in the rows property, the second file has 2 rows in the rows property.


I use ConvertFrom-Json to load the files into PowerShell. When I start accessing .tables.rows[0], I expect that it will return the first row.

However, in the first file that has only 1 row, it actually returns the first column of the first row.

(gc file1.json | ConvertFrom-Json).tables.rows[0]

Outputs:

ChangePassword

If there are more than 1 row as in the second file, then the .tables.rows[0] behaves as expected.

(gc file2.json | ConvertFrom-Json).tables.rows[0]

Outputs:

ChangePassword
CN
1
2022-07-19T11:51:52.8430729Z
2022-07-19T11:51:52.8430729Z

How can I reliably process these JSON files in PowerShell regardless of if they have 1 row or multiple rows in them?

btw. These are actually JSON files produced by az cli as results from Azure Application Insights queries.

mklement0
  • 382,024
  • 64
  • 607
  • 775
Tom Pažourek
  • 9,582
  • 8
  • 66
  • 107
  • 1
    Try with `@( , (gc file1.json | ConvertFrom-Json).tables.rows ))[0]` can't test from my phone but I think you're looking to prevent enumeration – Santiago Squarzon Jul 19 '22 at 13:43
  • 1
    Try `(gc file1.json | ConvertFrom-Json).tables[0].rows[0]` – nimizen Jul 19 '22 at 13:43
  • 1
    @nimizen Also works, nicest solution yet. I don't understand why though... I'd accept that as the answer if it wasn't a comment. – Tom Pažourek Jul 19 '22 at 13:45
  • or dont index through the array? `$json.tables.rows`, I don't see it as unexpected behavior. – Abraham Zinala Jul 19 '22 at 13:46
  • @AbrahamZinala But what if I need to get the first row? Or what if I need to iterate through all rows and process the data somehow? If I use `$json.tables.rows`, it behaves differently each time. For example, what if I want to just count rows (i.e. `.tables.rows.count`)? If there's one row, it returns the number of columns (`5`) instead of the number of rows (`1`). If there are two rows, it returns correctly `2`. Hope it helps explaining my problem, but I guess one has to try it to see the issue. – Tom Pažourek Jul 19 '22 at 13:52
  • I still don't see the issue. There are 2 arrays, so it will give you a count of the 2 arrays. How would it know what to unroll and what not to unroll? `@(1,(1,2)).Count` gives you 2, while `@(1,(1,2)).ForEach{$_}.Count` gives you 3. – Abraham Zinala Jul 19 '22 at 13:56
  • @AbrahamZinala Please try it with the JSON files posted in the question. Try the 1st file and run: `(gc file1.json | ConvertFrom-Json).tables.rows.count`. You'll get `5` instead of `1` (there's one row there). Regarding the unrolling, I do not wish it to unroll anything and respect the original structure of the JSON file. – Tom Pažourek Jul 19 '22 at 13:57
  • Correct, because it get's unrolled and the content is what's counted. – Abraham Zinala Jul 19 '22 at 14:00
  • 2
    PowerShell consumes the outer array during enumeration when there is only 1 nested array, this is why pretending the comma operator `,` before the expression works. It wraps the single array in a new array. Hard to explain from phone but can post an answer later if nobody does – Santiago Squarzon Jul 19 '22 at 14:09
  • @SantiagoSquarzon Thanks, I think I get this now. But I fail to understand the answer from @nimizen. Why is `$json.tables.rows[0]` different to `$json.tables[0].rows[0]`? – Tom Pažourek Jul 19 '22 at 14:10

2 Answers2

2

tl;dr

You need to avoid member-access enumeration if you want to access collection-valued properties as-is, which in your case means the following, as nimizen demonstrated in a comment:

# Note the [0] after .tables
# (-Raw was added for efficiency)
(gc -Raw file1.json | ConvertFrom-Json).tables[0].rows[0]

Because your tables property is itself an array, accessing just .tables results in member-access enumeration (even though the array happens to have just one element).
Using index [0] to target that one element explicitly allows access to its (one and only) .rows property without the latter's value being subject to implicit enumeration, as explained in the next section.


You're seeing a surprising aspect of PowerShell's member-access enumeration feature:

The values obtained from the members of a collection's elements are emitted as if they were sent to the pipeline, one by one, which has the following implications:

  • As happens by default in the pipeline, any such value is enumerated if it happens to be collection-valued, i.e. its elements are emitted, one by one.

  • On accessing the resulting output, if there's only one output object (whether or not that happens to be a collection itself), it is captured as-is; if there are two or more, they are implicitly collected in a regular PowerShell array (of type [object[]]).

The upshot for enumerated member values that happen to contain collections is:

  • The input collections themselves are always lost, because only their elements are output.

  • If there are multiple collection-valued member values, you'll end up with a single, flat array, invariably of type [object[]], that is the concatenation of the elements of all collections involved.

  • If there is only one collection-valued member value, and that value happens to be a single-element collection, you'll end up with that single element as-is (only with multiple elements would you end up with an [object[]] array).

This surprising behavior is the subject of GitHub issue #6802, though note that the behavior is unlikely to change, so as not to break backward compatibility.


A simplified example to demonstrate what happened in your case:

Note: JSON is incidental to the problem, so I'm using a [pscustomobject] instance below.

# Construct a single-element array that has a [pscustomobject] element.
# whose .prop property contains a single-element array whose only
# element is another array, typed [int[]].
$array = , [pscustomobject] @{ prop = , [int[]] (1, 2) }

# Use member-access enumeration to access the .prop member value
# of all elements in the array (here, there's only one), and get
# the first element.
$array.prop[0] # !! -> scalar 1, i.e. the first elem. of the *nested* array,
               # !! because member-access enumeration *enumerated* the 
               # !! single-element outer array.

# Without member-access enumeration, the value of .prop is accessed as-is,
# as a nested array:
$array[0].prop[0] # -> [int[]] (1, 2), as expected.
mklement0
  • 382,024
  • 64
  • 607
  • 775
  • Thanks, this explains it well. I'm familiar with the pipeline and this behavior regarding collection flattening, but I was not aware that member-access enumeration exists and that it basically turns it into a pipeline... It makes sense now. – Tom Pažourek Jul 19 '22 at 18:15
  • 1
    Glad to hear it, @TomPažourek. I wish it worked differently, but changing the behavior now would be a serious breaking change. – mklement0 Jul 19 '22 at 19:18
1

don't know how to stop powershell behaving that way, but got it "reliable" by

$test=Get-Item ".\file.json" | get-content | convertfrom-json
if($test.tables.rows[0].count -gt 1 ){
    $result = $test.tables.rows[0][0] 
}else{
    $result = $test.tables.rows[0]
}

ragingdev
  • 33
  • 5
  • Thanks, this seems to work fine for getting the first row. Any idea how to work with this if I want to iterate over all of the rows, and not just get the first one? I guess I kind of need to do the opposite, instead of always getting a single row, make it always get an array of arrays. But I'm failing at that... I'm trying `[object[][]]$test.tables.rows`, but it's still a plain array, not array of arrays... – Tom Pažourek Jul 19 '22 at 13:28