
I'm stuck in the following situation: I have to get information out of a CSV file, which I imported using Import-Csv.

My raw data looks like this:

45227;01.10.2018 03:24:00;Xxxx Xxxx Xxxxx x XX xxxxxxxxxxxxxx Xxxxx xxx Xxxxxxxxxxxxxxxxxxx;;3;XXXX;XXXX;XXX@XX.com;;;3.7;;

where the column containing 3.7 is the value of interest ("Points").

Here comes my first problem: using Import-Csv, PowerShell will save this information in a [string] property. To avoid that, I used the following line:

| Select @{Name="Points";Expression={[decimal]$_.Points}}

Now I'm getting a Selected.System.Management.Automation.PSCustomObject-typed object containing that property as a [decimal]. Next, I wanted to sum up all the points that were used by the same e-mail address:

$Data[$Index].Points += (
  $Imported_CSV | where {$_.Sender -eq $Imported_CSV_Unique.Sender} | 
    measure Points -sum
).Sum

This seemed to work just fine, but if I run $Data[$Index] | gm I get this: Points NoteProperty double Points=71301.6000000006

The property changed to [double]. I dug a bit and found out that PowerShell's GenericMeasureInfo.Sum property can only give back a Nullable<Double> instance as a property value.

It seems like I'm producing an overflow of [double], because the number being displayed is totally wrong. I want to stick to [decimal] or integer so I get an output like 71123.4 or something like that.

Is there any other approach, so I don't have to use (Measure-Object -Sum).Sum?

Thanks in advance!

mklement0
JaneDoe
    Please be a bit more specific. In the example CSV, show us the headers as well. Where does `$Imported_CSV_Unique.Sender` in your code come from? Can't you simply do something like `[decimal]$sum = 0; $yourData | ForEach-Object { $sum += $_.Points }`? – Theo Mar 17 '19 at 10:57

3 Answers


I'd start by grouping all the sender addresses together and then sum them individually:

Import-Csv .\data.csv |Group-Object Sender |ForEach-Object {
    [pscustomobject]@{
        Sender = $_.Name
        SumOfPoints = ($_.Group |Measure-Object Points -Sum).Sum
    }
}

Measure-Object will automatically cast the Points strings to [double]; if you require more precision, you can cast to [decimal] manually, as before:

Import-Csv .\data.csv |Select-Object Sender,@{Name="Points";Expression={[decimal]$_.Points}} |Group-Object Sender |ForEach-Object {
    [pscustomobject]@{
        Sender = $_.Name
        SumOfPoints = ($_.Group |Measure-Object Points -Sum).Sum
    }
}
Mathias R. Jessen
  • Great idea to use `Group-Object` (+1); note, however, that in the second command you'll lose the `[decimal]` precision again when you call `Measure-Object`. – mklement0 Mar 17 '19 at 13:20

Using grouping like Mathias already did, here is how you can get the sum without losing the decimal precision, as I commented above:

# faking the Import-Csv here with a here-string.
# in real life, you would use: Import-Csv <yourdata.csv> -Delimiter ';'
$data = @"
Sender;Date;Description;Something;Number;Whatever;DontKnow;Email;Nothing;Zilch;Points;Empty;Nada
45227;01.10.2018 03:24:00;Xxxx Xxxx Xxxxx x XX xxxxxxxxxxxxxx Xxxxx xxx Xxxxxxxxxxxxxxxxxxx;;3;XXXV;XXXA;XXX@XX.com;;;3.7;;
45227;01.10.2018 03:24:00;Xxxx Xxxx Xxxxx x XX xxxxxxxxxxxxxx Xxxxx xxx Xxxxxxxxxxxxxxxxxxx;;3;XXXW;XXXB;XXX@XX.com;;;4.7;;
45226;01.10.2018 03:24:00;Xxxx Xxxx Xxxxx x XX xxxxxxxxxxxxxx Xxxxx xxx Xxxxxxxxxxxxxxxxxxx;;3;XXXX;XXXC;XXX@XX.com;;;4.777779;;
45225;01.10.2018 03:24:00;Xxxx Xxxx Xxxxx x XX xxxxxxxxxxxxxx Xxxxx xxx Xxxxxxxxxxxxxxxxxxx;;3;XXXY;XXXD;XXX@XX.com;;;4.8;;
45225;01.10.2018 03:24:00;Xxxx Xxxx Xxxxx x XX xxxxxxxxxxxxxx Xxxxx xxx Xxxxxxxxxxxxxxxxxxx;;3;XXXZ;XXXE;XXX@XX.com;;;4.9;;
"@ | ConvertFrom-Csv -Delimiter ';'

#get the two columns you need from the Csv and group them by Sender
$data | Select-Object Sender, Points | Group-Object Sender | ForEach-Object {
    # add the 'Points' values as decimal
    [decimal]$sum = 0
    foreach ($value in $_.Group.Points) { $sum += [decimal]$value }
    [PSCustomObject]@{
        Sender = $_.Name
        Sum    = $sum
    }
}

Output from the above would be:

Sender      Sum
------      ---
45227       8,4
45226  4,777779
45225       9,7
Theo

tl;dr:

If you need to control the specific numeric data type used for summing up numbers:

  • Avoid Measure-Object, which invariably uses [double] calculations.

  • Instead, use the LINQ Sum method (accessible in PSv3+) with a cast to the desired numeric type:

[Linq.Enumerable]::Sum(
  [decimal[]] @(
    $Imported_CSV | where {$_.Sender -eq $Imported_CSV_Unique.Sender}
  ).Points
)

Mathias R. Jessen's helpful answer shows you an elegant way to sum your Points column grouped by rows that share the same email address and Theo's helpful answer improves on it by truly summing the points as [decimal] values.

Some general points about Measure-Object with -Sum and floating-point data types:

You correctly state:

The property [data type] changed to double [...] i found out that Powershell's GenericMeasureInfo.Sum property can only give back a Nullable<Double> as property value.

Indeed: Measure-Object -Sum:

  • invariably uses [double] values to sum up the inputs.
  • coerces the inputs to [double]s, if possible - even if they're not numbers.
    • If an input cannot be coerced to a [double] (e.g., 'foo'), a non-terminating error is emitted, but summing continues with any remaining inputs.

The above implies that even strings are acceptable input to Measure-Object -Sum, because they'll be converted to [double] on demand during summation. That means that you could use your Import-Csv command directly, as in the following example (which uses two [pscustomobject] instances to simulate Import-Csv's output):

PS> ([pscustomobject] @{ Points = '3.7' }, [pscustomobject] @{ Points = '1.2' } |
      Measure-Object Points -Sum).Sum
4.9  # .Points property values were summed correctly.

71301.6000000006 [...] It seems like i'm producing an overflow of "double"

Overflow would imply exceeding the maximum value that can be stored in a [double], which is (a) unlikely ([double]::MaxValue is 1.79769313486232E+308, i.e., greater than 10 to the power of 308) and (b) would produce a different symptom; e.g.:

PS> ([double]::MaxValue, [double]::MaxValue | Measure-Object -Sum).Sum
∞  # represents positive infinity

What you do get, however, is rounding errors: the [double] type uses an internal binary representation in which many decimal fractions have no exact equivalent, which can lead to baffling calculation results; e.g.:

PS> 1.3 - 1.1 -eq 0.2
False # !! With [double]s, 1.3 - 1.1 is NOT exactly equal to 0.2

For more information, see https://floating-point-gui.de/

Using [decimal] values does solve this problem, but note that this comes at the expense of a smaller range (in effect, you get 28 decimal digits of precision - the absolute value of the largest representable number depends on where the decimal point is placed; as an integer, it is 79,228,162,514,264,337,593,543,950,335, i.e., close to 8 × 10^28).
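To illustrate this trade-off, here is a minimal sketch you can run directly in a PowerShell session:

```powershell
# [double]: binary floating point - many decimal fractions are only approximated.
(1.3 - 1.1) -eq 0.2                              # False

# [decimal]: exact decimal arithmetic for such values...
([decimal]1.3 - [decimal]1.1) -eq [decimal]0.2   # True

# ...but with a much smaller range than [double]:
[decimal]::MaxValue                              # 79228162514264337593543950335
```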

If you do need the precision of [decimal]s, you must avoid Measure-Object and do your own summing.

In the context of your original command, you could use the Sum LINQ method:

[Linq.Enumerable]::Sum(
  [decimal[]] @(
    $Imported_CSV | where {$_.Sender -eq $Imported_CSV_Unique.Sender}
  ).Points
)
  • The use of @(...) (the array subexpression operator) rather than just (...) around the pipeline command ensures that the overall command doesn't fail in case the pipeline happens to return no rows. @(...) turns the non-output into an empty array, for which .Sum() correctly returns 0.

    • Without it, the [decimal[]] cast would result in $null, PowerShell wouldn't be able to find the [decimal[]]-typed overload of the .Sum() method, and it would report the error "Multiple ambiguous overloads found for "Sum" and the argument count: 1".
  • The above command invariably loads all matching CSV rows (represented as custom objects) into memory as a whole, whereas Measure-Object - like most cmdlets in the PowerShell pipeline - would process them one by one, which requires only a constant amount of memory (but is slower).
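The empty-input guard provided by @(...) can be verified in isolation:

```powershell
# An empty [decimal[]] array makes the [decimal[]]-typed Sum() overload
# resolvable, and summing it correctly yields 0:
[Linq.Enumerable]::Sum([decimal[]] @())   # 0

# By contrast, a pipeline that produces no output yields $null, and
# [decimal[]] $null is still $null - passing that to .Sum() triggers
# the ambiguous-overload error described above.
```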

If loading all matching rows into memory at once is not an option, use the ForEach-Object (foreach) cmdlet; note, however, that this only makes sense if you substitute an actual Import-Csv call for the already-in-memory array $Imported_CSV:

# Replace $Imported_Csv with the original Import-Csv call to 
# get memory-friendly one-by-one processing.
$Imported_CSV | where {$_.Sender -eq $Imported_CSV_Unique.Sender} |
  foreach -Begin { [decimal] $sum = 0 } -Process { $sum += $_.Points } -End { $sum }
mklement0