0

Trying to import CSV that has columns of ID, Start Date, End Date where each row could contain the same ID value but different dates.
Want to sort this so it's finding the earliest date for each ID AND the latest End date for that same ID, then write that to a CSV.

Input csv:

ID Start Date End Date
100 01-Apr-2021 23-May-2021
100 10-June-2021 30-Dec-2021
200 12-Aug-2021 23-Sep-2021
200 25-Dec-2021 24-May-2022

Output csv:

ID Start Date End Date
100 01-Apr-2021 30-Dec-2021
200 12-Aug-2021 24-May-2022

I thought I could import CSV, then identify ID and then for each ID sort start and end dates but can't seem to get it to work.

Code thus far:

$source = "C:\test\test.csv"
$outPut = "C:\test\Testsort.csv"

$csv = Import-Csv $source 
Foreach ($usr in $csv){
    $userID= $usr."ID"
    $filtered = $csv | Where-Object {($_.'ID') -eq $userID}
    $output = [pscustomobject]@{
                     'ID'  = $userID
                     'Start Date'   =($filtered.'Start Date' | Sort-Object @{Expression= 
                      {[datetime]($_."Start Date")};Descending=$false} | Select -First 1
                     'End Date'     =($filtered.'Start Date' | Sort-Object @{Expression= 
                      {[datetime]($_."End Date")};Descending=$True} | Select -Last 1
                     }
}
$output | Export-csv $outPut 

Sadly I'm struggling with the sorting of start and end dates per ID and setting a value of earliest and oldest per ID.

Any thoughts?

mklement0
  • 382,024
  • 64
  • 607
  • 775
Mr80s
  • 65
  • 1
  • 8

2 Answers2

3
  • Use the Group-Object cmdlet to group the rows by shared ID value.

  • For each group of rows that have the same ID, find the earliest date in the Start Date column, and the latest one in the End Date column, then construct an output [pscustomobject] combining these values.

$source = "C:\test\test.csv"
$output = "C:\test\Testsort.csv"

Import-Csv $source | 
    Group-Object Id |
      ForEach-Object { 
        $earliestStart = [Linq.Enumerable]::Min([datetime[]] $_.Group.'Start Date')
        $latestEnd = [Linq.Enumerable]::Max([datetime[]] $_.Group.'End Date')
        [pscustomobject] @{
          ID = $_.Name
          'Start Date' = $earliestStart.ToString('dd-MMM-yyyy', [cultureinfo]::InvariantCulture)
          'End Date' = $latestEnd.ToString('dd-MMM-yyyy', [cultureinfo]::InvariantCulture)
        }
      } |
        Export-csv $output 

Note:

  • While using Sort-Object to find the earliest and latest date is an option too, the [System.Linq.Enumerable]::Min() approach both performs better and requires less memory.

  • Note the use of cast [datetime[]] to convert the strings from the CSV input to [datetime] instances so that finding the chronological minimum / maximum works properly.

  • On output, the min. and max. dates are converted back to the desired date string format, using the [datetime] type's .ToString() method.

mklement0
  • 382,024
  • 64
  • 607
  • 775
  • Hey, as always awesome answer. Would you point me to other of your answers where `[System.Linq.Enumerable]` can be useful? I would like to learn more on this. – Santiago Squarzon Nov 02 '21 at 19:08
  • 1
    Thanks, @SantiagoSquarzon. I'd start with [this answer](https://stackoverflow.com/a/48134048/45375), which covers LINQ use in PowerShell in general. – mklement0 Nov 02 '21 at 19:11
  • Thank you! I usually learn more from your answers rather than official docs hehe – Santiago Squarzon Nov 02 '21 at 19:15
  • 1
    I'm both glad and sad to hear that, @Santiago :) – mklement0 Nov 02 '21 at 19:16
  • Wow that's Cool! I have a lot more learning to do. Quick question, the date output .ToString('dd-MMM-yyyy') is writing the date as 24-May.-2022 (putting a period after month)....any reason for that? – Mr80s Nov 02 '21 at 19:18
  • Good point, @Mr80s: `.ToString()` applies culture-sensitive formatting by default, which can involve a trailing `.` after the abbreviated month name. Therefore, the _invariant_ culture must be used - please see my update. – mklement0 Nov 02 '21 at 19:27
1

make use of group-object and custom attributes and you can do it with a one-liner:

$source = "C:\test\test.csv"
$outPut = "C:\test\Testsort.csv"
$csv = Import-Csv $source 
$csv|group-object id|select name,@{l='start date';e={$_.group|sort "Start Date"|select -expand "Start Date" -first 1 }},@{l='end date';e={$_.group|sort "end Date"|select -expand "end Date" -last 1 }}|Export-csv $outPut 
Mikel V.
  • 351
  • 2
  • 8
  • That's promising, but note that the date values read from the CSV are _strings_ and therefore won't sort _chronologically_ by default. Also, can I suggest spreading your solution across multiple lines for the sake of readability? (You can still do that as a single _pipeline_, and if you end each line with `|` you don't need backticks for line continuation). – mklement0 Nov 02 '21 at 18:37
  • I was noticing the same thing (String date values). It's seems to work "sort of" but doesn't put the Start Date Ascending when using a bigger file with 1100 rows of ID's (4-5 dupe ID's) Would you have to expand that to something like: @{l='Start Date';e={[datetime]($_.group)|sort "Start Date" |select -expand "Start Date" -first 1 }} – Mr80s Nov 02 '21 at 19:09
  • @Mr80s, `$_.Group | Sort-Object { [datetime] $_.'Start Date' }` and `$_.Group | Sort-Object { [datetime] $_.'End Date' }` should do. – mklement0 Nov 02 '21 at 19:13
  • 1
    @mklement0 thank you! worked – Mr80s Nov 02 '21 at 19:26