
In our company there are many users and many applications with restricted access, and a database that records those accesses. I don't have access to that database, but what I do have is an automatically generated (once a day) CSV file with all accesses of all my users. I want to give them a chance to check their access situation, so I am writing a simple PowerShell script for this purpose.

CSV:

user;database1_dat;database2_dat;database3_dat
john;0;0;1
peter;1;0;1

I can do:

Import-Csv -Delimiter ';' foo.csv | where {$_.user -eq $user}

But this will show me the original ugly headers (with the "_dat" suffix). Can I delete the last four characters from every header that ends with "_dat", when I can't predict how many headers there will be tomorrow?

I am aware of calculated properties like:

Select-Object @{ expression={$_.database1_dat}; label='database1' }

but I have to know all the column names for that, as far as I know.

Am I condemned to "overengineer" it with a separate function that builds the whole "calculated property expression" dynamically from scratch, or is there a simpler way I am missing?
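
For illustration, the kind of dynamic "overengineering" I have in mind would be roughly this (untested sketch):

$rows = Import-Csv -Delimiter ';' foo.csv

# Build the calculated properties from whatever headers today's file has.
$props = $rows[0].psobject.Properties.Name | ForEach-Object {
    $name = $_
    if ($name -like '*_dat') {
        # Strip the "_dat" suffix via a calculated property; the property name
        # is baked into the expression string to avoid late-binding issues.
        @{ Label = $name -replace '_dat$'; Expression = [scriptblock]::Create("`$_.'$name'") }
    } else {
        $name
    }
}

$rows | Where-Object { $_.user -eq $user } | Select-Object -Property $props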

Thanks :-)

MilvusCZ

2 Answers


Assuming that file foo.csv fits into memory as a whole, the following solution performs well:

$headerRow, $dataRows = (Get-Content -Raw foo.csv) -split '\r?\n', 2

# You can pipe the result to `where {$_.user -eq $user}`
ConvertFrom-Csv ($headerRow -replace '_dat(?=;|$)'), $dataRows -Delimiter ';' 
  • Get-Content -Raw reads the entire file into memory, which is much faster than reading it line by line (the default).

  • -split '\r?\n', 2 splits the resulting multi-line string into two: the header line and all remaining lines.

    • Regex \r?\n matches a newline (either a CRLF (\r\n) or an LF-only newline (\n)).
    • , 2 limits the number of tokens to return to 2, meaning that splitting stops once the 1st token (the header row) has been found, and the remainder of the input string (comprising all data rows) is returned as-is as the last token.
  • $headerRow -replace '_dat(?=;|$)'

    • -replace '_dat(?=;|$)' uses a regex to remove any _dat column-name suffixes (followed by a ; or the end of the string); if substring _dat only ever occurs as a name suffix (not also inside names), you can simplify to -replace '_dat'
  • ConvertFrom-Csv directly accepts arrays of strings, so the cleaned-up header row and the string with all data rows can be passed as-is.
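
For example, against the sample CSV from the question, adding the filter to the pipeline (with $user = 'peter') should produce output along these lines:

$user = 'peter'

$headerRow, $dataRows = (Get-Content -Raw foo.csv) -split '\r?\n', 2

ConvertFrom-Csv ($headerRow -replace '_dat(?=;|$)'), $dataRows -Delimiter ';' |
  Where-Object { $_.user -eq $user }

user  database1 database2 database3
----  --------- --------- ---------
peter 1         0         1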


Alternative solution: algorithmic renaming of an object's properties:

Note: This solution is slow, but may be an option if you only extract a few objects from the CSV file.

As you note in the question, use of Select-Object with calculated properties is not an option in your case, because you neither know the column names nor their number in advance.

However, you can use a ForEach-Object command in which you use .psobject.Properties, an intrinsic member, for reflection on the input objects:

Import-Csv -Delimiter ';' foo.csv | where { $_.user -eq $user } | ForEach-Object {
  # Initialize an aux. ordered hashtable to store the renamed
  # property name-value pairs.
  $renamedProperties = [ordered] @{}
  # Process all properties of the input object and
  # add them with cleaned-up names to the hashtable.
  foreach ($prop in $_.psobject.Properties) {
    $renamedProperties[($prop.Name -replace '_dat$')] = $prop.Value
  }
  # Convert the aux. hashtable to a custom object and output it.
  [pscustomobject] $renamedProperties
}
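
If the renaming is needed in more than one place, the same logic can be factored into a small reusable filter (a sketch; the name Rename-DatColumn is made up for illustration):

# Strips the "_dat" suffix from all property names of each input object.
filter Rename-DatColumn {
  $renamedProperties = [ordered] @{}
  foreach ($prop in $_.psobject.Properties) {
    $renamedProperties[($prop.Name -replace '_dat$')] = $prop.Value
  }
  [pscustomobject] $renamedProperties
}

# Usage:
Import-Csv -Delimiter ';' foo.csv | where { $_.user -eq $user } | Rename-DatColumn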
mklement0
  • Do you think this would be the best performing solution? I'm looking to implement something similar and the solution on [MSDN](https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.utility/import-csv?view=powershell-7.1#example-4--change-property-names-in-an-imported-object) is arguably much less efficient; so looking to see if this solution is the 'best' or if it would be even faster using .Where() and a few other tweaks – immobile2 Aug 22 '21 at 19:42
  • I was thinking `$headerRow, $dataRows = (Get-Content -ReadCount 0 foo.csv).Where({$_}, 'Split', 1)` but it doesn't seem any faster, but it does seem to have different output from `-split` or at least different than using -Raw. – immobile2 Aug 22 '21 at 20:10
  • @immobile2, if it's acceptable to get the rest of the file as a single, multi-line string, the first solution is the fastest I'm aware of. While `-ReadCount 0` speeds up things if you can process the resulting array _as a whole_, `.Where()` of necessity _enumerates_ the array again, and executing a script block (`{ ... }`) for each enumerated element is slow - you may as well do `$headerRow, $dataRows = Get-content foo.csv`, though you can speed that up significantly with `$headerRow, $dataRows = [IO.File]::ReadAllLines((Convert-Path foo.csv))` – mklement0 Aug 22 '21 at 23:46
  • Probably too long for a comment, let me know if you think a separate answer would be helpful for future searchers or not. I _finally_ grasped what your `-split` was truly doing, the fact that you're treating the 1st true row as the delimiter itself and then returning it is slick; but possibly more complicated than necessary or at least took longer for me to grasp what is actually happening. Consider this alternative: `$null, $headerRow, $dataRows = (gc -Raw foo.csv) -split '^', 3, 'multiline'` you can even streamline things a bit with `((gc -Raw foo.csv) -split '^', 3, 'multiline').Trim()` – immobile2 Sep 15 '21 at 18:14
  • @immobile2, yes, it was too complicated; the simplest solution is `-split '\r?\n', 2`, which also obviates the need for `$null` as the first receiving variable - this simplification is already part of the (recently updated) answer. – mklement0 Sep 15 '21 at 18:16

You can do something like this:

$textInfo = (Get-Culture).TextInfo

# Take the header line only, split it on ';' and clean up each column name:
# title-case it and strip the '_dat' suffix (-replace is case-insensitive,
# so the title-cased '_Dat' is removed as well).
$headers = (Get-Content .\test.csv | Select-Object -First 1).Split(';') |
ForEach-Object {
    $textInfo.ToTitleCase($_) -replace '_dat'
}

$user = 'peter'

# Re-parse the remaining lines using the cleaned-up names as headers,
# then filter for the requested user.
Get-Content .\test.csv | Select-Object -Skip 1 |
ConvertFrom-Csv -Delimiter ';' -Header $headers |
Where-Object User -EQ $user
User  Database1 Database2 Database3
----  --------- --------- ---------
peter 1         0         1        

Not super efficient but does the trick.

Santiago Squarzon
  • I have used mklement0's solution in the end, but this is also a perfectly valid solution for my problem. I can use this approach in some future project. Thank you, really. – MilvusCZ Jul 20 '21 at 07:18