0

I've been struggling with what I think is a really simple problem but I can't see it. I have a stack of 30-odd csv files of varying contents generated daily by different applications that I need to normalize before importing into a single reporting db. An Extract, Transform and Load (ETL) type of thing - global find and replace.

Looping through the files is no problem - not sure whether using ForEach-Object Fullname is the best way to go as outputting to an 'OUT' folder messes it up but using -Name means I have to include the path.

Basically, all 'True'/'False' text is to be replaced with 1/0, same with 'yes'/'no', poweredon/poweredoff, etc. Also we have 4 sites - each needs replacing with a ref. id, loads of stuff like that. I've tried modifying loads of scripts I've found on line - many in here. Tried using the replacement text in an array, pulling the CSV into a string, just can't see it. I've been doing the same thing for years with VBScript and it's easy. But I need to learn PowerShell so I'm going to persevere with it.

Matt
  • 45,022
  • 8
  • 78
  • 119
  • Can you show us at least an example with some sample text for testing. Just _sounds_ like a case of using `Get-Content` and `.Replace()` which as you have seen is covered rather well. Even here: http://stackoverflow.com/questions/17144355/string-replace-file-content-with-powershell – Matt Aug 25 '15 at 12:26
  • Do you know which column names contain the data you're after? If you do / you want to target specific columns just use `import-csv` to get your data, loop through the results replacing values in target columns, then use `export-csv` to save back to file. If you want to replace values wherever they appear look at using `get-content` and `set content`, with a regex to do the replacement; so long as your delimiter isn't included in the text (i.e. as is the case with `true`/`false` and a comma delimited file) this is probably the simplest option if you're targetting any appearance of those. – JohnLBevan Aug 25 '15 at 12:26
  • Hi guys, Thanks for the swift response. These are simple csv files of varying column counts from 5 columns to 45. They mostly contain company-confidential data so can't put them online though I understand what you want. I did look at and can do a simple find/replace on one value - so to change true => 1 for example, but then do I re-parse the file for false=>0? I have a lot of data in each file, and probably 7 or 8 values to replace in each one, mostly with 1/0 - the db fields are boolean. Some are replaced with an abbreviation. Will see what I can do to get an example. – user3681382 Aug 25 '15 at 12:40
  • The script I have - been hacked all morning so is probably very broken - Page won't let me add it - too long. – user3681382 Aug 25 '15 at 12:48

3 Answers3

0

Ok, here is a quick search and replace function for you. It can read multiple CSV files and match\replace multiple values.

function Replace-CsvValue
{
    [CmdletBinding()] # Enable pipeline support
    Param
    (
        # Filename, mandatory, takes pipeline input
        [Parameter(Mandatory = $true, ValueFromPipeline = $true)]
        # Alias, allows to directly pipe Get-ChildItem output to this function
        [Alias('FullName')]
        [string]$File,

        # Scriptblock, mandatory, does actual search and replace
        [Parameter(Mandatory = $true, ValueFromPipelineByPropertyName = $true)]
        [scriptblock]$ScriptBlock
    )

    Process
    {
        # Import CSV
        $CsvFile =  $File | Import-Csv

        # Generate new filename
        $NewFileName = Join-Path -Path (Split-Path -Path $File -Parent) -ChildPath ('Processed_' + (Split-Path -Path $File -Leaf))

        # Iterate over each line in CSV
        $CsvFile | ForEach-Object {
            # Execute scritblock against record
            & $ScriptBlock
        }

        # Export CSV
        $CsvFile | Export-Csv -Path $NewFileName -NoTypeInformation
    }
}

Usage:

  • Write scriptblock with required replace logic
  • Pipe filenames or Get-ChildItem output to the function and pass scriptblock

Example:

Original CSV file:

State, Active, Available
PoweredOn, True, Yes

Function call:

# Scriptblock with replace logic
$ReplaceRule = {

    # Iterate over each item in CSV line
    $Item = $_
    $_.PSObject.Properties.Name | ForEach-Object {

        # If item name matches...
        switch ($_)
        {
            'State' {
                # If item value matches...
                if($Item.$_ -eq 'PoweredOn')
                {
                    $Item.$_ = 'Online'
                }
                # Or if item value matches...
                elseif($Item.$_ -eq 'PoweredOff')
                {
                    $Item.$_ = 'Offline'
                }
                break
            }

            # More replace rules, you can add your own here...

            'Active' {
                if($Item.$_ -eq 'True')
                {
                    $Item.$_ = '1'
                }
                elseif($Item.$_ -eq 'False')
                {
                    $Item.$_ = '0'
                }
                break
            }

            'Available' {
                if($Item.$_ -eq 'Yes')
                {
                    $Item.$_ = '1'
                }
                elseif($Item.$_ -eq 'No')
                {
                    $Item.$_ = '0'
                }
                break
            }
        }
    }
}

# Get all CSV files that match wildcard and
# feed them to the Replace-CsvValue function

Get-ChildItem -Path '.\' -Filter '*Report*.csv' | Replace-CsvValue -ScriptBlock $ReplaceRule

Processed CSV file:

"State","Active","Available"
"Online","1","1"
beatcracker
  • 6,714
  • 1
  • 18
  • 41
  • OK, thanks very much. I was looking along the lines of: `$FileContent -replace 'True', '1' -replace 'False', '0' -replace 'Yes', '1' -replace 'No', '0' -replace 'PoweredOn', '1' -replace 'PoweredOff', '0' -replace 'Running', '1' -replace 'NotRunning', '0' -replace 'anyoldrubbish', '1' $FileContent | Set-Content $Path\OUT\$file` Used backticks - not showing as code - what am I doing wrong? – user3681382 Aug 25 '15 at 13:29
  • @user3681382 Code in comments works as intended, you wouldn't get it any prettier than this. And I'm personally against dumb search-and-replace with `$FileContent -replace` - it has no awareness of the CSV file structure and can easily break it. – beatcracker Aug 25 '15 at 13:44
  • OK, thanks. Finding PShell a bit of a pain in the proverbial, tbh. Still, going to persevere. – user3681382 Aug 25 '15 at 14:22
0

I made this csv for testing with the help of Mockaroo. Notice someones first name is True. I have that in there as a check to be sure my logic is working.

Present Name             Lunch State      
------- ----             ----- -----      
TRUE    Jesse Daniels    No    Powered Off
FALSE   Debra Cunningham Yes   Powered Off
TRUE    True Jones       Yes   Powered Off
TRUE    George Fernandez Yes   Powered Off
FALSE   Lisa Cox         No    Powered On 

For the purpose of this I think it would be simple to just ignore the fact that it is a CSV and just replace the text outright. The caveat we have to be careful for is partial matches. Using regex we should be able to account for that possibility.

From comments you already know that you can chain -replace. Lets add some regex magic in there to make the process easier.

$filename = "C:\temp\MOCK_DATA.csv"
$oneKeywordPattern = "Powered On","Yes","True" -join "|"
$zeroKeywordPattern = "Powered Off","No","False" -join "|"
(Get-Content $filename) -replace "(?<=^|,)$oneKeywordPattern(?=$|,)","1" -replace "(?<=^|,)$zeroKeywordPattern(?=$|,)","0" | Set-Content $filename

To make sure that the csv structure is accounted for we only replace if the element is at the start of the line or a comma followed the end of the line or comma (This is using a lookahead and lookbehind.). This also ensures that we only change full elements and True Jones is not affected.

We used $oneKeywordPattern so that you can add elements to the array that need to be changed to a 1. We join them with a pipe so that it is treated as a alternative regex pattern. Its counterpart $zeroKeywordPattern functions just the same.

Output

Present Name             Lunch State
------- ----             ----- -----
1       Jesse Daniels    0     0    
0       Debra Cunningham 1     0    
1       True Jones       1     0    
1       George Fernandez 1     0    
0       Lisa Cox         0     1    

You could likely have other patterns that do not need to be changed with this logic. Just chain another -replace and remember that it supports regex so watch out for special characters.

The two caveats here is that if the files are large it could take a while to load the file and process the regexes (especially if you add more.) Also if your text is enclosed in quotes we don't currently account for that but it would be easy.

Matt
  • 45,022
  • 8
  • 78
  • 119
  • Your code replaces only `Lunch` in this CSV: `"Present","Name","Lunch","State" "TRUE","Jesse Daniels","No","Powered Off"`. – beatcracker Aug 25 '15 at 14:09
  • @beatcracker I will test again but that was not my findings...... I just tested again and it is working fine for me. – Matt Aug 25 '15 at 14:11
  • I'm sure that you tested it, but I can't get it work, sorry. Btw, is your regex aware of the comma inside quotes in value, like this: `"My, oh my"`? – beatcracker Aug 25 '15 at 14:12
  • @beatcracker For that values that the OP is looking for that is not a concern and I would say low on the risk chart. Currently it would not break anything. – Matt Aug 25 '15 at 14:13
  • If there was a cell like this "My, oh my that is ,true, I would say". That would be a problem – Matt Aug 25 '15 at 14:25
0

Basically, all 'True'/'False' text is to be replaced with 1/0, same with 'yes'/'no', poweredon/poweredoff, etc. Also we have 4 sites - each needs replacing with a ref. id, loads of stuff like that. I've tried modifying loads of scripts I've found on line - many in here. Tried using the replacement text in an array, pulling the csv into a string, just can't see it. I've been doing the same thing for years with vbscript and it's easy. But I need to learn PShell so I'm going to persevere with it. I'd really appreciate some help here.

If it's that static, you can probably get away with:

$changes = @{
  'true' = '1';
  'false' = '0';
  'poweredon' = '1';
  'poweredoff' = '0'
}
$folder = "" # your folder here
$csvFiles = ls $folder *.csv
foreach ($file in $csvFiles) {
  $csvData = import-csv $file
  foreach ($row in $csvData) {
    $cells = $row | `
       gm | `
       ?{$_.MemberType -eq 'NoteProperty'} | `
       select -exp Name
    foreach ( $cell in $cells ) {
       $val = $row."$cell"
       $valueNeedsChanging = $changes.ContainsKey($val)
       if ( $valueNeedsChanging ) {
          $newValue = $changes[$val]
          $row."$cell" = $newValue
       }
    }
  }
  cp $file.FullName "$($file.FullName).bak" # back it up before saving
  $csvData | export-csv -Path $file.FullName -NoTypeInformation
}

I chose to use Import- and Export-CSV to preserve the structure of the CSV file for files that have a lot of advanced formatting.

Carlos Nunez
  • 2,047
  • 1
  • 18
  • 20
  • Thanks Carlos - that worked a treat. I needed to change the line `$csvData = import-csv $file` to `$file.FullName` but it's worked perfectly. Many thanks. – user3681382 Aug 26 '15 at 09:10
  • Sure thing! Glad to help. Can you mark this as the answer when you get a chance? – Carlos Nunez Sep 18 '15 at 18:24