I am having a bit of a conundrum working with some CSV files that need to be cleansed and loaded into a database.

I am fairly adept with PowerShell, but weak with regular expressions and CSV column manipulation.

Here is the issue: there is a 'notes' field in the CSV file that can contain all sorts of characters. I need to remove the line feeds and quotes WITHIN that field, but leave the record-ending line feeds and the text-qualifying quotes where they are. I can remove line feeds and quotes throughout the whole file, but I cannot narrow it down to just the characters inside the field.

I have tried to do this with regular expressions, but I am not having much luck. I am hoping someone here will be able to help with this!

Edit: here is the example data

"123"   ""  "2017-02-13 10:26:08" "123456789"   "2017-02-10"    "No"    "Yes"   "Yes"   "No"    "sample text 
<crlf> ""additional text""
<crlf> 
<crlf> "    "Y" <crlf>

This should simply be one line, with no <crlf> except at the end.

shannonjk
  • posting an example of a couple rows would be helpful – Justin Ohms Feb 17 '17 at 22:38
  • Here is an example of the output. It doesn't look like this is saving the return fields though, so I manually added them with <crlf>. The output should be that this is only one record. "123" "" "2017-02-13 10:26:08" "123456789" "2017-02-10" "No" "Yes" "Yes" "No" "sample text ""additional text"" " "Y" – shannonjk Feb 17 '17 at 22:41
  • Get a dedicated CSV parser. There are many available for .Net, and powershell can load and use .Net objects. – Joel Coehoorn Feb 17 '17 at 22:46
  • @shannonjk Please [edit] that into your question. We cannot see how that formats in comments properly. Show us a few lines and anything you might have done to try and fix this yourself. SO is not a code writing service but a community of programmers and programming enthusiasts. I'm sure we can figure this out. – Matt Feb 18 '17 at 02:08

1 Answer

The built-in Import-Csv cmdlet correctly imports multiline and quoted values.

Your file is tab-delimited so we'll specify "`t":

Import-Csv c:\file.csv -Delimiter "`t" | ForEach {
    $_.notes = $_.notes -replace '"', '' -replace '[\r\n]+', ' '
    $_
} | Export-Csv c:\output.csv -Delimiter "`t" -NoTypeInformation -Encoding UTF8
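
A quick way to check this on a throwaway file before touching the real data. This is only a sketch: it assumes the file has a header row with a column named notes, and the column names (id, notes, flag) and temp file names below are made up for illustration.

```powershell
# Hypothetical sample: tab-delimited, header row assumed, one record whose
# quoted 'notes' value spans several lines and contains doubled quotes.
$tab  = "`t"
$crlf = "`r`n"
$in   = Join-Path ([IO.Path]::GetTempPath()) 'notes-sample.csv'
$out  = Join-Path ([IO.Path]::GetTempPath()) 'notes-clean.csv'

$header = '"id"', '"notes"', '"flag"' -join $tab
$notes  = '"sample text ' + $crlf + '""additional text""' + $crlf + $crlf + '"'
$record = '"123"', $notes, '"Y"' -join $tab
[IO.File]::WriteAllText($in, $header + $crlf + $record + $crlf)

# Same pipeline as above; the rows are kept in a variable so they can be inspected.
$rows = @(Import-Csv $in -Delimiter $tab | ForEach-Object {
    $_.notes = $_.notes -replace '"', '' -replace '[\r\n]+', ' '
    $_
})
$rows | Export-Csv $out -Delimiter $tab -NoTypeInformation -Encoding UTF8

$rows.Count      # how many records Import-Csv found
$rows[0].notes   # the cleaned notes value
```

If the sample is parsed as a single record and the notes value no longer contains quotes or line breaks, the same pipeline should behave the same way on the real file.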
wOxxOm
  • This worked flawlessly. I am not great with regular expressions, can you explain what the second replace function you did is doing? I tried the foreach and foreach-object with similar replaces, including the quote one you used, which replaced all quotes, but this seems to handle it exactly as I need, and I have no idea how :). – shannonjk Feb 20 '17 at 18:14
  • It matches one or more consecutive newline characters, written as \r (carriage return) and \n (line feed), and replaces each run with a single space. Try [regex101](https://www.regex101.com/r/H9oUIY/1): it's a great online playground for studying regex. – wOxxOm Feb 20 '17 at 18:22
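
To see what the second replace does on its own, here is a small sketch using a made-up string (the variable names are only for illustration). The + is what collapses a whole run of \r and \n characters into one space instead of one space per character.

```powershell
# Made-up value containing a CRLF pair, a blank line, and a bare LF.
$notes = "line one`r`n`r`nline two`nline three"

# With '+': each run of CR/LF characters becomes a single space.
$flat = $notes -replace '[\r\n]+', ' '
$flat        # line one line two line three

# Without '+': every CR and LF becomes its own space, so runs leave gaps.
$gappy = $notes -replace '[\r\n]', ' '
$gappy.Length - $flat.Length   # 3 extra spaces from the four-character run
```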