I am trying to clean up some text files by removing extra whitespace and other characters. I want only the text within double quotes to remain on each line.

Here is the example of the text file:

   "uid" : "Text To Remain", 
 "id" : "Text2 To Stay",

Note the spaces/tabs at the beginning of each line and the comma at the end of each line.

So I thought the easiest way to get rid of the whitespace on the left would be to use a regular expression. Every line contains a space-colon-space string, " : ", so I try to erase everything to the left of it, including the string itself.

I came up with two examples of solutions:

get-content 'K:\text.txt' -ReadCount 1000 |
 ForEach-Object {
 $_.replace(".* : ", "").replace(",", "")
 } |
  Out-File 'K:\text_cleaned.txt'

This solution works only for the comma, but does not work for the colon. There is no error.

Second solution:

get-content 'K:\text.txt' -ReadCount 1000 |
 foreach { $_ -replace ".* : " |  out-file 'K:\text_cleaned.txt'
}

This works and removes everything to the left of the first double quote of the value, but I have no idea how to also remove the comma at the end of each line in the same statement.

Is there a simpler way to do it?

I am very curious why the regular expression /.* : / in the first solution does not work, while the one in the second does work. And there is no error in the first one.

Could you enlighten me?

KolesGit
  • (gc 'K:\text.txt').trim() -replace ',$' should help? – Toni Oct 31 '22 at 16:55
  • This works too! I will try to analyze what this simple and ingenious code does. How does it know what to erase to the left of the quote in the text file? The second part is understandable, but the first one... I don't know. – KolesGit Oct 31 '22 at 19:43
  • trim() = Removes all leading and trailing white-space characters... see: https://devblogs.microsoft.com/scripting/trim-your-strings-with-powershell/ – Toni Oct 31 '22 at 20:45

1 Answer

Try the following:

(Get-Content 'K:\text.txt' -ReadCount 0) -replace '.+ : "|",\s*$' |
   Out-File 'K:\text_cleaned.txt'

Output:

Text To Remain
Text2 To Stay
  • -ReadCount 0 reads the entire file into a single array, at once, which greatly speeds up processing.

  • The -replace operation, given no replacement operand, removes what it matches: everything from the start of each line through the " that follows the  :  separator, as well as the last " if it is followed by a , and optionally by whitespace at the end of the line.

Note: The assumption is that the verbatim substring  : " occurs only between the "..." strings, not embedded in them, as in "Foo "" : "" bar"
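
To see the two branches of the pattern at work without touching any file, here is a minimal sketch that applies the same -replace to the two sample lines from the question. The variable names $lines and $cleaned are only illustrative.

```powershell
# Sample lines copied from the question (leading whitespace and trailing comma included).
$lines = @(
    '   "uid" : "Text To Remain", '
    ' "id" : "Text2 To Stay",'
)

# Left branch:  .+ : "     removes everything through the opening quote of the value.
# Right branch: ",\s*$     removes the closing quote, the comma, and any trailing whitespace.
# With an array on the left-hand side, -replace processes each element.
$cleaned = $lines -replace '.+ : "|",\s*$'

$cleaned
# Text To Remain
# Text2 To Stay
```

Because no replacement operand is given, each match is replaced with the empty string.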


As for what you tried:

$_.replace(".* : ", "")

  • The .Replace() method of the .NET [string] type only performs literal (verbatim) substitutions, so an attempt to use a regex cannot work.

  • By contrast, PowerShell's -replace operator is regex-based. Also note that, unlike the .Replace() method, it is case-insensitive by default (though you may use its -creplace variant for case-sensitive replacements).

See this answer for more information and guidance on when to use -replace vs. .Replace().
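
The difference can be checked on a single sample line. The following is only a sketch with illustrative variable names; it also shows one way to chain two -replace operations, which is one possible answer to the "same statement" part of the question.

```powershell
$line = ' "id" : "Text2 To Stay",'

# .Replace() searches for the literal characters ".* : ", which never occur in the line,
# so the string comes back unchanged and no error is raised.
$literal = $line.Replace('.* : ', '')

# -replace treats the same pattern as a regex; a second, chained -replace
# removes the trailing comma (and any whitespace after it).
$regex = $line -replace '.* : ' -replace ',\s*$'

# Case sensitivity: -replace ignores case by default; -creplace and .Replace() do not.
$insensitive = 'ABC' -replace 'b', 'x'     # AxC
$sensitive   = 'ABC' -creplace 'b', 'x'    # ABC
$method      = 'ABC'.Replace('b', 'x')     # ABC

$literal   # still:  "id" : "Text2 To Stay",
$regex     # "Text2 To Stay"
```

Note that this chained variant keeps the surrounding double quotes, as in the second attempt from the question.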

mklement0
  • Thank you, of course it works, but I can see I have a long way ahead of me. At least I've learned here the difference between .Replace() method and -(c)replace operator. Although it is my second post here, I find PowerShell more interesting every time I use it for something useful. My next quest - how to duplicate an entire line in a text file if there is a certain string present in it, but that's what I am going to try figuring out myself ;) – KolesGit Oct 31 '22 at 19:39
  • Glad to hear it, @KolesGit; good luck exploring PowerShell further - it is time worth spending. – mklement0 Oct 31 '22 at 19:43