5

This is a Powershell question similar to what was being asked in this C# question: C# Question Link

I have fixed width column data in a text file that are of variable length, so I'd like to delimit the data with tabs. To do so, I want to use Powershell to read in the file, replace only multiple spaces with tabs using a Regex expression, keep the end of lines intact and output it to a temp file. I'll then rename it to the original.

I've searched the web and only seem to be able to find bits and pieces. Any assistance with this would be greatly appreciated!

Community
  • 1
  • 1
Lawrence Knowlton
  • 99
  • 1
  • 3
  • 10

2 Answers2

7
  1. Fetch the contents

    $content = [IO.File]::ReadAllText('foo.txt')
    
  2. Replace at least two spaces by a single tab:

    $content = $content -replace ' {2,}', "`t"
    
  3. Write back to a file

    [IO.File]::WriteAllText('footab.txt', $contents)
    
Joey
  • 344,408
  • 85
  • 689
  • 683
  • First, thank you! That was wicked fast! Also, what does the $utf8 statement add to this and is it necessary on a plain ole ASCII file? Thank you again! – Lawrence Knowlton Jun 10 '13 at 14:01
  • It just specifies that the output should be written as UTF-8 without a BOM. You can omit it if it's just ASCII, I think. In fact, without the encoding parameter the method does exactly that, so it's unnecessary in either case. – Joey Jun 10 '13 at 14:05
  • Awesome! Now I'm going to incorporate this into a VBA shell call, so I don't have to leave my Excel macro :) Doing this in VBA would be many more lines of code! Thanks again! – Lawrence Knowlton Jun 10 '13 at 14:10
  • I thought it was working, but the footab file is still empty. Any ideas? – Lawrence Knowlton Jun 12 '13 at 01:19
  • Try a complete path to the input and output file. – Joey Jun 12 '13 at 08:16
  • They were complete paths to both. The file is created, but empty. The first time it was zero length, then I changed to single quotes on the paths and the file was the right size, but still empty. – Lawrence Knowlton Jun 12 '13 at 13:48
  • It's probably moot now anyway, now that you accepted the other answer and apparently my attempt to care for line endings was futile. – Joey Jun 12 '13 at 22:20
  • I really appreciate your attempts, they seem like they would be quite robust and a better way to do it. If nothing else more advanced. If I can, I'll see what others think of your methods and if I find suggestions as to how to make it work, I'll let you know! – Lawrence Knowlton Jun 13 '13 at 01:05
3

try

gc .\0.txt  | 
 % { $_ -replace '  +',"`t" } |   
     set-content .\temp.txt
CB.
  • 58,865
  • 9
  • 159
  • 159
  • @AnsgarWiechers Edited with 2 ' ' space char ;) – CB. Jun 10 '13 at 13:06
  • Are you saying that single spaces would be replaced? If so, that is not what I need. The data have single spaces in them, each column always has 2 or more spaces between them. Thank you! – Lawrence Knowlton Jun 10 '13 at 14:23
  • @LawrenceKnowlton NO, single spaces aren't replaced. – CB. Jun 10 '13 at 14:25
  • Cool! I'll try to decipher what you have there. I know you are using the alias for GetContent, but after that the syntax is ehm taxing lol! Thanks! – Lawrence Knowlton Jun 10 '13 at 14:31
  • @LawrenceKnowlton `+` in regular expressions means "the previous expression one or more times". In this case the previous "expression" is just a single space, and this expression is preceeded by another space. Thus the whole expression translates to "a space followed by one or more spaces" (or "2 or more spaces" for short). – Ansgar Wiechers Jun 10 '13 at 17:52
  • That part I understood, but the rest I'm not sure of. – Lawrence Knowlton Jun 10 '13 at 23:23
  • Your method works. Now I just need to get a parameter to work. Thank you! – Lawrence Knowlton Jun 12 '13 at 01:13
  • For powershell newbies like me, `gc` is short for `Get-Content`, there is a similar `sc` for `Set-Content`. `%` is short for `Foreach-Object` and `{ $_ -replace ' +',"\`t" }` is an anonymous function that replaces 2 or more spaces with a tab. – icc97 Oct 23 '17 at 14:00