
I am having issues editing a file. It is a .ARQ file made by BMC Remedy, a ticketing system.

I can open it in Notepad++, edit it, and everything is fine. However, when I try to use PowerShell to edit it, things get messed up: although it looks the same visually, the application doesn't read it the same way. Here are some tests I have done to try to figure out what is wrong.

Test 1

get-content monitor.arq | set-content monitor2.arq

Result

Length Name
------ ----
  3578 monitor.arq
  3585 monitor2.arq 

Basically they are different sizes, and monitor2.arq does not function like monitor.arq does, even though visually in Notepad++ they are identical.
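For reference, the 7-byte growth is consistent with one added byte per line of the file: `Get-Content` splits the file into lines (dropping the original terminators), and `Set-Content` writes each line back with a Windows CRLF ending. A minimal sketch, assuming the original file uses single-character LF endings (the file names and temp location here are illustrative, not from the original post):

```powershell
# Write a 2-line file with LF-only line endings (12 bytes),
# then round-trip it through Get-Content | Set-Content.
$dir  = [System.IO.Path]::GetTempPath()
$lf   = Join-Path $dir "lf-demo.txt"
$crlf = Join-Path $dir "crlf-demo.txt"

[System.IO.File]::WriteAllText($lf, "line1`nline2`n")   # LF endings only
Get-Content $lf | Set-Content $crlf                     # rewrites lines with CRLF

(Get-Item $lf).Length    # 12 bytes
(Get-Item $crlf).Length  # 14 bytes -- one extra CR per line
```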

Test 2

I thought maybe it was an encoding issue so I tried this.

$code = @("Unicode", "UTF7", "UTF8", "UTF32", "ASCII", "BigEndianUnicode", "Default", "OEM")
for ($a = 0; $a -lt $code.count; $a++) {
    Get-Content .\monitor.arq | Out-File -Encoding $code[$a] -FilePath ".\monitor$a-$($code[$a]).arq"
}

Result

Length Name                         
------ ----                         
  3578 monitor.arq                  
  7172 monitor0-Unicode.arq         
  4911 monitor1-UTF7.arq            
  3596 monitor2-UTF8.arq            
 14344 monitor3-UTF32.arq           
  3585 monitor4-ASCII.arq           
  7172 monitor5-BigEndianUnicode.arq
  3585 monitor6-Default.arq         
  3585 monitor7-OEM.arq  

Not a single one appears correct. Maybe I am barking up the wrong tree here, but there is plain text in this file that I want to edit without breaking the whole thing.

I also tried a file stream (lots of code, not included here), but it produced the same results.
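One way to take encodings and line endings out of the picture entirely is to read and write raw bytes, since no text decoding happens at all. A sketch of the idea (the temp-file setup below is only a stand-in so the snippet is self-contained; in practice the paths would be monitor.arq and its copy):

```powershell
# Set up a small stand-in file containing bytes outside the ASCII
# range (201 and 233) plus a trailing LF -- purely for demonstration.
$dir = [System.IO.Path]::GetTempPath()
$src = Join-Path $dir "monitor-demo.arq"
$dst = Join-Path $dir "monitor-demo2.arq"
[System.IO.File]::WriteAllBytes($src, [byte[]](69, 78, 68, 201, 233, 10))

# Read and write the raw bytes -- no decoding, no line-ending
# translation, so the copy is byte-for-byte identical.
$bytes = [System.IO.File]::ReadAllBytes($src)
# ...any byte-level edits would go here...
[System.IO.File]::WriteAllBytes($dst, $bytes)
```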

According to Notepad++, it's ANSI.

According to a few scripts I found online to check, it's ASCII.

I am likely doing something completely stupid or don't understand enough about this to get the job done.

Any help would be great.

Test with help from the comments (thanks, ladies and gents)

$content1 = Get-Content -Encoding Byte monitor.arq
[System.Text.Encoding]::ASCII.GetString($content1) | Out-File -Encoding ascii -FilePath .\monitor2.arq
$content2 = Get-Content -Encoding Byte monitor2.arq
Compare-Object $content1 $content2

The result of this shows me what characters are being converted in a way I don't want.

InputObject SideIndicator
----------- -------------
         63 =>           
         63 =>           
         63 =>           
         63 =>           
         13 =>           
         10 =>           
        201 <=           
        233 <=           
        233 <=           
        201 <=           

The 13 and 10 I know are the line breaks discussed in the comments; they are added at the end of the file. I honestly don't think that's affecting anything at this point, because it appears after an END statement (this is a macro file for BMC Remedy).

My concern, however, is that 201 and 233 are being converted to 63. Any thoughts on how to stop that? I was going to simply convert back to a byte[] before writing the file, but at that point I won't be able to tell the difference between 63s that should be 201 and those that should be 233.
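For what it's worth, those 63s look exactly like .NET's ASCII replacement character: ASCII covers only bytes 0–127, and `Encoding.ASCII` substitutes `?` (byte 63) for anything above that, which is why the conversion is one-way. A quick check:

```powershell
# Bytes 201 and 233 are above the ASCII range (0-127); in
# Windows-1252 they would be 'É' and 'é'.
$bytes = [byte[]](201, 233)
$text  = [System.Text.Encoding]::ASCII.GetString($bytes)

$text                 # "??" -- both bytes replaced with '?'
[int[]][char[]]$text  # 63, 63
```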

SOLVED

So with the help of everyone in the comments, I have come to a solution. My own relentless stubbornness in believing it was ASCII was what did me in, coupled with the line breaks mentioned in the comments.

$content1 = Get-Content -Encoding Byte monitor.arq
[System.Text.Encoding]::Default.GetString($content1) | Out-File -Encoding default -FilePath .\monitor2.arq
$content2 = Get-Content -Encoding Byte monitor2.arq
Compare-Object $content1 $content2 | Format-Table -AutoSize
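This fits the earlier evidence: assuming the system's ANSI code page is Windows-1252 (the usual US/Western default, which matches Notepad++ reporting "ANSI"), bytes 201 and 233 decode to 'É' and 'é' and round-trip cleanly, with no fallback to 63. A sketch using that code page explicitly:

```powershell
# Decode and re-encode bytes 201 and 233 with Windows-1252
# (code page 1252) -- they map to 'É' and 'é' and survive intact.
$enc   = [System.Text.Encoding]::GetEncoding(1252)
$bytes = [byte[]](201, 233)
$text  = $enc.GetString($bytes)   # "Éé"
$back  = $enc.GetBytes($text)
$back                             # 201, 233 -- nothing collapses to 63
```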

Not really sure who should post an answer to get the solved credit, but thanks, all.

  • What you might be seeing is that the original file has single-character linebreaks (a linefeed / newline character) whereas `Set-Content` uses double-character linebreaks (carriage return + linefeed) when it writes the file – Mathias R. Jessen Nov 17 '15 at 18:19
  • Does that apply to Out-File as well? – TetraFlash Nov 17 '15 at 18:31
  • @TetraFlash yes it does. It has to do with the way you are reading the file and subsequently writing it (line by line). You may be able to use `[System.IO.File]::ReadAllText()` and then `::WriteAllText()`. – briantist Nov 17 '15 at 18:40
  • You might need to be _reading_ the file using `-Encoding` as well. Assuming you knew how it was encoded..... hmmm, it is ANSI you say... What Mathias said is very likely. Was your monitor file about 7-ish lines? – Matt Nov 17 '15 at 18:55
  • ya both in and out files are 7 lines – TetraFlash Nov 17 '15 at 19:08
  • That means what @MathiasR.Jessen is saying is most likely true (as you added 7 characters in your first test) and you could use briantist suggestion if you need to do data manipulation on the string as a single unit. Another suggestion is to consider unix2dos / dos2unix. Those and similar utilities can handle converting the linefeed /carriage return file formats. Used to use them in the past for something similar. – Matt Nov 17 '15 at 19:14
  • [System.IO.File]::WriteAllText("C:\test\monitor2.arq",([System.IO.File]::ReadAllText("C:\test\monitor.arq"))) this still created a larger output file 3594 in this case – TetraFlash Nov 17 '15 at 19:17
  • You can use code like this to see what's really in the file: `$content=gc -Encoding byte monitor.arq; $filechunk=$content[0..100]; [byte[]][char[]]$filechunk| %{$_.tostring("X")}`. This will show the hexadecimal values for the first 101 bytes. BTW, on a plain ASCII file get-content (w/o -Encoding) returns an array of strings without line termination characters. Piping that array to set-content causes it to add a line termination sequence. _In essence piping gc to sc can convert line terminators_. I'm not sure if any of the encoding values to sc will change the line termination sequence. – Χpẘ Nov 17 '15 at 19:32
  • @TetraFlash the methods I mentioned use UTF-8 by default. You may have to use overloads that let you specify the encoding. – briantist Nov 17 '15 at 19:49
  • Thanks for all the help, I was able to use this info to find the root cause. I am going to post it in the question so its readable. Any suggestions on how to fix this would be great. – TetraFlash Nov 18 '15 at 13:04
  • So the issue is these are "extended" ASCII characters. – TetraFlash Nov 18 '15 at 14:15

1 Answer

$content1 = Get-Content -Encoding Byte monitor.arq
[System.Text.Encoding]::Default.GetString($content1) | Out-File -Encoding default -FilePath .\monitor2.arq
$content2 = Get-Content -Encoding Byte monitor2.arq
Compare-Object $content1 $content2 | Format-Table -AutoSize