4

I have a function that performs a regex replace in a file. The problem is that it adds a character (0x00) to the start of every file it touches (even the ones where it doesn't find a match!). Since I am editing csproj files, MSBuild gives me this error:

error MSB4025: The project file could not be loaded. '.', hexadecimal value 0x00, is an invalid character. Line 2, position 1.

Here is my function:

function fileStringRegExReplace ([string] $fileToChange, [string] $oldString, [string] $newString) {
    echo "f" | xcopy "$fileToChange" "$fileToChange.og.cs" /Y /Q

    $file = Get-Content "$fileToChange.og.cs" | 
        Foreach-Object {
            $_ -replace $oldString, $newString
        } |
        Out-File "$fileToChange"

    Remove-Item "$fileToChange.og.cs"
}

How can I replace the lines I want and not change any other part of the file?

Jeremiah
  • 108
  • 2
  • 6

4 Answers

6

It sounds like it's writing a BOM at the beginning of the file. You can set the encoding to ASCII (which has no BOM) using the -Encoding ASCII parameter on Out-File.

Nate Hekman
  • 6,507
  • 27
  • 30
  • for reasons unbeknownst to me `Set-Content` makes ASCII files by default and `Out-File` creates UCS-2 little endian files. [related](http://stackoverflow.com/questions/10655788/powershell-set-content-and-out-file-what-is-the-difference). I created a [`Get-FileEncoding`](http://stackoverflow.com/questions/9121579/powershell-out-file-prevent-encoding-changes) function that attempts to determine what the source file encoding is so that it can be used with the `-Encoding` parameter with `Set-Content` and `Out-File` which is useful for file updating like this. – Andy Arismendi Apr 15 '13 at 22:27
  • I'm still also getting the issue after changing to Set-Content or using Out-File -Encoding ASCII. I'm also replacing text in the csproj file. – sonjz Mar 03 '15 at 05:18
  • have a solution, posting now. – sonjz Mar 03 '15 at 17:01
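One way to check the BOM theory is to look at the first few bytes of the file. Below is a minimal sketch, assuming a hypothetical helper named Get-BomName (not a built-in cmdlet, and not the Get-FileEncoding function linked in the comment above). It only recognises the three most common BOMs, and a UTF-32 LE file would be misreported as UTF-16 LE.

```powershell
# Hypothetical helper: names the byte-order mark at the start of a byte array.
function Get-BomName {
    param([byte[]] $Bytes)

    if ($Bytes.Length -ge 3 -and $Bytes[0] -eq 0xEF -and $Bytes[1] -eq 0xBB -and $Bytes[2] -eq 0xBF) {
        return 'UTF-8 BOM'
    }
    if ($Bytes.Length -ge 2 -and $Bytes[0] -eq 0xFF -and $Bytes[1] -eq 0xFE) {
        return 'UTF-16 LE BOM'
    }
    if ($Bytes.Length -ge 2 -and $Bytes[0] -eq 0xFE -and $Bytes[1] -eq 0xFF) {
        return 'UTF-16 BE BOM'
    }
    return 'no BOM'
}

# Usage against a real file (the path is a placeholder):
# Get-BomName -Bytes ([System.IO.File]::ReadAllBytes((Resolve-Path 'file.csproj').Path))
```

If this reports a UTF-16 BOM for a file that was plain UTF-8 before the rewrite, the writing step changed the encoding.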
2

The default encoding of Out-File in Windows PowerShell is Unicode, which is Windows-speak for UTF-16. When only writing characters from the ASCII set, UTF-16 basically has the effect of adding a 0x00 byte after each character (Out-File writes little-endian UTF-16, so the low byte comes first). This explains why MSBuild is complaining about 0x00 bytes.
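The effect is easy to see by encoding a short string in memory. This is a small sketch using the .NET System.Text.Encoding classes (where "Unicode" means little-endian UTF-16); the sample string and variable names are only for illustration.

```powershell
$text = '<P'

# Encode the same two ASCII characters as UTF-16 LE and as UTF-8.
$utf16 = [System.Text.Encoding]::Unicode.GetBytes($text)
$utf8  = [System.Text.Encoding]::UTF8.GetBytes($text)

# Render the bytes as hex so the extra 0x00 bytes are visible.
$utf16Hex = ($utf16 | ForEach-Object { $_.ToString('X2') }) -join ' '
$utf8Hex  = ($utf8  | ForEach-Object { $_.ToString('X2') }) -join ' '

$utf16Hex   # 3C 00 50 00
$utf8Hex    # 3C 50

# The UTF-16 LE byte-order mark that precedes the text in a file.
$bomHex = ([System.Text.Encoding]::Unicode.GetPreamble() | ForEach-Object { $_.ToString('X2') }) -join ' '
$bomHex     # FF FE
```

The doubled byte count also matches the symptom of a rewritten file being roughly twice its original size.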

The XML in the csproj files you are trying to modify declares itself to be UTF-8, so use the -Encoding UTF8 option of Out-File.

Do not use the ASCII encoding: it will cause problems as soon as the csproj file contains a non-ASCII character.
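Applied to the function from the question, this could look roughly like the sketch below. The name Update-FileText and its parameter names are made up for illustration, and the temporary xcopy step is dropped on the assumption that it was only there to avoid reading and writing the same file at once. Note that -Encoding UTF8 writes a BOM in Windows PowerShell but not in PowerShell 6 and later.

```powershell
# Hypothetical rewrite of the question's function with an explicit encoding.
function Update-FileText {
    param(
        [string] $Path,
        [string] $Pattern,
        [string] $Replacement
    )

    # The parentheses make Get-Content read the whole file and close it
    # before Out-File opens the same path for writing.
    (Get-Content -Path $Path) |
        ForEach-Object { $_ -replace $Pattern, $Replacement } |
        Out-File -FilePath $Path -Encoding UTF8
}
```

Because the file is read completely before it is written, no backup copy is needed for the pipeline to work. Keeping one may still be sensible if the replacement could go wrong.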

Wim Coenen
  • 66,094
  • 13
  • 157
  • 251
1

I was having the same issue: after using a ForEach to replace the text, I ran into problems.

For my solution, I just wanted to find the last </Target> and append another <Target></Target>.

I tried the approach above and the file size doubled for some reason, and the build also failed with the 0x00 error at Line: 2, Position: 1.

I must credit @Matt on this solution, as I probably wouldn't have figured out the regex on my own: https://stackoverflow.com/a/28437855/740575

This let me avoid the ForEach approach entirely. You should find your answer somewhere in this solution.

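$replaceVar = "<Target> ... </Target>" ;
# NOTE: -Raw reads the entire file in as a single string; without it,
#       everything gets read in as an array of lines
$file = Get-Content file.csproj -Raw ;
# (?s) lets . match newlines; the greedy (.*) makes this match the LAST </Target>.
# The backticks stop PowerShell from expanding $1 and $2 as (empty) variables
# inside the double-quoted string, so they reach the regex engine as group references.
# </Target> is written back before $replaceVar so the new element is appended, not nested.
$newFile = $file -replace "(?s)(.*)</Target>(.*)", "`${1}</Target>$replaceVar`${2}" ;

# csproj is UTF8
$newFile | Out-File -Encoding UTF8 "new.csproj" ;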

Solution works within Visual Studio and msbuild.exe.
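To see how the greedy match picks the last </Target>, the same kind of replacement can be tried on an in-memory string. This is a sketch with made-up sample XML and variable names; the replacement writes </Target> back before the new element, and the backticks keep PowerShell from expanding the group references itself.

```powershell
$xml   = '<Project><Target>a</Target><Target>b</Target></Project>'
$extra = '<Target>c</Target>'

# (?s) lets . span newlines; the first (.*) is greedy, so it swallows
# everything up to the final </Target> in the input.
$result = $xml -replace '(?s)(.*)</Target>(.*)', "`${1}</Target>$extra`${2}"

$result   # <Project><Target>a</Target><Target>b</Target><Target>c</Target></Project>
```

One caveat: if the inserted text contains a literal $ character, the regex engine may treat it as a substitution token, so it would need to be doubled ($$).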

sonjz
  • 4,870
  • 3
  • 42
  • 60
0

Try replacing Out-File with Set-Content.

mjolinor
  • 66,130
  • 7
  • 114
  • 135