1

I'm having an issue when I use PowerShell to copy data from an XML sheet into an Excel sheet. The data from the XML file looks like this: <TrackTitle>I'm the One</TrackTitle>, but the result in Excel looks like this: IÂ’'m the One. Anyone know what might be happening with the apostrophe? All I'm doing is grabbing that TrackTitle node and inserting it into an Excel spreadsheet, nothing fancy. I think it's an encoding issue? But I don't know how to affect the encoding of the information as I move it. Thanks a bunch for any help.

Edit: Here's the hex code of the passage in question. In the HxD hex editor, the funky characters show up as well.

In the XML document in HxD,

"IÂ’'m the One" =  "49 C2 92 27 6D 20 74 68 65 20 4F 6E 65"

If I use Edit → Copy as → Editor view, this comes up when I select IÂ’'m the One:

Offset(h) 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F

00004AA0                    49 C2 92 27 6D 20 74 68 65 20        IÂ’'m the 
00004AB0  4F 6E 65                                         One

It's strange that the characters get distorted even in HxD. In the XML it looks completely normal.

Ansgar Wiechers
  • 193,178
  • 25
  • 254
  • 328
Matthew Goode
  • 105
  • 1
  • 9
  • Sure looks like an encoding issue. Try opening the file in some hex editor to see the actual bytes in the file. – Ansgar Wiechers Apr 18 '16 at 21:11
  • Can you break that down a bit more? I forgot to mention I'm pretty new to this, I've been teaching myself for a month or so. Is there possibly a simple online hex editor I could use, just to examine the problem? Or will I need to download something? And assuming it is some kind of encoding problem, where can I go from there? Thanks for responding! – Matthew Goode Apr 18 '16 at 21:13
  • [HxD](https://mh-nexus.de/en/hxd/) is a decent hex editor for Windows. The author provides both an installable and a portable version (scroll down the download page for the latter). Open the file in HxD and find the passage that's giving you trouble. Edit your question and copy/paste both hex and text view (*Edit > Copy as > Editor view*) of that passage as well as the first line of the file into your question. – Ansgar Wiechers Apr 18 '16 at 21:21
  • Thank you! Before I do that, I noticed that if I call the xml element right in the PowerShell console, the funky characters appear. So it's not a problem of moving from XML to PowerShell to Excel, it's a problem in Powershell itself it seems. Going to edit the question now with the hex. – Matthew Goode Apr 18 '16 at 21:35

1 Answers1

2

The two bytes C2 92 right before the single quote (27) are a non-printable control character (U+0092). Not sure what the purpose of this Unicode character is, or how the character got into your XML data (if I had to guess I'd say it was copy/pasted from somewhere, perhaps some HTML text).

If you open the file in Notepad and position the cursor right of the single quote in I' you most likely need to press 3 times to move the cursor from the right side of the ' to the left side of the I.

Simply remove the character from the XML file (delete the faulty character sequence, type I' in its place, then save the file) and you'll be fine.

Community
  • 1
  • 1
Ansgar Wiechers
  • 193,178
  • 25
  • 254
  • 328
  • Hmmm well this has been happening with other XML sheets I'm accessing as well. Is there a way to correct this in Powershell automatically? – Matthew Goode Apr 19 '16 at 15:23
  • I just checked a different XML sheet and tried the same thing - it seems to happen any time there is an uncommon character. It replaces the character with a bunch of seemingly unrelated strange characters. This "É" turned into "É". Any idea how to correct this without changing every XML sheet? I found [this](http://www.i18nqa.com/debug/utf8-debug.html) which definitely accounts for my problem, but I don't know anything about encoding to fix it in Powershell. – Matthew Goode Apr 19 '16 at 15:42
  • You should be able to remove the spurious control character with something like `-creplace "I.{1,2}'m", "I'm"`. There's no magic spell for fixing arbitrary character/encoding issues, though. If it's just an encoding problem you might be able to fix it by specifying the correct enconding when reading the file (e.g. `-Encoding UTF8`). – Ansgar Wiechers Apr 19 '16 at 15:53
  • Ohhhh wow yeah that fixed it. I'm self-teaching so there are plenty of noob problems for me to overcome. Thank you so much for that advice. This website is really amazing. – Matthew Goode Apr 19 '16 at 16:01