
I'm trying to compare these two strings: Cpt. Awesome â\u0084¢ and Cpt. Awesome ™. They are essentially the same, except that in the first one the trademark character is encoded differently. How can I encode them so that they become equal?

I tried re-encoding both of them with the same encoding, but they still weren't equal.

Feanaro
  • You've got them as *strings*. There's no encoding left to do. If you've got the original *binary* data, you're in a far better situation - basically your first string is the result of decoding the binary data using the wrong encoding. You should really, really try to fix the problem *before* it gets to that point. – Jon Skeet Jul 17 '15 at 18:00
  • I thought as much :/ The problem is that the first string comes from a third-party library, so there's not a lot I can change. – Feanaro Jul 17 '15 at 18:07
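To see why the first string looks the way it does, here is a small C# sketch that reproduces it. It assumes, as the first comment suggests, that the library decoded UTF-8 bytes with the wrong encoding, and it takes that wrong encoding to be ISO-8859-1 (Latin-1). Latin-1 is an inference from the stray \u0084: Latin-1 maps byte 0x84 to the control character U+0084, whereas Windows-1252 would have produced „ instead.

```csharp
using System;
using System.Linq;
using System.Text;

// Assumption: the library got UTF-8 bytes but decoded them as ISO-8859-1 (Latin-1),
// which maps every byte 0x00-0xFF to the code point with the same value.
Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");

// Encode the correct string as UTF-8: "™" (U+2122) becomes the three bytes E2 84 A2.
byte[] utf8Bytes = Encoding.UTF8.GetBytes("Cpt. Awesome \u2122");

// Decode those bytes with the wrong encoding: each byte turns into one character.
string garbled = latin1.GetString(utf8Bytes);

// "Cpt. Awesome " is 13 ASCII bytes, so the trademark bytes start at index 13.
Console.WriteLine(BitConverter.ToString(utf8Bytes.Skip(13).ToArray())); // E2-84-A2
Console.WriteLine(garbled == "Cpt. Awesome \u00E2\u0084\u00A2");      // True
```

Three bytes in, three characters out: that is exactly the â, \u0084 and ¢ seen in the first string.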

2 Answers

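// Undo the wrong decode: Latin-1 turns each char back into its original byte
// (Encoding.Default varies by platform and is UTF-8 on .NET Core, so avoid it here).
byte[] bytes = Encoding.GetEncoding("ISO-8859-1").GetBytes(myString);
myString = Encoding.UTF8.GetString(bytes);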
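A fuller sketch of the same round trip, assuming the garbled text came from decoding UTF-8 bytes as ISO-8859-1 (Latin-1). It uses Latin-1 explicitly rather than the platform default encoding, and it adds two guards so it can safely be applied to both strings. The helper name RepairMojibake is hypothetical.

```csharp
using System;
using System.Text;

// Hypothetical helper: reverses a Latin-1 decode of UTF-8 bytes.
// Assumption: the garbled text was produced exactly that way.
string RepairMojibake(string s)
{
    // Characters above U+00FF cannot come from a Latin-1 decode,
    // so the string was not garbled this way; return it unchanged.
    foreach (char c in s)
        if (c > '\u00FF') return s;

    // Turn each character back into the byte it came from.
    byte[] bytes = Encoding.GetEncoding("ISO-8859-1").GetBytes(s);
    try
    {
        // throwOnInvalidBytes: true makes invalid UTF-8 throw instead of
        // silently turning into U+FFFD replacement characters.
        return new UTF8Encoding(false, true).GetString(bytes);
    }
    catch (DecoderFallbackException)
    {
        return s; // Not valid UTF-8, so it was probably not mojibake.
    }
}

string s1 = "Cpt. Awesome \u00E2\u0084\u00A2"; // as returned by the third-party library
string s2 = "Cpt. Awesome \u2122";             // Cpt. Awesome ™

Console.WriteLine(string.Equals(RepairMojibake(s1), s2, StringComparison.Ordinal)); // True
```

Because the helper leaves strings that don't look like this kind of mojibake untouched (for example Caf\u00E9, whose lone 0xE9 byte is not valid UTF-8), running it on both sides of the comparison is harmless.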

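EDIT

Alternatively, remove the non-ASCII characters from both strings before comparing them. This discards the ™ entirely, so both strings become "Cpt. Awesome " (with a trailing space):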

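// Needs: using System.Text.RegularExpressions;
// Drop every character outside the ASCII range U+0000 to U+007F.
s1 = Regex.Replace(s1, @"[^\u0000-\u007F]", string.Empty);
s2 = Regex.Replace(s2, @"[^\u0000-\u007F]", string.Empty);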
drooksy

Firstly, this may be a post worth looking at. As Jon Skeet stated, strings don't have an encoding; encoding only comes into play when converting to or from byte arrays. If those two strings are really all the data you have, you may need a lookup: store a dictionary that maps â\u0084¢ to ™, because encoding and decoding alone won't give you what you are looking for.
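The lookup-table idea could be sketched like this. The dictionary contents and the ApplyFixes name are hypothetical; the table below only knows about the trademark sign, so any other garbled sequences in your data would need their own entries.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical lookup table: each key is a garbled sequence, each value the
// character it should have been. Extend it with the sequences you actually see.
var fixes = new Dictionary<string, string>
{
    ["\u00E2\u0084\u00A2"] = "\u2122", // â + U+0084 + ¢  ->  ™
};

// Hypothetical helper: replaces every known garbled sequence in the string.
string ApplyFixes(string s)
{
    var sb = new StringBuilder(s);
    foreach (var pair in fixes)
        sb.Replace(pair.Key, pair.Value);
    return sb.ToString();
}

string s1 = "Cpt. Awesome \u00E2\u0084\u00A2";
string s2 = "Cpt. Awesome \u2122";
Console.WriteLine(ApplyFixes(s1) == s2); // True
```

Keying on whole multi-character sequences rather than single characters means a lone â or ¢ that is legitimate text is left alone.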

Community
GEEF