I'm trying to figure out why the same source data gives me two different output strings depending on the method I use to get it.
I have two CSV files containing data from QuickBooks. One was created using QuickBooks' built-in reporting functionality, and the other was created using a data-access API built on the QuickBooks SDK. Both files contain a text column that I should be able to use as a key to relate the data between them.
However, there is one particular character in one particular line that the two files can't seem to agree on:
- In QuickBooks, the character has the visual appearance of a dash
- In the CSV created directly by QuickBooks, the character is exported as an en-dash (U+2013 or decimal code 8211)
- BUT the SDK-based API reads it from QuickBooks as the "Start of Guarded Area" character (U+0096 or decimal code 150).
This causes a problem because my code thinks the two strings are different (which they technically are, but shouldn't be) and therefore fails to match them. I'm convinced there must be some kind of encoding error somewhere along the line, but I can't find any link between the two characters.
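To make the mismatch concrete, here is a minimal Python sketch of what I'm seeing. The key text `Widget ... Large` is made up for illustration, and `describe` is just a helper I wrote for this post; only the two code points come from the actual files.

```python
import unicodedata

# Hypothetical key values; only the dash code points reflect the real data.
report_key = "Widget \u2013 Large"  # as exported by the QuickBooks report (en dash)
sdk_key = "Widget \u0096 Large"     # as returned by the SDK-based API (C1 control)

def describe(text):
    """Return (code point, Unicode category) for each non-ASCII character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.category(ch))
        for ch in text
        if ord(ch) > 127
    ]

print(report_key == sdk_key)   # False
print(describe(report_key))    # [('U+2013', 'Pd')]
print(describe(sdk_key))       # [('U+0096', 'Cc')]
```

So one file holds a dash punctuation character (`Pd`) and the other holds a control character (`Cc`) in the same position, and a plain string comparison on the key column fails.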
I don't expect anyone to be able to figure out exactly what's going on, since we don't have access to what QuickBooks or the API are doing behind the scenes. But I'm hoping someone can give me some idea as to why this character is being mistranslated.