
I'm querying the MediaWiki API to get Wikipedia data into my FileMaker database. When I load the data in a browser, the characters show up properly, but when the data comes into FileMaker, characters with diacritics are converted to odd two-character sequences: á becomes √° (square root symbol + degree symbol), é becomes √© (square root symbol + copyright symbol), í becomes √≠ (square root symbol + not-equals symbol), and so on. What character encoding is that? Thank you!!

sombreptile
  • Looks like UTF-8 misinterpreted as Mac-Roman. – Joni Mar 07 '13 at 22:43
  • Take a look at the raw bytes for that character that you see in MediaWiki and compare those to what are in Filemaker. Also, check out this article which talks about different ways of writing the character `é`: https://dev.twitter.com/docs/counting-characters#Definition_of_a_Character – Chris Haas Mar 07 '13 at 22:54

3 Answers


As @Joni suggests in his comment, this is UTF-8 misinterpreted as MacRoman. The letter á is encoded in UTF-8 as the two bytes C3 A1 (hex); in MacRoman, C3 is “√” and A1 is “°”. So you should try to set the program to interpret the data as UTF-8.
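To make the byte-level explanation concrete, here is a minimal Python sketch (not FileMaker code; the `repair` helper name is just an illustration). It reproduces the garbling by decoding UTF-8 bytes as MacRoman, then reverses it for text that has already been damaged.

```python
# Reproduce the problem: UTF-8 bytes wrongly decoded as MacRoman.
original = "á"
utf8_bytes = original.encode("utf-8")        # the two bytes C3 A1
garbled = utf8_bytes.decode("mac_roman")     # what a MacRoman reader displays


def repair(text: str) -> str:
    """Undo the damage (hypothetical helper).

    Re-encode the garbled text as MacRoman to recover the original bytes,
    then decode those bytes as UTF-8, which is what they were all along.
    """
    return text.encode("mac_roman").decode("utf-8")


print(utf8_bytes.hex(" "))   # c3 a1
print(garbled)               # √°
print(repair(garbled))       # á
```

Note that `repair` raises an error if the text contains characters that do not exist in MacRoman or if the recovered bytes are not valid UTF-8, so it only suits text that was garbled in exactly this way.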

Jukka K. Korpela

I'm sure this isn't the full list, but it did what I needed. Here is a lookup for the codes; each row gives the garbled sequence, the intended character, and an unaccented ASCII substitute:

√© é e

√° á a

√≠ í i

√≥ ó o

√∂ ö o

√º ü u

√¥ ô o

√® è e

√ß ç c

√± ñ n

√∏ ø o

√´ ë e

√§ ä a

√• å a

√Å Á A

√∫ ú u

√ª û u

√Ø ï i

√â É E

√† à a

√¶ æ ae

√Æ î i

√¢ â a

√£ ã a

√î Ô O

√ü ß ss

√ì Ó O

√≤ ò o

√Ω ý y

√ñ Ö O

√™ ê e

√Ä À A

√ò Ø O

√Ö Å A

√∞ ð eth

√á Ç C

√Ç Â A

√π ù u

√í Ò O

√¨ ì i

√ú Ü U

√à È E

√û Þ Th
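A list like this can also be generated rather than typed by hand. The sketch below is one way to do that in Python, assuming the garbling really is UTF-8 read as MacRoman; the function names and the code point ranges are illustrative choices, not part of the original answer. Widening the range covers sequences that are missing from the list above, such as ones starting with ≈.

```python
def garble(ch: str) -> str:
    """Return what a character looks like when its UTF-8 bytes are read as MacRoman."""
    return ch.encode("utf-8").decode("mac_roman")


def build_lookup(first: int = 0x00C0, last: int = 0x00FF) -> dict:
    """Map each garbled sequence to the intended character.

    The default range is the accented part of Latin-1 (À through ÿ);
    pass a larger `last` to include further accented letters.
    """
    return {garble(chr(cp)): chr(cp) for cp in range(first, last + 1)}


def repair(text: str, lookup: dict) -> str:
    """Replace every garbled sequence found in `text` with the intended character."""
    for bad, good in lookup.items():
        text = text.replace(bad, good)
    return text


lookup = build_lookup()
print(lookup["√©"])                       # é
print(repair("Jos√© Mart√≠", lookup))     # José Martí

# Extending the range to Latin Extended-A picks up more sequences.
extended = build_lookup(0x00C0, 0x017F)
print(extended["≈°"])                     # š
```

A replacement table is more forgiving than re-decoding the whole string, because it leaves any text it does not recognize untouched.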

Benjamin Grout
  • Do you happen to know what ≈° means? All the remaining sequences in my data follow your list; this is the only one I cannot decipher. Thanks so much! – canIchangethis Oct 10 '22 at 12:59

You're all correct about the misinterpreted characters. The Troi URL FMP plugin I was using to set FMP's user agent (as the MediaWiki API requires) was responsible for pulling in the garbled characters. The solution was to bypass the plugin: an FMP script performs an AppleScript "do shell script" that runs curl -A to set the user agent, query the API, and pull the response back into FMP, and all characters come through properly!
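For readers not using FileMaker, the same idea (send a user agent, then decode the response explicitly as UTF-8) can be sketched in Python with the standard library. This is a different technique from the AppleScript/curl approach described above; the function names and the user agent string are hypothetical, and the sketch assumes the API response is UTF-8 as the other answers conclude.

```python
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"   # MediaWiki API endpoint for English Wikipedia
USER_AGENT = "MyFileMakerImporter/0.1"           # hypothetical value; use one that identifies your tool


def build_request(url: str, user_agent: str) -> urllib.request.Request:
    """Create a request carrying an explicit User-Agent header (like curl -A)."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def decode_body(raw: bytes) -> str:
    """Decode the raw response bytes as UTF-8 instead of a platform default."""
    return raw.decode("utf-8")


def fetch_text(url: str, user_agent: str = USER_AGENT) -> str:
    """Fetch a URL and return its body decoded as UTF-8 (performs a network call)."""
    with urllib.request.urlopen(build_request(url, user_agent)) as response:
        return decode_body(response.read())
```

Calling `fetch_text` with a full API URL (the query parameters are up to you) returns text in which á, é and í arrive intact, because the bytes are never passed through a MacRoman decoder.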

sombreptile