Questions tagged [mojibake]

Garbled text that results from bytes being decoded using an incorrect character encoding.

Mojibake occurs when text is decoded from a byte stream using the wrong character encoding, producing an unreadable sequence of characters. The term "mojibake" comes from Japanese (文字化け), where it literally means "character transformation".

Example mojibake:

اÙ"إعÙ"ان اÙ"عاÙ"Ù
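
The mechanism described above can be reproduced in a few lines. The following is a minimal sketch, assuming the common case of UTF-8 bytes mis-decoded as Windows-1252; the sample string and the helper name `undo_mojibake` are illustrative assumptions, not taken from any of the questions below.

```python
# Illustrative sample text containing a right single quotation mark (U+2019).
original = "It’s"

# Encode correctly as UTF-8, then decode with the wrong encoding (Windows-1252).
garbled = original.encode("utf-8").decode("cp1252")
print(garbled)  # Itâ€™s

# A UTF-8 byte order mark mis-decoded the same way shows up as three characters.
bom_garbled = "\ufeff".encode("utf-8").decode("cp1252")
print(bom_garbled)  # ï»¿


def undo_mojibake(text, wrong="cp1252", right="utf-8"):
    """Hypothetical helper: reverse one round of mis-decoding.

    Raises UnicodeError if the text cannot be round-tripped, for example
    when some bytes were already lost or replaced with U+FFFD.
    """
    return text.encode(wrong).decode(right)


print(undo_mojibake(garbled))  # It’s
```

Reversing the damage this way only works when no bytes were dropped along the way; the more durable fix is usually to make every layer (file, database connection, HTTP header, page) agree on one encoding.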

150 questions
187
votes
12 answers

"â€™" showing on page instead of " ' "

â€™ is showing on my page instead of '. I have the Content-Type set to UTF-8 in both my tag and my HTTP headers: In addition, my browser is set to Unicode (UTF-8): So…
Jitendra Vyas
  • 148,487
  • 229
  • 573
  • 852
169
votes
23 answers

How do I remove ï»¿ from the beginning of a file?

I have a CSS file that looks fine when I open it using gedit, but when it's read by PHP (to merge all the CSS files into one), this CSS has the following characters prepended to it: ï»¿ PHP removes all whitespace, so a random ï»¿ in the middle of…
Matt
  • 11,157
  • 26
  • 81
  • 110
67
votes
16 answers

Getting â€™ instead of an apostrophe(') in PHP

I've tried converting the text to or from utf8, which didn't seem to help. I'm getting: "Itâ€™s Getting the Best of Me" It should be: "It’s Getting the Best of Me" I'm getting this data from this url.
Mint
  • 14,388
  • 30
  • 76
  • 108
61
votes
4 answers

How to convert these strange characters? (Ã«, Ã, Ã¬, Ã¹, Ã)

My page often shows things like Ã«, Ã, Ã¬, Ã¹, Ã in place of normal characters. I use UTF-8 for the page header and for the MySQL encoding. How does this happen?
Leonardo
  • 2,273
  • 6
  • 29
  • 32
53
votes
9 answers

Facebook JSON badly encoded

I downloaded my Facebook messenger data (in your Facebook account, go to settings, then to Your Facebook information, then Download your information, then create a file with at least the Messages box checked) to do some cool statistics However there…
Jakub Jendryka
  • 599
  • 1
  • 4
  • 6
36
votes
10 answers

How to replace � in a string

I have a string that contains a character � I haven't been able to replace it correctly. String.replace("�", ""); doesn't work, does anyone know how to remove/replace the � in the string?
Thizzer
  • 16,153
  • 28
  • 98
  • 139
23
votes
5 answers

Converting special characters such as Ã¼ and Ã back to their original, Latin alphabet counterparts in C#

I have been given an export from a MySQL database that seems to have had its encoding muddled somewhat over time and contains a mix of HTML char codes such as &uuml; and more problematic characters representing the same letters such as Ã¼ and Ã.…
Gga
  • 4,311
  • 14
  • 39
  • 74
17
votes
1 answer

Unicode input retrieved via PrimeFaces input components becomes corrupted

When I was still using PrimeFaces v2.2.1, I was able to type unicode input such as Chinese with a PrimeFaces input component such as and , and retrieve the input in good shape in managed bean method. However, after I…
Mr.J4mes
  • 9,168
  • 9
  • 48
  • 90
17
votes
5 answers

How to pass Unicode characters as JSP/Servlet request.getParameter?

After a lot of trial and error I still can't figure out the problem. The JSP, servlet, and database are all set to accept UTF-8 encoding, but even still whenever I use request.getParameter on anything that has any two-byte characters like the em…
user707053
16
votes
5 answers

$_POST will convert from utf-8 to Ã¤ Ã¶ Ã¼ etc

I am new here, so I apologize if I am doing anything wrong. I have a form which submits user input onto another page. User is expected to type ä, ö, é, etc... I have placed all of the following in the document:
lungov
  • 193
  • 1
  • 1
  • 8
13
votes
5 answers

nodejs synchronization read large file line by line?

I have a large file (utf8). I know fs.createReadStream can create a stream to read a large file, but it is not synchronous. So I tried to use fs.readSync, but the text it reads is broken, like "迈�". var fs = require('fs'); var util = require('util'); var textPath =…
nroe
  • 163
  • 1
  • 1
  • 6
13
votes
3 answers

Python correct encoding of Website (Beautiful Soup)

I am trying to load an HTML page and output the text. Even though I am getting the webpage correctly, BeautifulSoup somehow destroys the encoding. Source: # -*- coding: utf-8 -*- import requests from BeautifulSoup import BeautifulSoup url =…
user1767754
  • 23,311
  • 18
  • 141
  • 164
11
votes
1 answer

Convert unicode with utf-8 string as content to str

I'm using pyquery to parse a page: dom = PyQuery('http://zh.wikipedia.org/w/index.php', {'title': 'CSS', 'printable': 'yes', 'variant': 'zh-cn'}) content = dom('#mw-content-text > p').eq(0).text() but what I get in content is a unicode string with…
wong2
  • 34,358
  • 48
  • 134
  • 179
10
votes
2 answers

In what world would \\u00c3\\u00a9 become é?

I have a likely improperly encoded json document from a source I do not control, which contains the following strings: d\u00c3\u00a9cor business\u00e2\u20ac\u2122 active accounts the \u00e2\u20ac\u0153Made in the USA\u00e2\u20ac\u009d label From…
Kevin Dolan
  • 4,952
  • 3
  • 35
  • 47
8
votes
1 answer

Chinese characters instead of non-latin characters (Mojibake bug?!)

Using react-native v0.31.0 on iOS (currently on iOS9 and iOS10). I have non-Latin text inside a Component and sometimes I see it like this: But it actually should look like this: My workaround is: Sniffed the network, and the data looks…
gran33
  • 12,421
  • 9
  • 48
  • 76