I have a database from a Russian website that is encoded in windows-1251. In phpMyAdmin the letters look like this: Âûõîäÿùàÿ â Ëîíäîíå ãàçåòà íà àðàáñêîì ÿçûêå «Àëü-Õàéÿò» ñîîáùèëà, which is illegible. To display the content properly, this line must be added in PHP:

header("Content-Type: text/html; charset=windows-1251");

I would like to migrate this site to open-source software such as Joomla or WordPress without hiccups.

To do that, I need to convert these garbled characters to UTF-8, so that they look like this even in phpMyAdmin:

Выходящая в Лондоне газета на арабском языке «Аль-Хайят» сообщила,
ZygD
boruchsiper

1 Answer

Dump the database to a .sql file and run it through iconv (a Linux command-line program).

iconv -f utf-8 -t latin1 < in.sql | iconv -f cp1251 -t utf-8 > out.sql
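To see what the two stages of that pipeline do, here is a minimal Python sketch applied to part of the sample text from the question. It assumes the dump is UTF-8 text in which every windows-1251 byte was misread as a Latin-1 character; the function name `fix_mojibake` is made up for illustration.

```python
def fix_mojibake(text: str) -> str:
    """Undo windows-1251 text that was misread as Latin-1."""
    # Stage 1 (like: iconv -f utf-8 -t latin1): turn each character back
    # into the single byte it was originally stored as.
    raw = text.encode("latin-1")
    # Stage 2 (like: iconv -f cp1251 -t utf-8): read those bytes as
    # windows-1251, which yields proper Unicode text.
    return raw.decode("cp1251")


garbled = "Âûõîäÿùàÿ â Ëîíäîíå ãàçåòà"
print(fix_mojibake(garbled))  # prints: Выходящая в Лондоне газета
```

In this sketch, any character that has no Latin-1 byte (a curly quote, for example) makes stage 1 raise `UnicodeEncodeError`, so such characters need separate handling.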

I did this earlier this year; see "How can I convert Cyrillic stored as LATIN1 ( sql ) to true UTF8 Cyrillic with iconv?"

If you don't know how to get iconv and don't have any sensitive information stored in the SQL dump, I can do it for you and send it back to you.

meder omuraliev
  • Hi meder, the info is not that sensitive, but the SQL dump file is 180 MB. I'm going to try your suggestion. The only problem is that the original database is stored on a paid hosting account where I don't have shell access. I do have a Linux server for testing purposes. So what do I need to do with the SQL dump file once I've exported it via phpMyAdmin from the paid hosting account? – boruchsiper Nov 18 '10 at 22:05
  • I get an error 'iconv: illegal input sequence at position 6721' – boruchsiper Nov 18 '10 at 22:37
  • Oh, I forgot: you need to do a replace for the weird characters you run into. Replace each one across the entire document with an all-caps placeholder that names it, like 'THISISACURLYQUOTE'. Then, after you run iconv, replace 'THISISACURLYQUOTE' with the actual character, since the file will be UTF-8 by then. Do that for every character you run into. – meder omuraliev Nov 18 '10 at 22:38
  • I have no idea how to replace weird characters with a caps lock version. P.S. How do I know when someone adds a comment? I refresh the page every few minutes. Is there a notification setting in Stack Overflow that I missed? – boruchsiper Nov 18 '10 at 22:42
  • Do you know how to replace "foo" with "bar" in a text file? Same concept pretty much. – meder omuraliev Nov 18 '10 at 22:45
  • I got it. But which weird characters do I need to replace with the caps lock version? – boruchsiper Nov 18 '10 at 22:51
  • I'm sorry for asking a stupid question. How do I find where position 6721 is? – boruchsiper Nov 18 '10 at 23:04
  • It's the 6721st character, counting from the first character in your text file. – meder omuraliev Nov 19 '10 at 03:52
  • meder, thanks to your advice I was able to do something I thought was impossible. I converted a database of 29000 records to utf8. I was searching the internet for days. Actually there were about 5 characters that couldn't be converted so I replaced them with codes that I can remember for later to change them back after conversion. A million thanks to you! – boruchsiper Nov 19 '10 at 07:15
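
The workaround discussed in the comments above can be sketched in Python as follows. This is an illustration under assumptions, not the commands actually used in the thread: the helper names and the placeholder format are invented, the reported position is assumed to be a 0-based byte offset into the dump, and the offending characters are assumed to be ones with no Latin-1 byte (curly quotes, for example). Putting the same character back after conversion is itself an assumption that should be checked by hand for each character found.

```python
def peek_at(path, position, width=20):
    """Return the raw bytes surrounding a byte offset in a file."""
    # Assumption: the position in the error message is a 0-based byte offset.
    start = max(position - width, 0)
    with open(path, "rb") as f:
        f.seek(start)
        return f.read(position - start + width)


def find_unconvertible(text):
    """List the distinct characters that have no Latin-1 byte."""
    return sorted({ch for ch in text if ord(ch) > 0xFF})


def convert_with_placeholders(text):
    """Fix cp1251-read-as-Latin-1 text, shielding unconvertible characters."""
    # Hypothetical token format; it must not already occur in the dump.
    tokens = {ch: "ZZCHAR%04XZZ" % ord(ch) for ch in find_unconvertible(text)}
    # Swap each problem character for an ASCII token before converting.
    for ch, token in tokens.items():
        text = text.replace(ch, token)
    # ASCII is identical in Latin-1 and windows-1251, so tokens pass through.
    fixed = text.encode("latin-1").decode("cp1251")
    # Put the original characters back now that the text is proper Unicode.
    for ch, token in tokens.items():
        fixed = fixed.replace(token, ch)
    return fixed


sample = "“Àëü-Õàéÿò”"
print(find_unconvertible(sample))         # the two curly quotes
print(convert_with_placeholders(sample))  # prints: “Аль-Хайят”
```

For a real dump, `peek_at("in.sql", 6721)` would show the bytes around the reported position (the file name here is taken from the answer's command), which helps identify which character to shield.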