Questions tagged [multibyte-functions]
41 questions
39
votes
4 answers
Windows API: ANSI and Wide-Character Strings -- Is it UTF8 or ASCII? UTF-16 or UCS-2 LE?
I'm not quite a pro with encodings, but here's what I think I know (though it may be wrong):
ASCII is a 7-bit, fixed-length encoding, with the characters you can find in ASCII charts.
UTF8 is an 8-bit, variable-length encoding. All characters can be…

user541686
- 205,094
- 128
- 528
- 886
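As a rough illustration of the encoding widths this question asks about, the Python sketch below reports how many bytes a few sample characters occupy in UTF-8 and UTF-16 LE. The helper name `encoded_sizes` and the sample characters are assumptions made for illustration; they do not come from the question.

```python
def encoded_sizes(ch: str) -> dict:
    """Return the number of bytes `ch` occupies in UTF-8 and UTF-16 LE."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16-le": len(ch.encode("utf-16-le")),
    }


# Hypothetical samples: an ASCII letter, an accented Latin letter,
# a CJK character, and a character outside the Basic Multilingual Plane.
for ch in ("A", "é", "猫", "😀"):
    print(repr(ch), encoded_sizes(ch))
```

The ASCII letter takes one byte in UTF-8, while the last sample needs four bytes in both encodings, because UTF-16 represents it as a surrogate pair.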
33
votes
4 answers
mb_convert_encoding, undefined function while mbstring is enabled
I have a server (Ubuntu 11.10 x64) running PHP 5.3.8 with Apache2 / MySQL. I'm currently working on a project where I'm required to do some specific character encoding, but I found out that none of the multibyte (mb_* functions) are…

Harold
- 1,372
- 1
- 14
- 25
26
votes
4 answers
php sprintf() with foreign characters?
It seems like sprintf has a problem with foreign characters? Or am I doing something wrong? It looks like it works when I remove chars like åäö from the string, though. Should that be necessary?
I want the following lines to be aligned correctly…

Mille
- 713
- 2
- 8
- 10
13
votes
4 answers
PHP Multi Byte str_replace?
I'm trying to do accented character replacement in PHP but get funky results, my guess being because I'm using a UTF-8 string and str_replace can't properly handle multi-byte strings.
$accents_search =…

Ian
- 24,116
- 22
- 58
- 96
12
votes
2 answers
multi-byte function to replace preg_match_all?
I'm looking for a multi-byte function to replace preg_match_all(). I need one that will give me an array of matched strings, like the $matches argument from preg_match(). The function mb_ereg_match() doesn't seem to do it -- it only gives me a…

user151841
- 17,377
- 29
- 109
- 171
8
votes
3 answers
multi-byte characters in libc regcomp and regexec
Is there any way to get libc6's regexp functions regcomp and regexec to work properly with multi-byte characters?
For instance, if my pattern is the utf8 characters 猫机+猫, finding a match on the utf8 encoded string 猫机机机猫 will fail, where it should…

bill_e
- 930
- 2
- 12
- 24
7
votes
2 answers
Is it safe to use `strstr` to search for multibyte UTF-8 characters in a string?
Following my previous question: Why `strchr` seems to work with multibyte characters, despite man page disclaimer?, I figured out that strchr was a bad choice.
Instead I am thinking about using strstr to look for a single character (multi-byte not…

n0p
- 3,399
- 2
- 29
- 50
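A minimal sketch of the byte-level search this question describes, written in Python rather than C. Here `bytes.find` stands in for `strstr`, and the helper name `find_char_bytes` and the sample strings are assumptions for illustration. It relies on a property of UTF-8: the complete encoding of one character never appears inside the encoding of another, so a byte-wise substring match on valid UTF-8 always lands on a character boundary.

```python
def find_char_bytes(haystack: str, needle: str) -> int:
    """Byte-level substring search on UTF-8 data, analogous to C's strstr.

    Returns the byte offset of the first match, or -1 if there is none.
    """
    return haystack.encode("utf-8").find(needle.encode("utf-8"))


text = "猫机机机猫"
print(find_char_bytes(text, "机"))  # byte offset, not character index
print(find_char_bytes(text, "狗"))  # -1 when the character is absent
```

The returned position is a byte offset (3 for the first call, since 猫 occupies three bytes), so it should not be treated as a character index.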
6
votes
8 answers
Using UTF-8 charset with PHP - are mb functions required?
These past few days I've been working toward converting my PHP code base from latin1 to UTF-8. I've read the two main solutions are to either replace the single-byte functions with the built-in multibyte functions, or set the mbstring.func_overload…

Spoonface
- 1,513
- 1
- 20
- 29
5
votes
2 answers
How to handle multibyte string in Python
PHP has multibyte string functions for handling multibyte strings (e.g. CJK scripts). For example, I want to count how many letters are in a multibyte string using the len function in Python, but it returns an inaccurate result (i.e. the number of bytes…

hungneox
- 9,333
- 12
- 49
- 66
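The byte count versus character count difference described in this question can be sketched in Python 3 as follows. The helper name `char_count`, the sample text, and the choice of UTF-8 are assumptions for illustration; the question does not show its own code or say which Python version it uses.

```python
def char_count(data: bytes, encoding: str = "utf-8") -> int:
    """Count characters (code points) by decoding the bytes first."""
    return len(data.decode(encoding))


raw = "日本語".encode("utf-8")  # hypothetical sample text
print(len(raw))         # number of bytes
print(char_count(raw))  # number of characters
```

Calling `len` on the encoded bytes counts bytes (9 here), while decoding to a text string first counts code points (3 here).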
5
votes
3 answers
PHP method for stripping duplicate chars from a multibyte string?
Arrrgh. Does anyone know how to create a function that's the multibyte character equivalent of the PHP count_chars($string, 3) command?
Such that it will return a list of ONLY ONE INSTANCE of each unique character. If that was English and we…

Dave
- 117
- 5
4
votes
2 answers
PHP multi-byte alternatives UTF8
I've been searching for UTF8-safe alternatives for string manipulation functions. I've found many different opinions and suggestions. I would like to ask whether the following functions can cause problems with UTF-8 and, if so, what I should use instead. I…

sczdavos
- 2,035
- 11
- 37
- 71
3
votes
3 answers
How to get correct list position in multi-byte string using preg_match
I am currently matching HTML using this code:
preg_match('/<\/?([a-z]+)[^>]*>|&#?[a-zA-Z0-9]+;/u', $html, $match, PREG_OFFSET_CAPTURE, $position)
It matches everything perfectly, however if I have a multibyte character, it counts it as 2 characters…

Dave Stein
- 8,653
- 13
- 56
- 104
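One way to picture the offset mismatch in this question: a regex engine working on UTF-8 bytes reports byte offsets, and decoding the prefix up to that offset gives the character position. The Python sketch below shows that conversion on a simplified tag pattern. The helper name `char_offset` and the sample markup are assumptions for illustration, and this is not presented as the fix the question's answers settled on.

```python
import re


def char_offset(data: bytes, byte_offset: int) -> int:
    """Convert a byte offset into UTF-8 data to a character offset.

    Assumes `byte_offset` falls on a character boundary.
    """
    return len(data[:byte_offset].decode("utf-8"))


html = "é<b>bold</b>".encode("utf-8")  # hypothetical sample markup
match = re.search(rb"</?([a-z]+)[^>]*>", html)
print(match.start())                     # offset in bytes
print(char_offset(html, match.start()))  # offset in characters
```

The tag starts at byte 2 because é occupies two bytes in UTF-8, but it is the second character, so its character offset is 1.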
3
votes
1 answer
How to properly use MultiByteToWideChar
I am using MultiByteToWideChar to convert my string to a wstring. I am first trying to get the required size for my wstring. According to the documentation, passing 0 as the last argument should accomplish this. Using MultiByteToWideChar(CP_UTF8,…

RagHaven
- 4,156
- 21
- 72
- 113
3
votes
2 answers
Combine two Bytes to WideChar
Is it possible to combine two Bytes into a WideChar and if yes, then how?
For example, letter "ē" in binary is 00010011 = 19 and 00000001 = 1, or 275 together.
var
WChar: WideChar;
begin
WChar := WideChar(275); // Result is "ē"
var
B1, B2:…

Little Helper
- 2,419
- 9
- 37
- 67
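The arithmetic in this question (bytes 19 and 1 giving 275) can be sketched with a shift and a bitwise OR. The example below is in Python rather than Delphi, and the helper name `combine_bytes` is an assumption for illustration. It only covers a single 16-bit code unit, so characters that need a surrogate pair are out of scope.

```python
def combine_bytes(low: int, high: int) -> str:
    """Combine a low and a high byte into one 16-bit character."""
    code_unit = (high << 8) | low  # 1 * 256 + 19 = 275 for the sample below
    return chr(code_unit)


print(combine_bytes(0b00010011, 0b00000001))

# The same two bytes read as little-endian UTF-16 give the same character.
print(bytes([0b00010011, 0b00000001]).decode("utf-16-le"))
```

Both lines print "ē", which matches `WideChar(275)` from the question.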
2
votes
2 answers
Character Encoding UTF8 Issue when using mb_detect_encoding() with PHP
I am reading an rss feed http://beersandbeans.com/feed/
The feed says it is in UTF8 format, and I am using SimplePie RSS to import the content. When I grab the content and store it in $content, I perform the following:

Lizard
- 43,732
- 39
- 106
- 167