Questions tagged [multibyte-functions]

41 questions
39
votes
4 answers

Windows API: ANSI and Wide-Character Strings -- Is it UTF8 or ASCII? UTF-16 or UCS-2 LE?

I'm not quite pro with encodings, but here's what I think I know (though it may be wrong): ASCII is a 7-bit, fixed-length encoding, with the characters you can find in ASCII charts. UTF8 is an 8-bit, variable-length encoding. All characters can be…
user541686
33
votes
4 answers

mb_convert_encoding, undefined function while mbstring is enabled

I have a server (Ubuntu 11.10 x64) running PHP 5.3.8 with Apache2 / MySQL. I'm currently working on a project where I'm required to do some specific character encoding, but I found out that none of the multibyte (mb_* functions) are…
Harold
26
votes
4 answers

php sprintf() with foreign characters?

It seems that sprintf has a problem with foreign characters. Or am I doing something wrong? It seems to work when I remove characters like åäö from the string, though. Should that be necessary? I want the following lines to be aligned correctly…
Mille
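
For the sprintf alignment question above, here is a minimal Python sketch (not PHP, and not taken from the thread) of one likely cause: padding that counts bytes instead of characters. The helper names `pad_by_bytes` and `pad_by_chars` are hypothetical, and the sketch assumes the strings are UTF-8 with precomposed characters.

```python
def pad_by_bytes(text: str, width: int) -> str:
    """Pad to `width` counting UTF-8 bytes, as a byte-oriented formatter would."""
    n_bytes = len(text.encode("utf-8"))
    return text + " " * max(0, width - n_bytes)


def pad_by_chars(text: str, width: int) -> str:
    """Pad to `width` counting characters (code points)."""
    return text.ljust(width)


if __name__ == "__main__":
    for row in ["abc", "åäö"]:
        # Each of å, ä, ö takes two bytes in UTF-8, so the byte-based
        # column comes out narrower for the second row.
        print(pad_by_bytes(row, 6) + "|")
        print(pad_by_chars(row, 6) + "|")
```

Counting code points is still only an approximation of display width: combining marks and double-width CJK characters need separate handling.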
13
votes
4 answers

PHP Multi Byte str_replace?

I'm trying to do accented character replacement in PHP but get funky results, my guess being that it's because I'm using a UTF-8 string and str_replace can't properly handle multi-byte strings. $accents_search =…
Ian
12
votes
2 answers

multi-byte function to replace preg_match_all?

I'm looking for a multi-byte function to replace preg_match_all(). I need one that will give me an array of matched strings, like the $matches argument from preg_match(). The function mb_ereg_match() doesn't seem to do it -- it only gives me a…
user151841
8
votes
3 answers

multi-byte characters in libc regcomp and regexec

Is there any way to get libc6's regexp functions regcomp and regexec to work properly with multi-byte characters? For instance, if my pattern is the UTF-8 characters 猫机+猫, finding a match on the UTF-8 encoded string 猫机机机猫 will fail, where it should…
bill_e
7
votes
2 answers

Is it safe to use `strstr` to search for multibyte UTF-8 characters in a string?

Following my previous question: Why `strchr` seems to work with multibyte characters, despite man page disclaimer?, I figured out that strchr was a bad choice. Instead I am thinking about using strstr to look for a single character (multi-byte not…
n0p
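
As a rough illustration of the strstr question above, the sketch below does the same kind of byte-level substring search in Python (not C). It relies on UTF-8 being self-synchronizing: the complete byte sequence of one valid character cannot match starting in the middle of another character. The function name `find_utf8` is hypothetical, and both inputs are assumed to be valid UTF-8.

```python
def find_utf8(haystack: bytes, needle: bytes) -> int:
    """Return the byte offset of `needle` in `haystack`, or -1 if absent.

    This is a plain byte search, comparable to what strstr does in C.
    """
    return haystack.find(needle)


text = "猫机机机猫".encode("utf-8")   # each character is 3 bytes in UTF-8
needle = "机".encode("utf-8")

byte_offset = find_utf8(text, needle)
# The match starts on a character boundary, so the prefix decodes cleanly.
prefix = text[:byte_offset].decode("utf-8")

if __name__ == "__main__":
    print(byte_offset, repr(prefix))
```

The returned offset is in bytes, not characters. The guarantee does not carry over to encodings such as Shift-JIS, where a trail byte can look like another character's lead byte.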
6
votes
8 answers

Using UTF-8 charset with PHP - are mb functions required?

These past few days I've been working toward converting my PHP code base from latin1 to UTF-8. I've read that the two main solutions are to either replace the single-byte functions with the built-in multibyte functions, or set the mbstring.func_overload…
Spoonface
5
votes
2 answers

How to handle multibyte string in Python

PHP has multibyte string functions for handling multibyte strings (e.g., CJK scripts). For example, I want to count the letters in a multibyte string using the len function in Python, but it returns an inaccurate result (i.e., the number of bytes…
hungneox
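
The byte-versus-character difference described in the question above can be reproduced with a short sketch. It assumes Python 3, where `str` holds Unicode code points and `bytes` holds the encoded form; the sample text is an arbitrary choice, not from the thread.

```python
text = "日本語"                   # a str: a sequence of Unicode code points
encoded = text.encode("utf-8")    # bytes: the UTF-8 encoded form

char_count = len(text)            # counts code points
byte_count = len(encoded)         # counts bytes

# Decoding the bytes first restores the character count.
decoded_count = len(encoded.decode("utf-8"))

if __name__ == "__main__":
    print(char_count, byte_count, decoded_count)
```

In Python 2 a plain string literal is a byte string, so `len` reports bytes unless the value is decoded to `unicode` first, which would match the behaviour the question describes.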
5
votes
3 answers

PHP method for stripping duplicate chars from a multibyte string?

Arrrgh. Does anyone know how to create a function that's the multibyte character equivalent of the PHP count_chars($string, 3) command? Such that it will return a list of ONLY ONE INSTANCE of each unique character. If that was English and we…
Dave
4
votes
2 answers

PHP multi-byte alternatives UTF8

I've been searching for UTF-8-safe alternatives to string manipulation functions. I've found many different opinions and suggestions. I would like to ask whether the following functions can cause problems with UTF-8 and, if so, what I should use instead. I…
sczdavos
3
votes
3 answers

How to get correct list position in multi-byte string using preg_match

I am currently matching HTML using this code: preg_match('/<\/?([a-z]+)[^>]*>|&#?[a-zA-Z0-9]+;/u', $html, $match, PREG_OFFSET_CAPTURE, $position) It matches everything perfectly; however, if I have a multibyte character, it counts it as 2 characters…
Dave Stein
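
The offset mismatch in the question above arises when a matcher reports positions in bytes while the caller expects characters. A minimal Python sketch of converting one to the other follows; it is an illustration under the assumption of valid UTF-8 input, not the PHP fix from the thread, and `byte_to_char_offset` is a hypothetical helper.

```python
import re


def byte_to_char_offset(data: bytes, byte_offset: int) -> int:
    """Convert a byte offset into UTF-8 `data` to a character offset.

    Assumes `byte_offset` falls on a character boundary.
    """
    return len(data[:byte_offset].decode("utf-8"))


html = "猫<b>bold</b>".encode("utf-8")   # 猫 occupies 3 bytes
match = re.search(rb"<b>", html)          # bytes pattern: offsets are in bytes

byte_pos = match.start()
char_pos = byte_to_char_offset(html, byte_pos)

if __name__ == "__main__":
    print(byte_pos, char_pos)
```

The same idea, decoding the prefix up to the reported offset and measuring its length in characters, is a common way to reconcile byte offsets with character positions.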
3
votes
1 answer

How to properly use MultiByteToWideChar

I am using MultiByteToWideChar to convert my string to a wstring. I am first trying to get the required size for my wstring. According to the documentation passing 0 as the last argument should accomplish this. Using MultiByteToWideChar(CP_UTF8,…
RagHaven
3
votes
2 answers

Combine two Bytes to WideChar

Is it possible to combine two Bytes to WideChar and if yes, then how? For example, letter "ē" in binary is 00010011 = 19 and 00000001 = 1, or 275 together. var WChar: WideChar; begin WChar := WideChar(275); // Result is "ē" var B1, B2:…
Little Helper
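
The arithmetic in the question above (low byte 19, high byte 1, giving 275) can be checked with a small Python sketch. The question itself is about Delphi; this is only an illustration, and `combine_bytes` is a hypothetical helper that assumes little-endian byte order and a code unit outside the surrogate range.

```python
def combine_bytes(low: int, high: int) -> str:
    """Combine two bytes into one 16-bit code unit and return its character.

    Assumes little-endian order (low byte first) and a value that is
    not a UTF-16 surrogate (0xD800 to 0xDFFF).
    """
    code_unit = (high << 8) | low
    return chr(code_unit)


b1, b2 = 0x13, 0x01                           # 19 and 1, as in the question
wide_char = combine_bytes(b1, b2)             # 1 * 256 + 19 = 275
via_codec = bytes([b1, b2]).decode("utf-16-le")

if __name__ == "__main__":
    print(ord(wide_char), wide_char, via_codec)
```

Decoding with the `utf-16-le` codec gives the same character and also handles surrogate pairs when more than two bytes are supplied.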
2
votes
2 answers

Character Encoding UTF8 Issue when using mb_detect_encoding() with PHP

I am reading an RSS feed, http://beersandbeans.com/feed/. The feed says it is in UTF-8 format, and I am using SimplePie RSS to import the content. When I grab the content and store it in $content, I perform the following:
Lizard