8

I am developing an API using CodeIgniter and Phil Sturgeon's REST server. I am trying to send a POST request containing special characters, but the string is not added to the database.

CodeIgniter's form validation also says that lastname is required, i.e. it treats the field as missing from the string. Why?

I am using this format:

application/x-www-form-urlencoded

This is my string:

firstname=Andrew&lastname=Åsberger

It is very important that I can use special characters for internationalization.

Thankful for all input!

Jon Hanna
Jonathan Clark
  • Can you try var_dump() on firstname? Is it correctly set to Andrew? – ajreal Dec 12 '11 at 03:03
  • How do you pass your string to the rest service? How do you invoke it at all? Please add an example so it's more clear what does not work. Codeigniter can be quite flawed when it comes to input processing as values might get dropped due to buggy UTF-8 validation and the default XSS "protection" (which is broken). – hakre Dec 15 '11 at 16:28

5 Answers

6

You should URI-encode each name and value. Hopefully the client and server code will both agree that UTF-8 should be used to encode the octets of characters outside the US-ASCII range (earlier URI-encoding standards didn't specify an encoding, and there is legacy code out there that tries others), so your example becomes:

firstname=Andrew&lastname=%C3%85sberger

Just like it would in the query portion of a URI used with a GET.
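To see what the encoding described above looks like in practice, here is a minimal Python sketch using the standard library's urllib.parse (a neutral illustration only; it is not CodeIgniter or PHP code). It builds the body from the question's example fields and also shows what happens to a Latin-1 style %C5 when the receiver decodes the body as UTF-8:

```python
from urllib.parse import parse_qs, urlencode

fields = {"firstname": "Andrew", "lastname": "Åsberger"}

# urlencode() percent-encodes non-ASCII characters using UTF-8 by default.
utf8_body = urlencode(fields)
print(utf8_body)  # firstname=Andrew&lastname=%C3%85sberger

# Decoding with the same charset (UTF-8, parse_qs's default) recovers the value.
print(parse_qs(utf8_body)["lastname"])  # ['Åsberger']

# A legacy Latin-1 body encodes Å as the single byte %C5 instead.
latin1_body = urlencode(fields, encoding="latin-1")
print(latin1_body)  # firstname=Andrew&lastname=%C5sberger

# Read as UTF-8, the lone 0xC5 byte is invalid; parse_qs replaces it with U+FFFD.
print(parse_qs(latin1_body)["lastname"])
```

Only the UTF-8 form survives a UTF-8 decoder intact; %C5 round-trips only if both sides agree on Latin-1, e.g. parse_qs(latin1_body, encoding="latin-1").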

Jon Hanna
  • Unfortunately %C3%85sberger is not working either. Codeigniter form validation says that lastname is empty... I cannot understand why. – Jonathan Clark Dec 09 '11 at 13:01
  • I assume that it is working correctly for `firstname=Andrew&lastname=test`? How does it work with special characters like `&` encoded as `%26` rather than normal characters like `Å`? Also, does it accept `Å` encoded as `%C5`? (Hopefully not, as then you're in trouble for anything over U+00FF). – Jon Hanna Dec 09 '11 at 13:55
  • lastname=test works fine, and if I set lastname to %26 it works too. However, %C5 does not work. – Jonathan Clark Dec 09 '11 at 14:14
  • So ironically, special characters are fine! CodeIgniter is PHP-based, isn't it? What do `mb_internal_encoding()` and `mb_http_input('P')` return? Perhaps setting `mb_internal_encoding('UTF-8')` and then trying my original answer would work. – Jon Hanna Dec 09 '11 at 14:29
  • No, special characters like Å (%C5) do not work, but & (%26) did. I cannot understand why. Yes, CodeIgniter is written in PHP. Where can I add "mb_internal_encoding('UTF-8')"? Thanks! – Jonathan Clark Dec 09 '11 at 14:32
  • mb_internal_encoding() returns UTF-8. – Jonathan Clark Dec 09 '11 at 14:35
  • & is a special character because it means something to application/x-www-form-urlencoded, but Å is just Å. Try a test .php page in the application first to see what `mb_internal_encoding()` and `mb_http_input('P')` return (if they already return UTF-8, then this idea is a bust). If they return something else, try putting the call in a PHP include that is used in all or most of the pages. Beyond that, I know REST, I know application/x-www-form-urlencoded, I know internationalised URI-encoding and I know character encodings, but I don't know CodeIgniter, so I'll be out of useful knowledge :( – Jon Hanna Dec 09 '11 at 14:44
  • Just saw your last comment. I did check that the above is the correct application/x-www-form-urlencoded encoding for UTF-8, and it is, so it definitely seems that this is down to the part of the puzzle that I know nothing about :( – Jon Hanna Dec 09 '11 at 14:45
  • Oh, that is sad. I am totally stuck on this one. But I am very thankful for your help so far! – Jonathan Clark Dec 09 '11 at 15:05
  • That would be super. If I cannot solve this issue, I have a big problem. Using special characters like åäö is a must for my system. I cannot understand why this should be so hard. A lot of people are using it, I hope :) – Jonathan Clark Dec 10 '11 at 00:24
  • Really, any system should be able to deal with such characters if it deals with human-readable text (which is why I object to them being called "special"; they're no more special than "abc"). I've got tonnes of these reputation points, so I'll add a bounty when the question becomes eligible for one. I'd really like to see this resolved. Have you tried CodeIgniter-specific forums? – Jon Hanna Dec 10 '11 at 09:51
  • Yes, I have a thread there too. Hopefully I can get this resolved. – Jonathan Clark Dec 10 '11 at 11:21
  • It would be really helpful to see the request and response headers for the POST. If you post from, let's say, a browser, it will automatically percent-encode your 'Å' (as '%C5' or '%C3%85', depending on the page's charset), and the assumption is that, given the right Content-Type header, the receiving end can decode this into the charset specified in the POST request. Evidently one end is misinterpreting the data. Where, or from what, are you sending the request? Can you do something like print_r($_SERVER); in the PHP script handling the POST request? – kontur Dec 16 '11 at 12:30
  • Sorry, but I don't fully understand the question. Do you mean that you fill in a form, submit it, and form validation rejects it because it finds no string in the surname field? Am I correct? Could you explain it a bit more? – Murat Ünal Dec 17 '11 at 04:03
  • No, you misunderstood me there. What I am saying is that every POST request on the internet sends headers. Browsing a web page, submitting a form on it: everything is communicated via HTTP headers. It is in these headers that the client and the server "talk" to each other, saying "I accept stuff encoded in ABC" or responding "I send you stuff in encoding ABC". Getting a printout of these headers could help make sure the problem is not there. The $_SERVER variable holds information like that for the requested PHP script. – kontur Dec 17 '11 at 13:46
  • Sorry, kontur, and thank you very much for your explanation. But the first comment at the top (by Jonathan Clark) attracts my attention. – Murat Ünal Dec 17 '11 at 23:40
  • I think what kontur wants to see is something like the example below. Am I correct? https://github.com/philsturgeon/codeigniter-restserver/blob/master/application/controllers/api/example.php – Murat Ünal Dec 17 '11 at 23:47
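Along the lines of kontur's suggestion, one way to be sure what the client actually sends is to build the request and inspect its method, headers and body before sending it. A minimal Python sketch using urllib.request; the endpoint URL is a hypothetical placeholder, and on the PHP side a dump of $_SERVER (or of file_get_contents('php://input') for the raw body) would be the counterpart:

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint; replace with the real REST server URL.
API_URL = "http://example.com/api/users"

# The percent-encoded body is pure ASCII, so encoding it as ASCII is safe.
body = urlencode({"firstname": "Andrew", "lastname": "Åsberger"}).encode("ascii")

req = Request(
    API_URL,
    data=body,
    headers={"Content-Type": "application/x-www-form-urlencoded"},
    method="POST",
)

# Inspect exactly what would go on the wire; nothing is sent here.
print(req.get_method())    # POST
print(req.header_items())  # urllib stores the header key as 'Content-type'
print(req.data)            # b'firstname=Andrew&lastname=%C3%85sberger'
# urllib.request.urlopen(req) would actually send the request.
```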
3

It seems like you are having an encoding issue. You need to make sure that you are using UTF-8 from end to end: client (browser), server (PHP), database connection and database. I assume your database tables are already UTF-8, but what many people forget is the connection to the database. Right after you connect to the database, you should run the query SET NAMES UTF8. I'm not sure whether CodeIgniter uses the database connection to escape characters.

I don't use CodeIgniter, but if it's not using the proper encoding, then multi-byte characters get split into two separate characters. For example, running urlencode('Å') on UTF-8 text returns %C3%85, not %C5. This kind of mismatch is also the basis of a known SQL injection technique: if one of the characters it "decodes" to is a ' or ", there is a quoting issue/vulnerability. This could cause CodeIgniter to evaluate the string incorrectly.
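The "one character becomes two" effect described above is easy to reproduce. A short Python sketch (a neutral illustration, not CodeIgniter's code) showing Å encoded as UTF-8 bytes and then read back with a single-byte charset (Latin-1):

```python
# "Å" (U+00C5) is one character, but two bytes in UTF-8.
utf8_bytes = "Å".encode("utf-8")
print(utf8_bytes)       # b'\xc3\x85'
print(len(utf8_bytes))  # 2

# Decoding those bytes with a single-byte charset yields two characters.
misread = utf8_bytes.decode("latin-1")
print(len(misread))                     # 2
print([hex(ord(c)) for c in misread])   # ['0xc3', '0x85']

# Decoding with the matching charset restores the original character.
print(utf8_bytes.decode("utf-8"))       # Å
```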

Finally, are you doing your POST through JavaScript? JavaScript strings are UTF-16 internally, not UTF-8, so this can cause problems depending on how you POST. You can use JavaScript to submit an HTML form, but you can run into problems when you do an Ajax POST using strings you build yourself, although unescape(encodeURIComponent(s)) supposedly works.

Brent Baisley
  • I am using MongoDB as the database. Does that change anything? – Jonathan Clark Dec 12 '11 at 07:51
  • Sort of: it means you can ignore my SET NAMES UTF8 advice. – Brent Baisley Dec 12 '11 at 11:23
  • The database being a concern seems unlikely, since the code isn't getting that far. JavaScript can't be the ultimate cause, since the OP tried sending %C3%85 directly. However, I wonder if you are onto something when you mention SQL injection methods. Could it be an over-aggressive attempt to block overlong UTF-8 based attacks that also blocks valid characters? – Jon Hanna Dec 12 '11 at 18:17
  • The database would only come into play if using mysql_real_escape_string. Å is a two-byte character in UTF-8; if everything else goes through fine, that is the route I would investigate. I'm not sure if this support thread helps: http://codeigniter.com/forums/viewthread/100488/#512159 – Brent Baisley Dec 13 '11 at 02:28
2

I once had a similar issue when inserting products with special characters in their names into the cart, and when creating my URLs.

I'm not sure, but it may be helpful from another angle. I also added a my_url_helper to my project to handle URLs; mb_string handles character replacements very well. Sorry for my bad English. :(
File: application/config/config.php

/*
|--------------------------------------------------------------------------
| Allowed URL Characters
|--------------------------------------------------------------------------
|
| This lets you specify with a regular expression which characters are permitted
| within your URLs.  When someone tries to submit a URL with disallowed
| characters they will get a warning message.
|
| As a security measure you are STRONGLY encouraged to restrict URLs to
| as few characters as possible.  By default only these are allowed: a-z 0-9~%.:_-
|
| Leave blank to allow all characters -- but only if you are insane.
|
| DO NOT CHANGE THIS UNLESS YOU FULLY UNDERSTAND THE REPERCUSSIONS!!
|
*/  

// Not the default: extended for Turkish characters, listed individually so no accented letter ends up as the end of a range
$config['permitted_uri_chars'] = 'a-z üöçşığ A-Z ÜÖÇŞİĞ 0-9~%.:_\-';
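A side note on the permitted_uri_chars value: inside a character class, a hyphen between two characters defines a range, so a value written as a-üöçşığz makes ü the end of a range that starts at a. The Python sketch below (Python's re, standing in for PHP's preg functions purely for illustration) compares that form with one that lists the Turkish letters individually. Python matches by code point; CodeIgniter's check, assuming it runs without the /u modifier, compares bytes, so the exact extra characters differ, but in both cases the range admits characters such as { and | that were never intended.

```python
import re

# Range-style class: "a-ü" spans every code point from U+0061 to U+00FC.
range_style = re.compile(r"^[a-üöçşığz]+$")
# Same intent, with the Turkish letters listed individually next to a-z.
listed = re.compile(r"^[a-züöçşığ]+$")

for ch in ["{", "|", "ß", "ş"]:
    # Print the character, then whether each pattern accepts it.
    print(ch, bool(range_style.match(ch)), bool(listed.match(ch)))
```

Only ş is accepted by both patterns; {, | and ß slip through the range-style class.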
Murat Ünal
1

I'm not particularly familiar with CodeIgniter; however, this:

Codeigniter seems to break $_POST of '£' character (Pound)

...might be relevant. That is, the problem might be in your server stack, not your code or framework! Otherwise, here are some additional links that address other areas of concern regarding CodeIgniter and UTF-8:

http://hash-bang.net/2009/02/utf8-with-codeigniter/

http://philsturgeon.co.uk/blog/2009/08/UTF-8-support-for-CodeIgniter

Hope this helps.

Community
David O'Riva
  • That does seem tantalisingly similar to this issue. Hopefully the OP will be back soon to say that one of the recent answers was on the money :) – Jon Hanna Dec 16 '11 at 22:46
  • Lacking an answer-acceptance tick from the OP, I can only judge so well which of the several good answers was most helpful. I'm giving the bounty to this one: its pointer to CodeIgniter misinterpreting POST data seems the most promising lead to follow up on. – Jon Hanna Dec 18 '11 at 17:37
0

It's not MongoDB, since you aren't getting the value out of the POST in the first place.

I'm almost entirely certain it is your encoding details, not matching from client to server.

Others' suggestion to standardize on UTF-8 is good practice, but if you don't want to, just make sure you use an encoding that covers your characters and is used on both the client side and the server side.

I'm not a PHP expert, but you are getting plain characters (e.g. B), special characters (& and %) and percent-escaped characters (%26) through, but not percent-escaped non-ASCII characters like %C3%85.

Update your question with more information about how you are posting to the server and I'll elaborate.

one.beat.consumer