5

Disclaimer: I KNOW that in 99% of cases you shouldn't "serialize" data in a concatenated string.

What character do you use as a separator in this well-known situation:

string str = userId +"-"+ userName;

In the majority of cases I have fallen back to | (pipe), but in some cases users type even that. What about "non-typable" characters like ☼ (ALT+9999)?

nikib3ro
  • Can you explain why you are doing this? It smells like a really bad idea. – JohnFx Apr 22 '12 at 06:55
  • Let's say I want to encrypt some data and send it over the network to a mobile app using HTTP. I don't have time to set up a fancy serialization framework and want to get up and running as soon as possible... i.e. I do Context.Write(Encrypt("Param1|Param2|Param3")) – nikib3ro Apr 22 '12 at 07:00
  • Here are some instructions for using the wheel: http://www.c-sharpcorner.com/UploadFile/bipinjoshi/serializingObjectsinCS11102005234746PM/serializingObjectsinCS.aspx Sorry, no time to chat I'm off to bed. – JohnFx Apr 22 '12 at 07:08
  • Who says the separator has to be a *single character*? Also, if you can define the range of the userId, then you can (for example) assume all characters up until the first "-" are the userId, and all after the first "-" are the user name. No worries. –  Apr 23 '12 at 01:43
  • @Will I know - there are numerous ways to approach this "problem", especially since it's not best practice. The main reason why I asked this question is to see if somebody has "solved" it in an extremely robust way (e.g.: use '\u12345' because it is the least used Unicode character). As for separators that are multiple characters - sure, that would work (love http://stackoverflow.com/questions/1254577/string-split-by-multiple-character-delimiter). I am just looking for the EASIEST solution that would effortlessly work across different platforms and programming languages - all have String.Split(char) – nikib3ro Apr 23 '12 at 21:08
  • Seems there are more questions similar to this, all without a straight answer... the number of philosophers on StackOverflow is obviously increasing. [link1](http://stackoverflow.com/questions/1254577/string-split-by-multiple-character-delimiter) [link2](http://stackoverflow.com/questions/3482683/can-a-valid-unicode-string-contain-ffff-is-java-characteriterator-broken) [link3](http://stackoverflow.com/questions/6493956/least-used-unicode-delimiter) [link4](http://stackoverflow.com/questions/5847982/utf-8-string-delimiter) – nikib3ro Apr 23 '12 at 21:14

2 Answers

8

That depends on too many factors to give a concrete answer.

Firstly, why are you doing this? If you feel the need to store the userId and userName by combining them in this fashion, consider alternative approaches, e.g. CSV-style quoting or similar.
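As a rough illustration of the CSV-style quoting mentioned above, here is a minimal Python sketch using the standard csv module. The helper names pack and unpack and the choice of | as delimiter are assumptions made for this example, not anything from the question. A field that contains the delimiter is wrapped in quotes, so it survives the round trip.

```python
import csv
import io

def pack(fields, delimiter="|"):
    """Join fields into one line; fields containing the delimiter get quoted."""
    buf = io.StringIO()
    csv.writer(buf, delimiter=delimiter).writerow(fields)
    # writerow appends a line terminator; drop it to get a bare string.
    return buf.getvalue().rstrip("\r\n")

def unpack(line, delimiter="|"):
    """Reverse pack(): split one line back into its original fields."""
    return next(csv.reader([line], delimiter=delimiter))

packed = pack(["42", "nik|ib3ro"])   # the pipe inside the name is quoted
fields = unpack(packed)              # back to ["42", "nik|ib3ro"]
```

Most platforms ship some CSV reader, which is the main attraction of this approach over a hand-rolled split.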

Secondly, under normal circumstances only delimiters that aren't part of the strings should be used. If userId is just a number then "-" is fine... but what if the number could be negative?

Third, it depends on what you plan to do with the string. If it is simply for logging, debugging, or some other form of human consumption, then you can relax a bit and just choose a delimiter that looks appropriate. If you plan to store data like this, use a delimiter that ensures you can extract the data properly later on, regardless of the values of userId or userName. If you can get away with it, use \0, for example. If either value comes from an untrusted source (e.g. the Internet), then make sure the delimiter can't be used as a character in either string. Generally you would limit the characters that each contains - say, digits for userId and letters, digits and SOME punctuation characters for userName.
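To make the "limit the characters" idea concrete, here is a small Python sketch. The exact character sets, the \0 delimiter, and the function names join_user and split_user are assumptions chosen for illustration; adjust them to whatever your fields really allow. Because both fields are validated before joining, the delimiter can never occur inside either one.

```python
import re

DELIMITER = "\0"  # NUL; excluded from both fields by the checks below

# Assumed rules, for illustration only: digits for the ID; letters, digits,
# space and a little punctuation for the name.
USER_ID_RE = re.compile(r"[0-9]+")
USER_NAME_RE = re.compile(r"[A-Za-z0-9 ._'-]+")

def join_user(user_id, user_name):
    """Validate both fields, then join them with the delimiter."""
    if not USER_ID_RE.fullmatch(user_id):
        raise ValueError("userId must contain only digits")
    if not USER_NAME_RE.fullmatch(user_name):
        raise ValueError("userName contains a disallowed character")
    return user_id + DELIMITER + user_name

def split_user(packed):
    """Split at the first delimiter; safe because neither field contains it."""
    user_id, user_name = packed.split(DELIMITER, 1)
    return user_id, user_name
```

Note that a negative ID such as "-5" is rejected by the digits-only rule, which sidesteps the ambiguity raised in the second point.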

Michael Slade
  • Hmph... yeah \0 is fine. I am not looking for a completely foolproof way of doing this (which is obvious from my mentioning - and |); I am just curious what people would recommend and what tricks related to this they have up their sleeves. – nikib3ro Apr 22 '12 at 07:10
6

If it's for data storage and retrieval, there is no way to guarantee that a user won't find a way to inject your delimiter into the string. The safe thing to do is pre-process the input somehow:

  • Let - be the special character
  • If a - is encountered in the input, replace it with something like -0.
  • Use -- as your delimiter

So userId = "alpha-dog" and userName = "papa--0bear" will be translated to

alpha-0dog--papa-0-00bear

The important thing is that your scheme needs to be perfectly undoable, and that the user shouldn't be able to break it, no matter what they enter.

Essentially this is a very primitive version of sanitization.
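The scheme above (escape every - as -0, join with --) could be sketched in Python roughly like this. The function names are made up for the example. It works because an escaped field never contains -- and never ends in -, so the first -- found while scanning is always a real delimiter.

```python
def escape(field):
    """Replace every '-' with '-0' so the field can never contain '--'."""
    return field.replace("-", "-0")

def unescape(field):
    """Undo escape(): every '-0' pair came from a single original '-'."""
    return field.replace("-0", "-")

def pack(*fields):
    """Escape each field and join with the two-character delimiter '--'."""
    return "--".join(escape(f) for f in fields)

def unpack(packed):
    """Split on '--' and restore the original fields."""
    return [unescape(part) for part in packed.split("--")]

packed = pack("alpha-dog", "papa--0bear")
# packed is "alpha-0dog--papa-0-00bear", matching the example above
```

Splitting and then unescaping, in that order, is what makes the transformation reversible for any input, including fields that start or end with a hyphen.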

trutheality