I'm trying to tag my text with a delimiter at specific places so that it can be parsed later. I want a delimiter character that is very unlikely to appear in the text itself. I'm currently looking at "\2", the U+0002 character. Is that safe enough to use? What other suggestions are there? The text is Unicode and will contain both English and non-English characters.

I want to use a character that PHP's explode() can still split on.

Edit:

I also want to display this piece of text on screen (in the browser) with the delimiter "invisible" to the user. I can certainly use str_replace() to strip visible delimiters, but if there is a good invisible delimiter, no such processing is needed.

samxli
  • use more than one character :) – k102 Jun 27 '11 at 13:56
  • maybe you'll make a delimiter string (something like [!--!]) that is not very frequently used? =) – Greenisha Jun 27 '11 at 13:57
  • You could encode the text like this: `[length]-text[length]-text2[length]-text3...` (for instance `3-foo6-foobar`, expanding to `['foo', 'foobar']`), which would be a surefire way to avoid the conflicts that `explode` can bring while remaining manageably easy to parse. – zneak Jun 27 '11 at 13:59
  • http://stackoverflow.com/q/5847982/469210 is a similar question. – borrible Jun 27 '11 at 14:02
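
The length-prefixed format suggested in the comments above could be sketched in PHP roughly as follows. The helper names `encode_segments` and `decode_segments` are hypothetical, and the sketch assumes lengths are counted in bytes (`strlen`) on both sides, so multi-byte UTF-8 text round-trips without needing the mbstring functions.

```php
<?php
// Hypothetical helpers for the "[length]-text" format described in the comment.
// Lengths are BYTE counts, so the encoder and decoder must both count bytes.

function encode_segments(array $segments): string
{
    $out = '';
    foreach ($segments as $segment) {
        // Prefix each segment with its byte length and a dash.
        $out .= strlen($segment) . '-' . $segment;
    }
    return $out;
}

function decode_segments(string $encoded): array
{
    $segments = [];
    $pos = 0;
    $total = strlen($encoded);

    while ($pos < $total) {
        // The first dash after $pos ends the length prefix.
        $dash = strpos($encoded, '-', $pos);
        if ($dash === false) {
            throw new InvalidArgumentException('Missing length separator');
        }

        $lengthText = substr($encoded, $pos, $dash - $pos);
        if ($lengthText === '' || !ctype_digit($lengthText)) {
            throw new InvalidArgumentException('Invalid length prefix');
        }

        $length = (int) $lengthText;
        if ($dash + 1 + $length > $total) {
            throw new InvalidArgumentException('Segment is truncated');
        }

        // Read exactly $length bytes; dashes or digits inside are harmless.
        $segments[] = substr($encoded, $dash + 1, $length);
        $pos = $dash + 1 + $length;
    }

    return $segments;
}
```

Because the decoder reads a fixed number of bytes per segment, no character has to be reserved as a delimiter. The trade-off is that the encoded string is no longer directly displayable, so it would need decoding before being shown to the user.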

1 Answer

If this is only for an internal representation (i.e. not for interchange and storage), then you can use a non-character code point such as U+FFFF. Java uses that as the signal that a CharacterIterator is done, for example.
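
As a minimal sketch of that idea in PHP, assuming PHP 7 or later (for the `"\u{FFFF}"` escape) and valid UTF-8 input: the constant name `DELIMITER` and the sample segments are made up for illustration. Since nothing here guarantees how a browser renders a noncharacter, the sketch strips the delimiter before output rather than relying on it being invisible.

```php
<?php
// U+FFFF is a Unicode noncharacter; in UTF-8 it is the bytes EF BF BF.
// DELIMITER is a name chosen for this sketch only.
const DELIMITER = "\u{FFFF}";

// Sample data, invented for illustration.
$segments = ['First part', 'deuxième partie', '第三部分'];

// Tag the text by joining the segments with the delimiter.
$tagged = implode(DELIMITER, $segments);

// explode() compares bytes. With valid UTF-8 input this is safe, because
// the three-byte sequence for U+FFFF cannot occur inside another character.
$parts = explode(DELIMITER, $tagged);

// Remove the delimiter and escape the text before sending it to the browser.
$display = htmlspecialchars(str_replace(DELIMITER, '', $tagged), ENT_QUOTES, 'UTF-8');
```

This only covers the in-memory case the answer describes. Whether the code point survives a round trip through a particular storage layer is a separate question that the sketch does not address.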

Joey
  • This will be stored in MySQL as a TEXT field. If I get that field content in my PHP app, will that code point still show up? – samxli Jun 27 '11 at 13:59
  • You should check the MySQL docs for that. If in doubt, use a BLOB instead. Or normalize your database a little more instead of doing the parsing in PHP. – Joey Jun 27 '11 at 14:02
  • @samxli, if you store it in a MySQL table, you could also make a separate table to store the data. What you're doing smells like your database design could be improved. – zneak Jun 27 '11 at 14:03
  • The text is an article segment. And I need to tag that segment text with the delimiter for another layer of processing later on. But that text will still be shown to the user. – samxli Jun 27 '11 at 14:07