
Are decoded strings without additional attributes and Unicode interchangeable?

Update:

Does it make a difference whether I write

subroutine expects decoded strings.

or

subroutine expects Unicode strings.

for the following subroutine?

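use strict;
use warnings;
use Unicode::GCString;

sub subroutine {
    my $unicode = shift;
    # Only changes the internal storage format; the characters stay the same.
    utf8::upgrade( $unicode );
    my $gcs = Unicode::GCString->new( $unicode );
    # Display width in columns (East Asian wide characters count as 2).
    my $colwidth = $gcs->columns();
    return $colwidth;
}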
sid_com
  • Check this one; you will find the answer there: http://stackoverflow.com/questions/447107/whats-the-difference-between-encode-decode-python-2-x – Pir Fahim Shah Oct 12 '12 at 18:14
  • @Pir Fahim Shah, That linked node is not relevant. sid_com didn't ask what happens when one decodes something that's already been decoded. And even if he did, the answer for Perl5 is different than the answer for Python. – ikegami Oct 12 '12 at 18:17
  • what do you *mean*? what do you want to *do*? contextless use of terms like this isn't going to get you any kind of meaningful answer IMO – ysth Oct 12 '12 at 18:37
  • I want to check if my understanding makes sense. I would not say contextless - the tags for example are some kind of context. – sid_com Oct 12 '12 at 18:45
  • sid_com: what is a "decoded string"? what is "Unicode"? (the latter has a precise definition, but it's pretty clear that's not what you mean; people use the term to mean all kinds of things.) How about a sample script with a question about what it does? If you can't reduce your question to that, you probably have more than one question. – ysth Oct 12 '12 at 19:38
  • @ysth, I assumed they meant "string produced by `decode`" and "string of Unicode code points", and gave my answer in those terms. If he meant something else, he can elaborate. – ikegami Oct 12 '12 at 19:40
  • the edited question is much more answerable, thanks; @ikegami, you want to take a swing at it? – ysth Oct 14 '12 at 04:58
  • @ysth, All I got is: I suspect more people will understand if you say "decoded" instead of "Unicode", but I'm not sure. – ikegami Oct 14 '12 at 05:23

1 Answer


Assuming we are talking about decoding a character encoding (UTF-8, cp1252, etc.), yes.

Encode's decode produces a string of Unicode code points. "Unicode string" is a fitting description of the result.
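A minimal sketch of this, using the core Encode module's decode (the sample bytes are chosen only for illustration): after decoding UTF-8, length() counts code points, and a character outside the Basic Multilingual Plane is a single code point, not a UTF-16-style surrogate pair.

```perl
use strict;
use warnings;
use Encode qw(decode);

# UTF-8 bytes for "é" (C3 A9) followed by U+1F600 (F0 9F 98 80): six bytes.
my $bytes = "\xC3\xA9\xF0\x9F\x98\x80";

# decode() turns the encoded bytes into a string of Unicode code points.
my $text = decode('UTF-8', $bytes);

# Prints "bytes: 6, characters: 2".
printf "bytes: %d, characters: %d\n", length($bytes), length($text);

# Each character is one code point, including the one outside the BMP.
# Prints "U+00E9" then "U+1F600".
printf "U+%04X\n", ord($_) for split //, $text;
```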

Note that "Unicode string" is not a fitting alternative to "strings stored using the UTF8=1 format". Unlike the strings returned by decode, a string stored using the UTF8=1 format is not necessarily a string of Unicode code points.
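A minimal sketch of that distinction (the sample string is illustrative): utf8::upgrade only switches a string to the UTF8=1 storage format. Upgrading a string that still holds the UTF-8 bytes of "é" turns the flag on, but the string remains the two characters 0xC3 and 0xA9; nothing is decoded into U+00E9.

```perl
use strict;
use warnings;

# Two characters, 0xC3 and 0xA9: the still-encoded UTF-8 bytes of "é".
my $still_encoded = "\xC3\xA9";

# Switch the internal storage to the UTF8=1 format.
utf8::upgrade($still_encoded);

# The flag is now on, yet the string is still two characters holding
# encoded bytes rather than the single code point U+00E9.
# Prints "UTF8 flag: on, length: 2".
printf "UTF8 flag: %s, length: %d\n",
    (utf8::is_utf8($still_encoded) ? 'on' : 'off'), length($still_encoded);
```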

ikegami
  • The perl documentation I find states that decode converts encoded strings into Perl's internal format. I can't find anything that specifies what that is. Are you sure it's not UTF-16, which would mean not all codepoints are represented directly by their own value? – bames53 Oct 12 '12 at 18:17
  • neither one of those sounds like strings of unicode codepoints. They both sound like strings of 8-bit code units (UTF-8 uses 8-bit code units). – bames53 Oct 12 '12 at 18:26
  • Since his question is about the result of decode, and the result of decode is a string in Perl's internal format, I took his question to be "Is Perl's internal format interchangeable with `Unicode`?" Maybe that's wrong though. – bames53 Oct 12 '12 at 18:38
  • @bames53, By the way, I added to my answer to avoid any confusion. – ikegami Oct 12 '12 at 19:41