9

I'm not very experienced with lower-level details such as how many bytes a character takes. I tried to find out whether one character equals one byte, but without success.

I need to set a delimiter used for socket connections between a server and clients. This delimiter has to be as small (in bytes) as possible, to minimize bandwidth.

The current delimiter is "#". Would choosing another delimiter decrease my bandwidth?

Tom
  • 10
You could use a period "." since it uses the fewest pixels other than a blank space. – TheTXI Jun 26 '09 at 13:36
  • 5
    @TheTXI: Then why not use a space instead? Why waste pixels at all? – Pesto Jun 26 '09 at 13:39
  • Amount of pixels used is different from bandwidth. He's concerned with the binary 1s and 0s being sent over the network. (From what I understand) – samoz Jun 26 '09 at 13:40
  • 3
    samoz: I think we should reduce our overhead in as many arenas as possible. – TheTXI Jun 26 '09 at 13:41
  • 8
@samoz: Ignore TheTXI. He's one of those environmental nuts who is always going on and on about having a low pixel footprint and being pixel-neutral. There's no reasoning with them. – Pesto Jun 26 '09 at 13:46
  • 7
    Pesto: You're just another head-in-the-sand luddite who doesn't recognize that we are destroying the internet by polluting it with unnecessary pixels. – TheTXI Jun 26 '09 at 13:47
  • 5
    @TheTXI: There's no proof that pixel pollution leads to Internet Warming. Many scientists don't even think that Internet Warming is real. I'm not going to get my environmental data from the same kooks who want to use all-natural hemp pixels. – Pesto Jun 26 '09 at 13:51
  • 4
Pesto: See, there you go again. You're more than happy to use the petroleum-based processed pixels which not only take many more valuable resources to produce but also smell like burning plastic, instead of an all-natural wonder plant? Hemp pixel production costs a fraction of the price and is a completely sustainable resource. And no, smoking these pixels would not give you a buzz, only a headache. – TheTXI Jun 26 '09 at 13:55
  • @TheTXI @Pesto Are you guys kidding or serious? I can't pick up on the sarcasm... – samoz Jun 26 '09 at 13:57
  • 1
    @samoz: You make me so very sad. I hope this is your first time on the internet. – GEOCHET Jun 26 '09 at 14:00
  • 4
@TheTXI: First of all, I happen to enjoy the smell of burning plastic. It reminds me of childhood visits to New Jersey. Second, I don't actually believe we should stick with petroleum pixels, either. I'm a big proponent of nuclear pixels. Did you know that pixels come from electrons, which happen to be a by-product of nuclear fission? I long for the day when everyone has an under-desk nuclear reactor so that they can use extended ASCII character 219 all they want without fear of you hippies throwing red paint all over them. – Pesto Jun 26 '09 at 14:01
  • 4
Pesto: You know what also is a by-product of nuclear pixel production? Nuclear pixel waste. You know what we do with that waste? We store it in big drums that leak and will spill all that waste into our bitstreams and our filestreams. Have you ever seen the type of mutated bugs that plague our once-pristine habitats? You really are a soulless programmer. – TheTXI Jun 26 '09 at 14:07
  • 1
Rich B: Well then you had better start thinking more about your development environment, or all this pixel pollution is going to lead to a possible pony extinction. – TheTXI Jun 26 '09 at 14:09
  • 1
    @TheTXI: :( You monster. Take it back! – GEOCHET Jun 26 '09 at 14:10
  • 3
    @TheTXI: Clearly there is a need for better bit buckets, but you'd throw the baby out with the bathwater. Let's apply a little critical thinking to your hemp "solution". All that hemp will require a tremendous amount of fertilizer. While there is plenty of bullshit on the internet (such as your crazy rantings), it's important to note that this bullshit *requires pixels*. It's an endless cycle: more pixels needs more bullshit which needs even more pixels, etc. And that's not even getting into the amount of farm land necessary. What are we going to do, level WV and turn it into a hemp farm? – Pesto Jun 26 '09 at 14:12
  • 2
    Rich B: The truth hurts, and I will not take it back. Your eyes have to be opened to the damage that all of this excess pixel usage is causing to the software development world. This truly is a crusade, and the only way it can be defeated is via overwhelming numbers of dark-skinned people who do not share in your beliefs. – TheTXI Jun 26 '09 at 14:12
  • 2
    Pesto: WV is already a national leader in marijuana production, so it is obvious that our environment is well suited for industrial hemp production. – TheTXI Jun 26 '09 at 14:14

4 Answers

22

It depends on what character encoding you use to translate between characters and bytes (which are not at all the same thing):

  • In ASCII or ISO 8859, each character is represented by one byte
  • In UTF-32, each character is represented by 4 bytes
  • In UTF-8, each character uses between 1 and 4 bytes
  • In ISO 2022, it's much more complicated

US-ASCII characters (of which # is one) take only 1 byte in UTF-8, which is the most popular encoding that allows multibyte characters.
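A quick way to check this for yourself (a Python 3 sketch, not from the original answer) is to encode the delimiter and count the resulting bytes:

```python
# Bytes needed for the "#" delimiter under different encodings (Python 3).
for enc in ("ascii", "utf-8", "utf-32-le"):
    print(enc, len("#".encode(enc)))
# ascii    -> 1 byte
# utf-8    -> 1 byte
# utf-32-le -> 4 bytes (plain "utf-32" would also prepend a 4-byte byte-order mark)
```

Since "#" is already US-ASCII, no single-character delimiter can be smaller than this in ASCII or UTF-8.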

Michael Borgwardt
  • 2
    US-ASCII characters take 1 byte in pretty much *any* encoding except for UTF-16 and UTF-32. – dan04 Aug 21 '10 at 03:54
  • Character ◌ takes 7 bytes in utf8 encoding. `Buffer.byteLength("◌", "utf8")` – Lukas Liesis Jun 16 '23 at 06:30
  • @LukasLiesis That's not a single character, it's two characters that are combined, see https://en.wikipedia.org/wiki/Combining_character though in this case the combination achieves nothing and the second character (a variation selector) could be discarded. Also, what is this `Buffer` class you're using? – Michael Borgwardt Jun 16 '23 at 11:31
It's Node.js's built-in Buffer class. Yes, it's technically a couple of characters, but a user would write it as a single character, and I had enough pain figuring out why the string length says it's 1 char while the byte length is different. There are more such characters, but it destroys the UI here. I have some chars whose length is 1 by string length but whose byte count is 300+. – Lukas Liesis Jun 16 '23 at 14:08
  • @LukasLiesis Ultimately this all comes down to the concept of "character" and "string" becoming *way* more complex than most people are aware of, once you allow for all the special cases in dozens of different scripts as Unicode does. Arguably, "length of a string" is simply not a well-defined concept, and no matter how you define it, some results will not make intuitive sense. Some detailed information can be found here: https://unicode.org/faq/char_combmark.html - and here's also a good article on the topic: https://hsivonen.fi/string-length/ – Michael Borgwardt Jun 17 '23 at 13:36
@MichaelBorgwardt I totally agree from a tech standpoint; the issue is that users usually don't care, and if an input is 100 chars long, they equally expect to be able to save 100 emojis, 100 ◌, or 100 ASCII symbols. I usually work on user-facing software, so I'm coming more from the user's position than from the computer-science position. – Lukas Liesis Jun 18 '23 at 05:20
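The mismatch discussed in these comments, string length counting code points while byte length counts encoded bytes, can be reproduced in a few lines (a Python 3 sketch; the specific characters are just illustrative examples):

```python
# A combining sequence: 'e' followed by U+0301 COMBINING ACUTE ACCENT.
# Users see one glyph, but it is two code points and three UTF-8 bytes.
s = "e\u0301"
print(len(s))                  # 2 code points
print(len(s.encode("utf-8")))  # 3 bytes

# A single code point outside the Basic Multilingual Plane
# still takes 4 UTF-8 bytes:
print(len("\U0001F44D"), len("\U0001F44D".encode("utf-8")))  # 1 4
```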
6

The answer of course is that it depends. If you are in a pure ASCII environment, then yes, every char takes 1 byte, but if you are in a Unicode environment (all of Windows, for example), then chars can range from 1 to 4 bytes in size.

If you choose a char from the ASCII set, then yes, your delimiter is as small as possible.
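Since the point of the delimiter is to split messages received over the socket, here is a minimal sketch (Python 3; the buffer contents are made up) of splitting a received byte buffer on a 1-byte delimiter:

```python
# Split a received byte buffer on the 1-byte delimiter b"#".
buf = b"hello#world#"
*messages, rest = buf.split(b"#")
print(messages)  # [b'hello', b'world']
print(rest)      # b'' -- leftover partial message; keep it for the next recv()
```

Note that this assumes the delimiter byte never occurs inside message payloads; otherwise you need escaping or length-prefixed framing instead.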

Scott Weinstein
5

It depends on the encoding. In single-byte character sets such as ANSI and the various ISO 8859 character sets, it is one byte per character. Some encodings, such as UTF-8, are variable-width, where the number of bytes needed to encode a character depends on the character being encoded.
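To see the variable width in practice, here is a short Python 3 sketch (the sample characters are arbitrary, one from each UTF-8 width class):

```python
# UTF-8 uses 1 to 4 bytes per code point, depending on the character.
for ch in ("A", "\u00e9", "\u20ac", "\U0001D11E"):  # A, é, €, musical G clef
    print(repr(ch), len(ch.encode("utf-8")), "bytes")
# "A" -> 1, "é" -> 2, "€" -> 3, G clef -> 4
```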

ConcernedOfTunbridgeWells
-6

No, all characters are 1 byte, unless you're using Unicode or wide characters (for accents and other symbols for example).

A character is 1 byte, or 8 bits, long, which gives 256 possible combinations to form characters with. 1-byte characters are called ASCII characters. They only use 7 bits (even though 8 are available, but you can't use this 8th bit) to form the standard alphabet and various symbols used when teletypes and typewriters were still common.

You can find an ASCII chart and what numbers correspond to what characters here.

samoz
  • 1
    Such as the equation of characters and bytes, "1 byte characters are called ASCII characters", "you can't use this 8th bit". I suggest you read http://www.joelonsoftware.com/articles/Unicode.html very carefully. – Michael Borgwardt Jun 26 '09 at 13:46
  • I just read the article you sent me and I still don't see how I'm glaringly wrong. He can still send ASCII characters (even if they are UTF-8) in 1 byte. And after thinking about it, the "can't use 8th bit" comment was wrong, it would just need some extra processing to strip out the 8th bit signal that he was sending. – samoz Jun 26 '09 at 14:00
  • 1
    The most important thing that's wrong is that characters aren't bytes, and it also makes no sense to say that characters "are UTF-8" or "are Unicode or wide". Nor do characters have a length. You need an ENCODING to translate characters to bytes, and only then can you talk about length and which characters the encoding supports. And there certainly are encodings in which the characters supported by ASCII take more than 1 byte. – Michael Borgwardt Jun 26 '09 at 14:12
  • I'm talking about when you type: char c, you get 1 byte allocated to you. The OP asked if he can use something smaller, to which the answer is no, because a byte is the smallest thing you can allocate. By character, I'm talking about the char type, not an actual letter. By larger characters, I'm talking about the wchar type. – samoz Jun 26 '09 at 14:25
  • 1
    The OP didn't say what language he uses; C-specific answers that aren't even recognizable as such are not what he needs. BTW, your answer is wrong for C as well; the C standard indeed mandates that 1 char == 1 byte (and oh how much suffering that idiocy has caused), but it does NOT mandate 8-bit bytes and there are in fact architectures where bytes have more or fewer bits. – Michael Borgwardt Jun 26 '09 at 15:09