I need to send a sequence of spaces or zeros with a fixed length in bytes. The in-house framework I am working with takes care of converting the string into bytes according to the encoding. So, in a nutshell, I need to create a string of a given byte length.

I found out that one space takes two bytes in Java, but that might differ depending on the encoding used. I tried the ensureCapacity method of byte [] with a minimum length and a maximum capacity, but when I convert it to a String there is no padding of zeros, even though the byte array's length is the same as the maximum capacity I specified.

I am getting a little confused while trying to grasp the concepts of bytes, strings, charsets, byte [], new String(byte []), encoding, String.getBytes, etc.

How do I go about creating such a string? Also, if anyone knows a good reference on these topics, please share. I am using Java.

sss
  • Spaces and zeros both encode as single bytes in UTF-8 encoding. So if you need a result of exactly, say, 10 bytes, just take a `String` of 10 characters with any combination of spaces and zeros, convert it to bytes using UTF-8 encoding and the result should be a `byte[]` of length 10. – Kevin Anderson Oct 04 '19 at 11:31
  • I have to do the reverse of it. I know the size in bytes and have to create a string out of it. The byte length is a parameter, as I have to create many strings for different byte sizes. Also, what should be my criteria for selecting the encoding? – sss Oct 04 '19 at 11:36
  • Hello. For converting a byte array to a String and back, you should have a look at this: https://www.baeldung.com/java-string-to-byte-array. However, to solve your underlying problem, it might be worth googling "byte string". – EfficiencyOverflow Oct 04 '19 at 11:36
  • For bytes-to-String conversion, the encoding you need to specify has already been decided for you by the creator of the byte stream. But if the bytes really are nothing more than spaces (0x20) and zeros (0x30), you can safely treat it as UTF-8. – Kevin Anderson Oct 04 '19 at 12:12

1 Answer

Generally, any standard English character (assuming you use UTF-8, which you currently don't) takes 1 byte of data. That is by design: UTF-8 encodes the 128 ASCII characters, which include the space, the digits and the unaccented Latin letters, as a single byte each, so the most common characters in English text stay compact. All other characters take 2 to 4 bytes depending on their code point; most symbols from East Asian languages, for example, take 3 bytes.
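As a rough illustration of how the byte count depends on both the character and the encoding, here is a minimal sketch. The class name `ByteLengthSketch` and the helper `byteLength` are made up for this example; `String.getBytes(Charset)` and `StandardCharsets` are standard Java. `UTF_16BE` is used rather than `UTF_16` because the latter prepends a 2-byte byte order mark when encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class ByteLengthSketch {

    // Number of bytes the text occupies once encoded with the given charset.
    static int byteLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        System.out.println(byteLength(" ", StandardCharsets.UTF_8));       // 1
        System.out.println(byteLength("0", StandardCharsets.UTF_8));       // 1
        System.out.println(byteLength(" ", StandardCharsets.UTF_16BE));    // 2
        System.out.println(byteLength("\u00e9", StandardCharsets.UTF_8));  // 2 (e with acute accent)
        System.out.println(byteLength("\u65e5", StandardCharsets.UTF_8));  // 3 (a CJK character)
    }
}
```

The "two bytes per space" observation matches the UTF-16 line: Java stores a `char` as a 16-bit UTF-16 code unit, but that in-memory size says nothing about how many bytes the text takes after it is encoded for sending.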

To check how many bytes a string has, you can use this website: https://mothereff.in/byte-counter. It shows the number of bytes the text takes in UTF-8. If you want to check it from code, convert the string into a byte array using the same encoding; the array's length should give the same result.
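For the original problem of building a string that encodes to a given number of bytes, one possible approach is sketched below. It assumes you know which charset the framework encodes with; the class `FixedBytePaddingSketch` and the method `padding` are hypothetical names, not part of any library. The sketch measures how many bytes one fill character takes, repeats it the matching number of times, and then re-encodes the result to verify the length, which also catches charsets that add a prefix such as a byte order mark.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class, for illustration only.
class FixedBytePaddingSketch {

    // Builds a string of `fill` characters that encodes to exactly `byteLength` bytes.
    static String padding(char fill, int byteLength, Charset charset) {
        if (byteLength < 0) {
            throw new IllegalArgumentException("byteLength must not be negative");
        }
        // Measure how many bytes a single fill character takes in this charset.
        int bytesPerChar = String.valueOf(fill).getBytes(charset).length;
        if (byteLength % bytesPerChar != 0) {
            throw new IllegalArgumentException(
                    byteLength + " is not a multiple of " + bytesPerChar + " bytes per character");
        }
        char[] chars = new char[byteLength / bytesPerChar];
        Arrays.fill(chars, fill);
        String result = new String(chars);
        // Verify by encoding the whole string; fails for charsets that add a prefix (e.g. a BOM).
        if (result.getBytes(charset).length != byteLength) {
            throw new IllegalStateException("charset does not encode to a fixed size per character");
        }
        return result;
    }

    public static void main(String[] args) {
        String spaces = padding(' ', 10, StandardCharsets.UTF_8);
        System.out.println(spaces.getBytes(StandardCharsets.UTF_8).length); // 10

        String zeros = padding('0', 8, StandardCharsets.UTF_16BE);
        System.out.println(zeros); // 0000
    }
}
```

Note that the fill character here is the digit `'0'` (byte 0x30 in UTF-8), not the zero byte 0x00. A freshly allocated `byte[]` contains 0x00 bytes, which decode to invisible NUL characters rather than to `'0'`, and that may be why no zero padding showed up after converting the array to a String.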

martijn p