Convert float[] to string and back to float[] - test fails, but I can't see why

Question

The general method has been answered multiple times before, but I have a problem with my implementation which fails, and am looking to see if a kind reader can spot where I'm going wrong.

The code and test are:

    [TestMethod]
    public void FloatConversion()
    {
        // Set up some test data
        int repetitions = 100000;
        Random rand = new Random();
        float[] testSetOfFloats = new float[repetitions];
        for (int count = 0; count < repetitions; count++)
        {
            testSetOfFloats[count] = rand.NextFloat(0, float.MaxValue);
        }

        // Convert the floats into a byte array
        byte[] floatsAsByteArray = new byte[repetitions * 4]; // 4 bytes for a Single
        for (int count = 0; count < repetitions; count++)
        {
            byte[] floatAsBytes = BitConverter.GetBytes(testSetOfFloats[count]);
            floatAsBytes.CopyTo(floatsAsByteArray, count * 4);
        }
        // Convert the byte array to a Unicode string
        string serialisedByteArray = System.Text.Encoding.Unicode.GetString(floatsAsByteArray);

        // ... Do some work, store the string, re-read the string, then ...

        // Convert the unicode string back into a byte array
        byte[] deserializedByteArray = System.Text.Encoding.Unicode.GetBytes(serialisedByteArray);

        // Convert the byte array back into an array of floats
        float[] deserializedFloats = new float[repetitions];
        for (int count = 0; count < repetitions; count++)
        {
            int offset = count * 4;
            deserializedFloats[count] = BitConverter.ToSingle(deserializedByteArray, offset);
        }

        for (int count = 0; count < repetitions; count++)
        {
            // This will fail - but many will pass the test.
            Assert.IsTrue(deserializedFloats[count] == testSetOfFloats[count]);
        }
    }

The only non-standard method is NextFloat(), an extension method on Random that just returns a random Single from the passed range of values.
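
For readers who want to run the test, here is a minimal sketch of what such an extension could look like. The original NextFloat() is not shown in the question, so the class name and the implementation below are assumptions, not the asker's code.

```csharp
using System;

public static class RandomExtensions
{
    // Hypothetical stand-in for the NextFloat() extension mentioned in the question.
    // Returns a value between min and max (inclusive of both ends after rounding).
    public static float NextFloat(this Random rand, float min, float max)
    {
        // Interpolate in double so that (max - min) cannot overflow a Single
        // when the range is as wide as 0..float.MaxValue.
        double value = min + rand.NextDouble() * ((double)max - min);
        return (float)value;
    }
}
```

With this in scope, `rand.NextFloat(0, float.MaxValue)` compiles as written in the test above.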

@LPs it's the test at the end that fails - the Assert.IsTrue will fail within the first 200 float comparisons, as described in the code comments. Is that really worthy of a downvote? – PhillipH Jan 30 '15 at 07:58
I think that the error occurs due to the loss of data in conversion from float to byte. Try to compare each element of the array and use some kind of deviation test. – Bhaskar Jan 30 '15 at 08:02
First you need to figure out which translation is the problematic part. Check that the deserialized byte array is identical to the serialized byte array. – zmbq Jan 30 '15 at 08:08
@KCdod - clearly; but where and why? The BitConverter will not lose precision, it's going directly from a byte[4] to a byte[4]. The problem is in converting the byte[4] to a string using the Unicode.GetString/GetBytes. Unicode is a lossless encoder. – PhillipH Jan 30 '15 at 08:09
@L16H7 - there is no loss of precision converting from Single to byte[] using BitConverter. – PhillipH Jan 30 '15 at 08:09
@zmbq - it is of equal length, but not every value is equal, as demonstrated in the test at the end of the code. – PhillipH Jan 30 '15 at 08:12
The test is on the floats, I want you to test the two byte arrays. – zmbq Jan 30 '15 at 08:19
@zmbq - the two byte arrays are different - it's inherent in the test; otherwise the test would pass. I've accepted DasKrümelmonster's answer as correct; Unicode cannot reliably convert all bytes, but Base64 can. – PhillipH Jan 30 '15 at 08:20

score 5 · Answer 1 · answered Jan 30 '15 at 08:05

    // Convert the byte array to a Unicode string
    string serialisedByteArray = System.Text.Encoding.Unicode.GetString(floatsAsByteArray);

You are converting floats to bytes and then converting those bytes to a string ... a recipe for trouble.

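There are certain byte sequences that are not valid UTF-16, which is what Encoding.Unicode reads and writes (look up "surrogate pair": a high surrogate is invalid if not followed by a low surrogate, and vice versa). Such sequences may not "survive" the round-trip from byte[] to string and back, because by default the decoder replaces each invalid code unit with the replacement character U+FFFD.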
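
To make the failure concrete, here is a minimal sketch that checks whether a given byte array survives the same GetString/GetBytes round-trip used in the question. The class and method names are made up for illustration, and the byte values in the usage note assume a little-endian machine.

```csharp
using System;
using System.Linq;
using System.Text;

public static class SurrogateDemo
{
    // Returns true if the bytes come back unchanged after
    // bytes -> string -> bytes through Encoding.Unicode (UTF-16LE).
    public static bool SurvivesUnicodeRoundTrip(byte[] bytes)
    {
        string text = Encoding.Unicode.GetString(bytes);
        byte[] back = Encoding.Unicode.GetBytes(text);
        return bytes.SequenceEqual(back);
    }
}
```

For example, the bytes 00 00 80 3F (1.0f) decode to the valid code units U+0000 and U+3F80 and survive, so the call returns true. The bytes 00 D8 80 3F start with the code unit U+D800, a high surrogate with no low surrogate after it, so the call returns false. For the positive floats in the question's test, the upper two bytes can never form a surrogate, so a float breaks whenever its lower two bytes land in D800-DFFF; if those bits were uniformly random, that would be roughly 1 float in 32, which fits a failure within the first 200 comparisons.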

The question is therefore: why do you convert binary data 1:1 into a string? If you need to transmit binary data as a string, there are many encodings to choose from, e.g. Base64.
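
As a minimal sketch of the Base64 route, assuming the goal is simply float[] to string and back: the FloatBase64 class and its method names are hypothetical, while Convert.ToBase64String, Convert.FromBase64String and Buffer.BlockCopy are the standard .NET calls.

```csharp
using System;

public static class FloatBase64
{
    // Copies the raw bytes of the floats and encodes them as Base64 text.
    public static string Serialize(float[] values)
    {
        byte[] bytes = new byte[values.Length * sizeof(float)];
        Buffer.BlockCopy(values, 0, bytes, 0, bytes.Length);
        return Convert.ToBase64String(bytes);
    }

    // Decodes the Base64 text and copies the bytes back into a float array.
    public static float[] Deserialize(string text)
    {
        byte[] bytes = Convert.FromBase64String(text);
        float[] values = new float[bytes.Length / sizeof(float)];
        Buffer.BlockCopy(bytes, 0, values, 0, values.Length * sizeof(float));
        return values;
    }
}
```

Base64 maps every possible byte value to plain ASCII characters, so no bit pattern is rejected or replaced. The bytes are in the machine's native byte order, so this sketch assumes the string is written and read on platforms with the same endianness.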

answered Jan 30 '15 at 08:05

DasKrümelmonster


Because .NET strings are encoded as Unicode internally. I will experiment with the Base64 encoding to see if I get a better result. – PhillipH Jan 30 '15 at 08:10
Ok, I was following the advice in http://stackoverflow.com/questions/1003275/how-to-convert-byte-to-string which is marked as Answer but doesn't work in all circumstances. Unicode (or in fact any other Encoder) cannot encode all byte sequences, only Base64 can do the job. I changed my serializer to use Convert.ToBase64String and Convert.FromBase64String instead of using Unicode.GetBytes and Unicode.GetString and the error went away. – PhillipH Jan 30 '15 at 08:19

score 1 · Answer 2 · answered Jan 30 '15 at 08:20

@DasKrümelmonster's answer is correct. I want to emphasize a point though - your test was incomplete.

Had you added a test to make sure the first and second byte arrays are the same, all this would have been perfectly clear.
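
A minimal sketch of such a check, assuming you want to know where the arrays first diverge rather than only that they do. The helper name is made up for illustration.

```csharp
using System;

public static class ByteArrayCheck
{
    // Returns the index of the first differing byte, or -1 if the arrays are identical.
    // If one array is a prefix of the other, returns the length of the shorter one.
    public static int FirstDifference(byte[] expected, byte[] actual)
    {
        int common = Math.Min(expected.Length, actual.Length);
        for (int i = 0; i < common; i++)
        {
            if (expected[i] != actual[i])
            {
                return i;
            }
        }
        return expected.Length == actual.Length ? -1 : common;
    }
}
```

In the question's test this could be called right after the GetBytes step, for example `Assert.AreEqual(-1, ByteArrayCheck.FirstDifference(floatsAsByteArray, deserializedByteArray));`. MSTest's `CollectionAssert.AreEqual(floatsAsByteArray, deserializedByteArray)` gives the same pass/fail result without a helper.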

answered Jan 30 '15 at 08:20

zmbq


This is a comment on a comment, not an answer. Just vote up @DasKrümelmonster. – PhillipH Jan 30 '15 at 08:50
I think this is important enough to warrant an answer. – zmbq Jan 30 '15 at 09:12
