10

I have a String[] array in Java that I must first encode into a single String and then, later in the code, convert back into the String[] array. The catch is that the strings in the array can contain any character, so I must be very careful when encoding. All the information needed for decoding must be in the final string; I cannot return a string plus some extra information in another variable.

The algorithm I have devised so far is:

  1. Append all the strings next to each other, for example like this: String[] a = {"lala", "exe", "a"} into String b = "lalaexea"

  2. Append at the end of the string the lengths of all the strings from String[], separated from the main text by $ sign and then each length separated by a comma, so:

b = "lalaexea$4,3,1"

Then, when converting it back, I would first read the lengths from the end of the string and then use them to cut out the actual strings.
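The scheme described above can be sketched as follows, mainly to check that it survives strings that themselves contain `$` or `,`. The class name `LengthSuffixCodec` and the method names are hypothetical, and the sketch assumes the array contains no null elements.

```java
import java.util.Arrays;

final class LengthSuffixCodec {

    // Concatenate all strings, then append '$' followed by the comma-separated lengths.
    static String encode(String[] parts) {
        StringBuilder body = new StringBuilder();
        StringBuilder lengths = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            body.append(parts[i]);
            if (i > 0) {
                lengths.append(',');
            }
            lengths.append(parts[i].length());
        }
        return body.append('$').append(lengths).toString();
    }

    static String[] decode(String encoded) {
        // The suffix holds only digits and commas, so the LAST '$' is always the delimiter,
        // even if the strings themselves contain '$'.
        int split = encoded.lastIndexOf('$');
        String suffix = encoded.substring(split + 1);
        if (suffix.isEmpty()) {
            return new String[0]; // an empty array was encoded as just "$"
        }
        String[] lengths = suffix.split(",");
        String[] parts = new String[lengths.length];
        int pos = 0;
        for (int i = 0; i < lengths.length; i++) {
            int len = Integer.parseInt(lengths[i]);
            parts[i] = encoded.substring(pos, pos + len);
            pos += len;
        }
        return parts;
    }

    public static void main(String[] args) {
        String[] original = {"la$la", "4,3", "", "a"};
        String encoded = encode(original);
        System.out.println(encoded);                                  // la$la4,3a$5,3,0,1
        System.out.println(Arrays.equals(original, decode(encoded))); // true
    }
}
```

Because the lengths are read from the end, the payload needs no escaping at all; the only reserved characters are in the suffix, which the encoder fully controls.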

But maybe there is an easier way?

Cheers!

Janek

4 Answers

13

If you don't want to spend so much time on string operations, you could use Java serialization plus Commons Codec, like this:

public void stringArrayTest() throws IOException, ClassNotFoundException, DecoderException {
    String[] strs = new String[] {"test 1", "test 2", "test 3"};
    System.out.println(Arrays.toString(strs));

    // serialize
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    new ObjectOutputStream(out).writeObject(strs);

    // your string
    String yourString = new String(Hex.encodeHex(out.toByteArray()));
    System.out.println(yourString);

    // deserialize
    ByteArrayInputStream in = new ByteArrayInputStream(Hex.decodeHex(yourString.toCharArray()));
    System.out.println(Arrays.toString((String[]) new ObjectInputStream(in).readObject()));
}

This prints the following output:

[test 1, test 2, test 3]
aced0005757200135b4c6a6176612e6c616e672e537472696e673badd256e7e91d7b47020000787000000003740006746573742031740006746573742032740006746573742033
[test 1, test 2, test 3]

If you are using Maven, you can add the following dependency for Commons Codec:

<dependency>
    <groupId>commons-codec</groupId>
    <artifactId>commons-codec</artifactId>
    <version>1.2</version>
</dependency>

As suggested, with Base64 instead of hex (a two-line change):

String yourString = new String(Base64.encodeBase64(out.toByteArray()));
ByteArrayInputStream in = new ByteArrayInputStream(Base64.decodeBase64(yourString.getBytes()));

With Base64 the resulting string is shorter. For the same code as above, the output is:

[test 1, test 2, test 3]
rO0ABXVyABNbTGphdmEubGFuZy5TdHJpbmc7rdJW5+kde0cCAAB4cAAAAAN0AAZ0ZXN0IDF0AAZ0ZXN0IDJ0AAZ0ZXN0IDM=
[test 1, test 2, test 3]
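On Java 8 or later, the same round trip can be sketched with the JDK's own `java.util.Base64` instead of Commons Codec. This is a swapped-in API, not the code from the answer above; the class name `SerializedArrayCodec` and its method names are made up for illustration, and checked exceptions are wrapped in unchecked ones only to keep the helpers easy to call.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Base64;

final class SerializedArrayCodec {

    // Serialize the array with Java serialization, then Base64-encode the bytes.
    static String encode(String[] strs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(out)) {
            oos.writeObject(strs);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Base64.getEncoder().encodeToString(out.toByteArray());
    }

    // Base64-decode the string, then read the array back through an ObjectInputStream.
    static String[] decode(String encoded) {
        byte[] bytes = Base64.getDecoder().decode(encoded);
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (String[]) ois.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] strs = {"test 1", "a$b,c", ""};
        String encoded = encode(strs);
        System.out.println(encoded);
        System.out.println(Arrays.toString(decode(encoded)));
    }
}
```

Note that `ObjectInputStream` should only be pointed at strings your own code produced; deserializing untrusted input is a known security risk.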

As for the running time of each approach, I ran each method 10^5 times, with the following results:

  • String manipulation: 156 ms
  • Hex: 376 ms
  • Base64: 379 ms

Code used for test:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.util.StringTokenizer;

import org.apache.commons.codec.DecoderException;
import org.apache.commons.codec.binary.Base64;
import org.apache.commons.codec.binary.Hex;


public class StringArrayRepresentationTest {

    public static void main(String[] args) throws IOException, ClassNotFoundException, DecoderException {

        String[] strs = new String[] {"test 1", "test 2", "test 3"};


        long t = System.currentTimeMillis();
        for (int i =0; i < 100000;i++) {
            stringManipulation(strs);
        }
        System.out.println("String manipulation: " + (System.currentTimeMillis() - t));


        t = System.currentTimeMillis();
        for (int i =0; i < 100000;i++) {
            testHex(strs);
        }
        System.out.println("Hex: " + (System.currentTimeMillis() - t));


        t = System.currentTimeMillis();
        for (int i =0; i < 100000;i++) {
            testBase64(strs);
        }
        System.out.println("Base64: " + (System.currentTimeMillis() - t));
    }

    public static void stringManipulation(String[] strs) {
        String result = serialize(strs);
        unserialize(result);
    }

    private static String[] unserialize(String result) {
        int sizesSplitPoint = result.lastIndexOf('$');
        String sizes = result.substring(sizesSplitPoint+1);
        StringTokenizer st = new StringTokenizer(sizes, ";");
        String[] resultArray = new String[st.countTokens()];

        int i = 0;
        int lastPosition = 0;
        while (st.hasMoreTokens()) {
            String stringLengthStr = st.nextToken();
            int stringLength = Integer.parseInt(stringLengthStr);
            resultArray[i++] = result.substring(lastPosition, lastPosition + stringLength);
            lastPosition += stringLength;
        }
        return resultArray;
    }

    private static String serialize(String[] strs) {
        StringBuilder sizes = new StringBuilder("$");
        StringBuilder result = new StringBuilder();

        for (String str : strs) {
            if (sizes.length() != 1) {
                sizes.append(';');
            }
            sizes.append(str.length());
            result.append(str);
        }

        result.append(sizes.toString());
        return result.toString();
    }

    public static void testBase64(String[] strs) throws IOException, ClassNotFoundException, DecoderException {
        // serialize
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        new ObjectOutputStream(out).writeObject(strs);

        // your string
        String yourString = new String(Base64.encodeBase64(out.toByteArray()));

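        // decode only: the bytes are never read back through an ObjectInputStream,
        // so object deserialization is not part of this timing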
        ByteArrayInputStream in = new ByteArrayInputStream(Base64.decodeBase64(yourString.getBytes()));
    }

    public static void testHex(String[] strs) throws IOException, ClassNotFoundException, DecoderException {
        // serialize
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        new ObjectOutputStream(out).writeObject(strs);

        // your string
        String yourString = new String(Hex.encodeHex(out.toByteArray()));

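        // decode only: the bytes are never read back through an ObjectInputStream,
        // so object deserialization is not part of this timing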
        ByteArrayInputStream in = new ByteArrayInputStream(Hex.decodeHex(yourString.toCharArray()));
    }

}
Francisco Spaeth
  • 1
    This is a safer method than the others proposed. The overhead is larger, though; using an encoding other than hex, such as Base64, would be a good idea. – ARRG Nov 07 '12 at 14:55
  • @ARRG: thanks for your comment, I just added the changes needed to use Base64 – Francisco Spaeth Nov 07 '12 at 15:03
  • And how does the performance of these two solutions compare (string manipulation vs. the one proposed in this answer)? – Janek Nov 07 '12 at 15:21
  • Deflating using Base64 (best option IMHO) takes about 17ms on my machine for an Integer array of 5. Inflating takes just 1ms. – mvreijn Feb 08 '17 at 13:57
1

Use a JSON library like Jackson; it can also serialize other types of objects (integers, floats, etc.) to strings and back.

Prakash Nadar
0

Just use a known separator (such as @ or #) to join your strings, then use yourString.split(yourSeparator) to get the array back.

dounyy
  • not safe at all, since this char sequence can be present in the string itself – Francisco Spaeth Nov 08 '12 at 08:52
  • Well, I tend to agree with you. But you still can use chars that are forbidden somewhere else in your application, such as any char forbidden in databases for example. Of course @ and # were examples... – dounyy Nov 08 '12 at 09:18
0

I would put a separator symbol between the words and later use the String#split method to get the String[] back. Based on your $ symbol example, it would be

public String mergeStrings(String[] ss) {
    StringBuilder sb = new StringBuilder();
    for(String s : ss) {
        sb.append(s);
        sb.append('$');
    }
    return sb.toString();
}

public String[] unmergeStrings(String s) {
    return s.split("\\$");
}

Note that in this example, I add a double \ before the $ symbol because the String#split method receives a regular expression as parameter, and the $ symbol is a special character in regex.

public String processData(String[] ss) {
    String mergedString = mergeStrings(ss);
    //process data...
    //a little example...
    for(int i = 0; i < mergedString.length(); i++) {
        if (mergedString.charAt(i) == '$') {
            System.out.println();
        } else {
            System.out.print(mergedString.charAt(i));
        }
    }
    System.out.println();
    //unmerging the data again
    String[] oldData = unmergeStrings(mergedString);
}

A single character is likely to show up in your String[], so it would be better to use another String as the separator instead. This only makes a collision less likely; it does not rule one out. The methods would turn into this:

public static final String STRING_SEPARATOR = "@|$|@";
public static final String STRING_SEPARATOR_REGEX = "@\\|\\$\\|@";

public String mergeStrings(String[] ss) {
    StringBuilder sb = new StringBuilder();
    for(String s : ss) {
        sb.append(s);
        sb.append(STRING_SEPARATOR);
    }
    return sb.toString();
}

public String[] unmergeStrings(String s) {
    return s.split(STRING_SEPARATOR_REGEX);
}
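If the data really can contain any character, one way to keep a separator-based format is to escape the separator inside each string before joining, as suggested in the comments. A minimal sketch, assuming `$` as the separator and `\` as the escape character; the class `EscapingCodec` and its methods are hypothetical names, and the decoder scans the characters itself rather than using String#split:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class EscapingCodec {
    private static final char SEP = '$';
    private static final char ESC = '\\';

    static String merge(String[] ss) {
        StringBuilder sb = new StringBuilder();
        for (String s : ss) {
            for (int i = 0; i < s.length(); i++) {
                char c = s.charAt(i);
                if (c == SEP || c == ESC) {
                    sb.append(ESC); // prefix reserved characters with the escape character
                }
                sb.append(c);
            }
            sb.append(SEP); // terminator after every element, so empty strings survive
        }
        return sb.toString();
    }

    static String[] unmerge(String merged) {
        List<String> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < merged.length(); i++) {
            char c = merged.charAt(i);
            if (c == ESC && i + 1 < merged.length()) {
                current.append(merged.charAt(++i)); // escaped character: take it literally
            } else if (c == SEP) {
                out.add(current.toString());        // unescaped separator ends the element
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        return out.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] original = {"a$b", "c\\d", "", "x"};
        String merged = merge(original);
        System.out.println(merged);
        System.out.println(Arrays.equals(original, unmerge(merged)));
    }
}
```

Escaping the escape character as well is what makes the format unambiguous: every `$` or `\` that belongs to the data is preceded by a `\`, so any bare `$` must be a separator.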
Luiggi Mendoza
  • the OP explained that he *can have any character in a string in String[] array*, so you should escape the chosen separator before *joining*, e.g. `s.replaceAll("\\$", "\\\\\\$");`. – sp00m Nov 07 '12 at 14:21
  • @sp00m I would prefer to keep the data unchanged and instead propose a new pattern to separate each `String` (and its regex to split it back). – Luiggi Mendoza Nov 07 '12 at 14:29
  • But it does not solve the problem: the pattern can still appear in one of the strings in the String[]. One idea would be to always draw a new pattern, but even then a collision is still possible, and it does not seem to be a very clean solution. – Janek Nov 07 '12 at 14:54