I have a very long string and want to write it to a gzip file.

I tried using GZIPOutputStream to write the file,

but an exception is thrown when I call string.getBytes():

java.lang.OutOfMemoryError: Requested array size exceeds VM limit
        at java.lang.StringCoding.encode(StringCoding.java:350)
        at java.lang.String.getBytes(String.java:941)

Here is my code. What should I do so that the file is written successfully?

public static void way1() throws IOException {
    String filePath = "foo";
    String content = "very large string";
    try (OutputStream os = Files.newOutputStream(Paths.get(filePath));
         GZIPOutputStream gos = new GZIPOutputStream(os)) {
        gos.write(content.getBytes(StandardCharsets.UTF_8));
    }
}

public static void way2() throws IOException {
    String filePath = "foo";
    String content = "very large string";
    try (OutputStream os = Files.newOutputStream(Paths.get(filePath));
         GZIPOutputStream gos = new GZIPOutputStream(os);
         WritableByteChannel fc = Channels.newChannel(gos)) {
        fc.write(ByteBuffer.wrap(content.getBytes(StandardCharsets.UTF_8)));
    }
}
Shigure
  • If you handle your huge text as a single `String`, then you have already maneuvered yourself into a situation where there are no good solutions. Where did that `String` come from? Can you alternatively have a `Reader` that provides it, or some other way of streaming the data instead of producing a single `String`? – Joachim Sauer Sep 14 '22 at 15:26
  • Where does this string come from? If it's huge, you should not handle it all at once (loading everything into memory), but in chunks. Read some logical part, do something with it, write it if needed, rinse and repeat. – Chaosfire Sep 14 '22 at 15:29
  • this string is from database, and I processed the data and generated as a string – Shigure Sep 14 '22 at 15:30
  • But I need a single, complete gzip file. If I chunk this string, it will create more than one file. – Shigure Sep 14 '22 at 15:31
  • @Shigure Keep the output stream open and keep writing to it, until you finish processing. – Chaosfire Sep 14 '22 at 15:35
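
The suggestion in the last comment can be sketched as follows: a minimal example, assuming the text is already in memory as one `String`, that keeps a single `GZIPOutputStream` open and encodes the string one slice at a time, so no full-size `byte[]` is ever requested. The class name `GzipChunkWriter`, the method `writeGzip`, and the `chunkChars` parameter are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPOutputStream;

public class GzipChunkWriter {

    // Hypothetical helper: writes `content` into ONE gzip file, handing the
    // writer roughly `chunkChars` characters per call (chunkChars must be >= 1).
    public static void writeGzip(String content, Path target, int chunkChars) throws IOException {
        try (Writer writer = new OutputStreamWriter(
                new GZIPOutputStream(Files.newOutputStream(target)), StandardCharsets.UTF_8)) {
            int length = content.length();
            int start = 0;
            while (start < length) {
                int end = Math.min(start + chunkChars, length);
                // Keep a surrogate pair together instead of splitting it across chunks.
                if (end < length && Character.isHighSurrogate(content.charAt(end - 1))) {
                    end++;
                }
                // Only this slice is copied and encoded, not the whole string.
                writer.write(content, start, end - start);
                start = end;
            }
        }
    }
}
```

Because the stream stays open across all writes, the result is one gzip file regardless of how many slices are written.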

3 Answers

If you have a ResultSet, then try something like:

public static void string2Zipfile(ResultSet rs, int columnIndex, Path outputFile) throws SQLException, IOException {
    // Stream the column straight into the gzip file; the value is never held in memory as a whole
    try (InputStream in = rs.getBinaryStream(columnIndex);
         GZIPOutputStream gos = new GZIPOutputStream(Files.newOutputStream(outputFile))) {
        in.transferTo(gos); // InputStream.transferTo requires Java 9+
    }
}
g00se
  • Is there any guarantee as to what charset will be used to translate the database value into bytes? I would use `getCharacterStream` instead of `getBinaryStream`, then write to an OutputStreamWriter that wraps the GZIPOutputStream. – VGR Sep 14 '22 at 20:43
  • *Is there any guarantee as to what charset will be used to translate the database value into bytes?* It's difficult to know. We'd need to know how the column was defined and what RDBMS was being used. No 'translation' as such will be done - the bytes in the column value will come out the same as they are in the db. We also don't know what encoding would be the most convenient, were a character encoding to be applied. – g00se Sep 14 '22 at 22:21
  • FYI, this is a one-liner that would work, given the correct parameters, if you're using MySQL on a Unix-based system: ```mysql -p -ss -e "SELECT description FROM job WHERE id = 102" music_work | gzip -c >q.gz``` – g00se Sep 14 '22 at 22:38
  • “…the bytes in the column value will come out the same as they are in the db.” Only if they’re in a (var)binary to begin with. If it’s a (var)char or text column, there are multiple factors that will affect how the characters become bytes: both at the database level and in the JDBC driver. It’s a confusing mess that’s best avoided. – VGR Sep 14 '22 at 22:50
  • Yes. We need to know the details – g00se Sep 14 '22 at 22:58
  • I need to process the column value, so this approach does not work for me. I tried chunking the string, and it writes the file successfully. – Shigure Sep 15 '22 at 07:55
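
The character-stream variant suggested in the comments above might look roughly like this. It is a sketch, not tested against a real database: the class `ReaderToGzip` and the method `copyToGzip` are hypothetical names, and UTF-8 as the output encoding is an assumption. With JDBC, the `Reader` would come from `rs.getCharacterStream(columnIndex)`.

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPOutputStream;

public class ReaderToGzip {

    // Hypothetical helper: copies all characters from `source` into a gzip file,
    // encoding them explicitly as UTF-8 (an assumed choice). Returns the number
    // of characters copied. Closes `source` when done.
    public static long copyToGzip(Reader source, Path target) throws IOException {
        try (Reader in = source;
             Writer out = new OutputStreamWriter(
                     new GZIPOutputStream(Files.newOutputStream(target)), StandardCharsets.UTF_8)) {
            // Reader.transferTo (Java 10+) copies through a small internal buffer.
            return in.transferTo(out);
        }
    }
}
```

Working with characters rather than raw bytes means the charset of the output file is chosen in the Java code instead of depending on how the column or driver represents the value.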

It seems that converting the String to a byte[] (using content.getBytes(StandardCharsets.UTF_8)) simply needs a lot of memory for the byte[]. Instead of converting the whole String to a byte[] at once, create a ByteBuffer from it using the selected encoding, and then write this ByteBuffer to the GZIPOutputStream; this should lower the memory needed by at least half. To create the ByteBuffer you can use:

Charset charset = StandardCharsets.UTF_8;
String content = "very large string";
ByteBuffer byteBuffer = charset.encode(content);

API of ByteBuffer: https://docs.oracle.com/javase/7/docs/api/java/nio/ByteBuffer.html

This might also be useful: How to put the content of a ByteBuffer into an OutputStream?

Alternatively, you can also increase the amount of memory for the Java heap: Increase heap size in Java

Altogether, it would be very similar to your way2, something like this (I didn't test it):

public static void way2() throws IOException {
    String filePath = "foo";
    String content = "very large string";
    try (OutputStream os = Files.newOutputStream(Paths.get(filePath));
         GZIPOutputStream gos = new GZIPOutputStream(os);
         WritableByteChannel fc = Channels.newChannel(gos)) {
        // Encode into a ByteBuffer instead of calling content.getBytes(...)
        ByteBuffer byteBuffer = StandardCharsets.UTF_8.encode(content);
        // A channel write may be partial, so loop until the buffer is drained
        while (byteBuffer.hasRemaining()) {
            fc.write(byteBuffer);
        }
    }
}
Krzysztof Cichocki
  • I'm confused: *a.* you don't have a string, you have a value in a database and *b.* that code might use nio but it doesn't address the memory problem – g00se Sep 15 '22 at 05:50
  • The code deals with a String and it does address the memory problem. Why do you think it doesn't? – Krzysztof Cichocki Sep 16 '22 at 07:21
  • Because the origin of the string is in a database and it needs never to be held in memory at all. – g00se Sep 16 '22 at 08:50
  • But he only said something that indicates the string is somehow produced from the data in the database, not that it is read directly. – Krzysztof Cichocki Sep 17 '22 at 07:48
  • *this string is from database, and **I processed the data and generated as a string*** (My emphasis) – g00se Sep 17 '22 at 09:47
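
For the case discussed in this thread, where the string already exists in memory, the encoding step can also be bounded explicitly. The following is only a sketch under that assumption: it uses a `CharsetEncoder` with one small, reused `ByteBuffer`, so the encoded form of the whole string never exists at once. The class `BoundedEncodeGzip`, the helper `drain`, and the 8192-byte buffer size are illustrative choices, not part of any answer above.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPOutputStream;

public class BoundedEncodeGzip {

    // Hypothetical helper: encodes `content` as UTF-8 through a fixed-size buffer
    // and writes the bytes into a single gzip file.
    public static void writeGzip(String content, Path target) throws IOException {
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
        CharBuffer in = CharBuffer.wrap(content);    // a view of the string, not a copy
        ByteBuffer out = ByteBuffer.allocate(8192);  // reused for every chunk
        try (OutputStream gos = new GZIPOutputStream(Files.newOutputStream(target))) {
            CoderResult result;
            do {
                // Encodes until the input is exhausted or the output buffer is full.
                result = encoder.encode(in, out, true);
                if (result.isError()) {
                    result.throwException(); // e.g. an unpaired surrogate in the input
                }
                drain(out, gos);
            } while (result.isOverflow());
            do {
                result = encoder.flush(out);
                drain(out, gos);
            } while (result.isOverflow());
        }
    }

    // Writes whatever the buffer currently holds and resets it for reuse.
    private static void drain(ByteBuffer buf, OutputStream os) throws IOException {
        buf.flip();
        os.write(buf.array(), buf.arrayOffset() + buf.position(), buf.remaining());
        buf.clear();
    }
}
```

Unlike `charset.encode(content)`, which returns one buffer holding the entire encoded text, this keeps the extra memory at the size of the small output buffer.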

I followed @Chaosfire's suggestion and changed the code like this; it writes the file successfully:

public static void way1(List<String> originContent) throws IOException {
    String filePath = "foo";
    try (OutputStream os = Files.newOutputStream(Paths.get(filePath));
         GZIPOutputStream gos = new GZIPOutputStream(os)) {
        byte[] lineBreak = "\r\n".getBytes(StandardCharsets.UTF_8);
        boolean first = true;
        // Join and encode one chunk of lines at a time (Lists is Guava's com.google.common.collect.Lists)
        for (List<String> part : Lists.partition(originContent, 1000000)) {
            if (!first) {
                gos.write(lineBreak); // keep the line break between consecutive chunks
            }
            gos.write(String.join("\r\n", part).getBytes(StandardCharsets.UTF_8));
            first = false;
        }
    }
}
Shigure