Parsing human-readable filesizes in Java to Bytes

Question

I have a bunch of file sizes that I would like to parse. These are currently only in GBs. Here are some samples:

1.2GB
2.4GB

I think I should store my byte filesizes in a Long value but I can't seem to figure it out. Here's how I'm doing it:

System.out.println(Float.parseFloat("1.2GB".replace("GB", ""))* 1024L * 1024L * 1024L);

This returns a Float value which is displayed as 1.28849024E9. How can I get a Long representation of the filesize in bytes?

I've gotten a little confused with the numeric datatypes. Thanks.

João Silva · Accepted Answer · 2012-08-23T12:01:31.110

9

Use BigDecimal instead:

BigDecimal bytes = new BigDecimal("1.2GB".replace("GB", ""));
bytes = bytes.multiply(BigDecimal.valueOf(1024).pow(3));
long value = bytes.longValue();

Which you can put in a method:

import java.math.BigDecimal;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static long toBytes(String filesize) {
    long returnValue = -1; // returned when the input does not match the pattern
    Pattern patt = Pattern.compile("([\\d.]+)([GMK]B)", Pattern.CASE_INSENSITIVE);
    Matcher matcher = patt.matcher(filesize);
    // Maps each unit suffix to its power of 1024.
    Map<String, Integer> powerMap = new HashMap<String, Integer>();
    powerMap.put("GB", 3);
    powerMap.put("MB", 2);
    powerMap.put("KB", 1);
    if (matcher.find()) {
        String number = matcher.group(1);
        int pow = powerMap.get(matcher.group(2).toUpperCase());
        BigDecimal bytes = new BigDecimal(number);
        bytes = bytes.multiply(BigDecimal.valueOf(1024).pow(pow));
        returnValue = bytes.longValue(); // discards any fractional byte
    }
    return returnValue;
}

And call it like:

long bytes = toBytes("1.2GB");
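
Note that `longValue()` drops the fractional part, so 1.2GB comes out as 1288490188 rather than 1288490189. If rounding matters, something along these lines might work. This is only a sketch: the class name `RoundedSizeDemo` and the helper `gigabytesToBytes` are made up for illustration, and it assumes the "GB" suffix has already been stripped.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

class RoundedSizeDemo {
    // Hypothetical helper: takes a decimal number of gigabytes (suffix already
    // removed) and returns the byte count rounded half up.
    static long gigabytesToBytes(String gigabytes) {
        BigDecimal bytes = new BigDecimal(gigabytes)
                .multiply(BigDecimal.valueOf(1024).pow(3));
        // setScale(0, ...) rounds to a whole number of bytes;
        // longValueExact() throws ArithmeticException if the result
        // does not fit in a long instead of silently wrapping.
        return bytes.setScale(0, RoundingMode.HALF_UP).longValueExact();
    }

    public static void main(String[] args) {
        System.out.println(gigabytesToBytes("1.2")); // prints 1288490189
        System.out.println(gigabytesToBytes("2.4")); // prints 2576980378
    }
}
```

Whether to round or truncate depends on what the sizes are used for; for display purposes the one-byte difference rarely matters.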

edited Aug 23 '12 at 12:01

answered Aug 23 '12 at 11:34


1

Doesn't solve for "GB" and "MB". – Rob I Aug 23 '12 at 11:46
1

@MridangAgarwalla: Just return `value`. I've edited my answer and included a simple generic method to do that. – João Silva Aug 23 '12 at 12:01
If all you do is turn the result into a `long`, why not use `double` or `long`? – Peter Lawrey Aug 23 '12 at 12:53

score 8 · Answer 2 · edited Feb 03 '15 at 10:40

This function will give you a more general solution. It covers GB, MB and KB and tolerates both comma and dot for the decimal separator. If a plain integer is entered, it passes it through as well.

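public static long parseFilesize(String in) {
  in = in.trim();
  in = in.replaceAll(",", ".");
  // A plain integer is taken as a byte count and returned as is.
  try { return Long.parseLong(in); } catch (NumberFormatException e) {}
  final Matcher m = Pattern.compile("([\\d.,]+)\\s*(\\w)").matcher(in);
  // Without this check, m.group() throws IllegalStateException on non-matching input.
  if (!m.find()) throw new IllegalArgumentException("Cannot parse size: " + in);
  int scale = 1;
  // The cases fall through on purpose: G multiplies three times, M twice, K once.
  switch (m.group(2).charAt(0)) {
      case 'G' : scale *= 1024;
      case 'M' : scale *= 1024;
      case 'K' : scale *= 1024; break;
      default: throw new IllegalArgumentException("Unknown unit in: " + in);
  }
  return Math.round(Double.parseDouble(m.group(1)) * scale);
}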

You're also missing multiplying by 1024 in the 'K' case. – Luis Sep Feb 03 '15 at 10:33

score 2 · Answer 3 · answered Aug 23 '12 at 13:08

A shorter version without using BigDecimal.

public static long parseSize(String text) {
    double d = Double.parseDouble(text.replaceAll("[GMK]B$", ""));
    long l = Math.round(d * 1024 * 1024 * 1024L);
    switch (text.charAt(Math.max(0, text.length() - 2))) {
        default:  l /= 1024;
        case 'K': l /= 1024;
        case 'M': l /= 1024;
        case 'G': return l;
    }
}

for (String s : "1.2GB 2.4GB 3.75MB 1.28KB 9".split(" "))
    System.out.println(s + " = " + parseSize(s));

prints

1.2GB = 1288490189
2.4GB = 2576980378
3.75MB = 3932160
1.28KB = 1310
9 = 9

SJuan76 · Answer 4 · 2012-08-23T11:40:35.697

1

If you use Float, you risk losing precision.

An alternative could be using BigDecimal.
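
To make the precision loss concrete, here is a small comparison. It is a sketch only: the class and method names (`FloatPrecisionDemo`, `viaFloat`, `viaBigDecimal`) are invented for this example, and the input is assumed to be the number with the "GB" suffix already removed.

```java
import java.math.BigDecimal;

class FloatPrecisionDemo {
    // The question's calculation, with a cast added to get a long.
    // A float keeps only about 7 significant decimal digits, so 1.2
    // is stored as roughly 1.20000005 before it is scaled.
    static long viaFloat(String gigabytes) {
        return (long) (Float.parseFloat(gigabytes) * 1024L * 1024L * 1024L);
    }

    // The same calculation with BigDecimal, which represents 1.2 exactly.
    // longValue() discards the fractional byte.
    static long viaBigDecimal(String gigabytes) {
        return new BigDecimal(gigabytes)
                .multiply(BigDecimal.valueOf(1024).pow(3))
                .longValue();
    }

    public static void main(String[] args) {
        System.out.println(viaFloat("1.2"));      // prints 1288490240
        System.out.println(viaBigDecimal("1.2")); // prints 1288490188
    }
}
```

The float result is off by 52 bytes for 1.2GB, which is where the 1.28849024E9 in the question comes from.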

edited Aug 23 '12 at 11:40

answered Aug 23 '12 at 11:34


score 1 · Answer 5 · answered Aug 23 '12 at 11:45

How about something like this:

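String sizeStr = "1.2GB";

// Strip the unit suffix and scale the number up by one factor of 1024.
double base = 1024 * Double.parseDouble(sizeStr.replaceAll("[GM]B", ""));

final long sizeBytes;

if (sizeStr.endsWith("GB")) {
    // Multiply before converting to long so the fractional part is not lost early.
    sizeBytes = (long) (1024L * 1024L * base);
}
else {
    sizeBytes = (long) (1024L * base);
}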

score 1 · Answer 6 · answered Oct 15 '15 at 11:52

If using groovy is an option, here is a version which:

handles non postfixed strings (i.e. just a number "1234")
handles spaces and commas (as in "12,234B" or "12 234kB")
throws a somewhat descriptive exception when it fails to parse the input string
handles prefixes and byte indicators in the wrong case. It could be argued that this is not a good idea. In my particular use case I often get imprecise input data, which makes it useful to have the algorithm be somewhat lenient about the rules.

code:

import static groovy.test.GroovyAssert.shouldFail

long toBytes(String size) { 
  def matcher = size?.replaceAll(/[, ]/, '') =~ /(?i)^([\d.,]+)([PTGMK]?B)?$/
  if (!matcher) throw new IllegalArgumentException("Can not parse size string '${size}'")

  (matcher[0][1] as BigDecimal) * 1024**[PB:5, TB:4, GB:3, MB:2, KB:1].withDefault {0}[matcher[0][2]?.toUpperCase()]
}

assert toBytes(" 112 ")     == 112
assert toBytes("112")       == 112
assert toBytes("123456789") == 123456789
assert toBytes("1B")        == 1
assert toBytes("1KB")       == 1024
assert toBytes("300MB")     == 300g*1024*1024
assert toBytes("1.2mb")     == 1.2g*1024*1024 as Long
assert toBytes("123 456kB") == 123456g*1024
assert toBytes("123,456B")  == 123456
assert toBytes("1.5GB")     == 1.5g*1024*1024*1024  
assert toBytes("300MB")     == 300g*1024*1024
assert toBytes("512GB")     == 512g*1024*1024*1024
assert toBytes("1Tb")       == 1024g*1024*1024*1024
assert toBytes("1PB")       == 1024g*1024*1024*1024*1024

def checkFailure = { c -> 
  assert shouldFail(c).message.startsWith("Can not parse size string")
}

checkFailure { toBytes(null)   }
checkFailure { toBytes("")     } 
checkFailure { toBytes("x112") }
checkFailure { toBytes("112x") }
checkFailure { toBytes("11x2") }

The method is perhaps not optimized for readability. I had some fun trying to cover all the edge cases while keeping the code concise.

score 0 · Answer 7 · answered Aug 23 '12 at 11:34

0

You can cast the Float value to a Long one before printing it.
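
For example, something like the following. This is just a sketch with made-up names (`CastDemo`, `castFloat`, `roundDouble`), and it assumes the input always ends in "GB". The cast gives you a long, but the value still carries the float rounding error, so a variant using double and `Math.round` is shown next to it for comparison.

```java
class CastDemo {
    // Casts the question's float expression to long.
    static long castFloat(String size) {
        float gigabytes = Float.parseFloat(size.replace("GB", ""));
        return (long) (gigabytes * 1024L * 1024L * 1024L);
    }

    // Same idea with double, rounding to the nearest byte instead of truncating.
    static long roundDouble(String size) {
        double gigabytes = Double.parseDouble(size.replace("GB", ""));
        return Math.round(gigabytes * 1024L * 1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println(castFloat("1.2GB"));   // prints 1288490240
        System.out.println(roundDouble("1.2GB")); // prints 1288490189
    }
}
```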

answered Aug 23 '12 at 11:34

phsym

