17

In an Android application I want to use the Scanner class to read a list of floats from a text file (it's a list of vertex coordinates for OpenGL). The exact code is:

Scanner in = new Scanner(new BufferedInputStream(getAssets().open("vertexes.off")));
// Size the array by the number of floats read below, not by the number of vertices.
final float[] vertexes = new float[nrVertexFloats];
for (int i = 0; i < nrVertexFloats; i++) {
    vertexes[i] = in.nextFloat();
}

It seems, however, that this is incredibly slow (it took 30 minutes to read 10,000 floats!), as tested on the 2.1 emulator. What's going on? I don't remember Scanner being that slow when I used it on the PC (truth be told, I never read more than 100 values before). Or is it something else, like reading from an asset input stream?

Thanks for the help!

Cristian Vrabie
  • 3
    I'd suggest profiling it: http://developer.android.com/intl/zh-TW/guide/developing/tools/traceview.html – yanchenko Mar 15 '10 at 11:59
  • 3
    Thanks for the suggestion. I profiled it (for 100 floats) and it seems that the calls to nextFloat take all the time. Because of the BufferedInputStream, only 2 calls are made to read from the input and they take very little time (35ms/call). However, calls to nextFloat take 435ms/call, which is huge. Looking at the child calls, it seems that calls inside NumberFormat and Pattern are the killers (a lot of memory allocations). I'll try some other parsing method and report back. – Cristian Vrabie Mar 15 '10 at 15:48
  • 1
    It seems that Scanner is indeed VERY slow on the device/emulator! It might be because of the huge number of memory allocations. On the emulator it takes 30 minutes to read 10,000 floats. On the PC it takes 1 second to read 20,000 floats (with Scanner). As a solution I found the following to work very well: first I parse my input file on the PC and transform it into binary data, then I read it on the device byte by byte (buffered) and reconstruct the numbers. This is MUCH faster. It takes 1.5s to read 20,000 floats. I'd say it's an enormous improvement from 1 hour :) Thanks for all the help! – Cristian Vrabie Mar 17 '10 at 12:47
  • 1
    Same problem. On an HTC Desire running Android 2.2 it took 12 seconds to read some 800 floats with Scanner. – iseeall Oct 20 '11 at 12:10

7 Answers

23

As other posters have stated, it's more efficient to include the data in a binary format. However, for a quick fix I've found that replacing:

scanner.nextFloat();

with

Float.parseFloat(scanner.next());

is almost 7 times faster.

The source of the performance issue with nextFloat is that it uses a regular expression to search for the next float, which is unnecessary if you know the structure of the data you're reading beforehand.

It turns out that most (if not all) of the next* methods use regular expressions for a similar reason, so if you know the structure of your data it's preferable to always use next() and parse the result yourself. For example, also use Double.parseDouble(scanner.next()) and Integer.parseInt(scanner.next()).

Relevant source: https://android.googlesource.com/platform/libcore/+/master/luni/src/main/java/java/util/Scanner.java
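
To make the replacement concrete, here is a minimal sketch of a read loop built on next() plus Float.parseFloat. The class name TokenFloatReader and the method readFloats are made up for this illustration, and the input is assumed to be whitespace-separated floats with '.' as the decimal separator. Note one behavioural difference: nextFloat() honours the Scanner's locale, while Float.parseFloat does not.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

// Hypothetical helper, not part of any library.
final class TokenFloatReader {

    /** Reads exactly {@code count} whitespace-separated floats from {@code in}. */
    static float[] readFloats(InputStream in, int count) {
        float[] values = new float[count];
        // Scanner still splits the input into tokens, but the number parsing
        // is done by Float.parseFloat instead of the regex-based nextFloat().
        try (Scanner scanner = new Scanner(in, "UTF-8")) {
            for (int i = 0; i < count; i++) {
                values[i] = Float.parseFloat(scanner.next());
            }
        }
        return values;
    }

    public static void main(String[] args) {
        InputStream in = new ByteArrayInputStream(
                "0.5 -1.25 3e2".getBytes(StandardCharsets.UTF_8));
        float[] v = readFloats(in, 3);
        System.out.println(v[0] + " " + v[1] + " " + v[2]);
    }
}
```

On Android the stream would come from something like getAssets().open(...), as in the question; the sketch takes a plain InputStream so it can run anywhere.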

Ian Newson
  • I have been looking to speed up my scanning method and this did the trick. Wow... +1 – semajhan Jul 29 '11 at 20:49
  • 1
    Incredible, just this simple replacement increased the speed of reading floats by a factor of 100 on my test device (from 10 secs to just 0.1 sec) – iseeall Oct 20 '11 at 12:15
  • The same is true if you read doubles: use Double.parseDouble(rowScanner.next()) instead! – Roalt Jan 26 '12 at 21:31
  • So, is this true for nextInt() and nextLine() as well? – arviman Dec 06 '14 at 11:20
  • 1
    @arviman I haven't tested, but I expect so since those two methods also use regular expressions, which is the source of the performance issues with nextFloat: https://android.googlesource.com/platform/libcore/+/master/luni/src/main/java/java/util/Scanner.java – Ian Newson Dec 06 '14 at 16:17
  • Tested and confirmed to be at least 3x faster for parsing doubles. – Tarik Feb 05 '17 at 08:27
8

Don't know about Android, but at least in JavaSE, Scanner is slow.

Internally, Scanner does charset conversion (UTF-8 or whatever the default charset is), which is wasted work for a file that contains only floats.

Since all you want to do is read floats from a file, you should go with the java.io package.

The folks on SPOJ struggle with I/O speed. It's a Polish programming contest site with very hard problems. What sets it apart is that it accepts a wider range of programming languages than other sites, and in many of its problems the input is so large that, without efficient I/O, your program will exceed the time limit.

Of course, I advise against writing your own float parser, but if you need speed, that's still a solution.

Dave Jarvis
Leonel
  • 1
    Even if Scanner is slow, 30 minutes for 10,000 floats is nowhere near a reasonable time, even if Scanner did 10 useless charset-conversions. – Joachim Sauer Mar 15 '10 at 12:04
2

For the Spotify Challenge they wrote a small Java utility for parsing input faster: http://spc10.contest.scrool.se/doc/javaio The utility is called Kattio.java and uses BufferedReader, StringTokenizer and Integer.parseInt/Double.parseDouble/Long.parseLong to read numerics.
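
The combination named above (BufferedReader, StringTokenizer and the parseXxx methods) can be sketched roughly as follows. This is not the Kattio source, just an illustration of the same idea; the class LineTokenReader and its methods are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.StringTokenizer;

// Hypothetical reader in the spirit of the approach described above.
final class LineTokenReader {
    private final BufferedReader reader;
    private StringTokenizer tokens;

    LineTokenReader(InputStream in) {
        reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
    }

    /** Returns the next whitespace-separated token, or null at end of input. */
    String next() throws IOException {
        // Refill the tokenizer line by line, skipping blank lines.
        while (tokens == null || !tokens.hasMoreTokens()) {
            String line = reader.readLine();
            if (line == null) {
                return null;
            }
            tokens = new StringTokenizer(line);
        }
        return tokens.nextToken();
    }

    /** Parses the next token as a float; no regular expressions involved. */
    float nextFloat() throws IOException {
        String token = next();
        if (token == null) {
            throw new EOFException("no more tokens");
        }
        return Float.parseFloat(token);
    }

    public static void main(String[] args) throws IOException {
        LineTokenReader r = new LineTokenReader(new ByteArrayInputStream(
                "1.0 2.5\n\n-3\n".getBytes(StandardCharsets.UTF_8)));
        System.out.println(r.nextFloat() + " " + r.nextFloat() + " " + r.nextFloat());
    }
}
```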

Thomas Ahle
1

Very insightful post. Having worked with Java on the PC, I assumed Scanner was the fastest option. When I used it in an AsyncTask on Android, it turned out to be the worst.

I think Android needs an alternative to Scanner. I was using scanner.nextFloat(), scanner.nextDouble() and scanner.nextInt() together, which made my life miserable. Only after tracing my app did I find the culprit hiding there.

I changed to Float.parseFloat(scanner.next()), and similarly Double.parseDouble(scanner.next()) and Integer.parseInt(scanner.next()), which made my app noticeably faster, maybe 60% faster.

If anyone has experienced the same, please post here. I'm also looking for an alternative to the Scanner API; if anyone has bright ideas for reading file formats, please post them here.

zIronManBox
  • It's been a while since I posted this, but it seems there have been no major improvements since then. `Float.parseFloat(scanner.next())` does indeed give you a significant speed boost, but it doesn't come even close to reading numbers directly from a binary format (about a 1200% speed increase). So for large sets of numbers I still recommend converting. Unless the readability of the file is crucial, of course, in which case you can leave it as it is for development and hook a resource-generation task into your build system to convert it for production. – Cristian Vrabie Jun 19 '14 at 15:20
  • How can I read it in a binary format? Can you point me to a suitable approach? – zIronManBox Jun 21 '14 at 08:15
  • Check `DataInputStream` or just convert floats to ints and use basic `InputStream`/`OutputStream` `read`/`write`. It's the most efficient way. – Cristian Vrabie Jun 21 '14 at 19:33
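
As a rough illustration of the `DataInputStream` suggestion, the sketch below reads floats from a binary stream. The file layout is an assumption made for this example only (a 4-byte int count followed by that many big-endian floats, which is what `DataOutputStream.writeInt`/`writeFloat` produce), and the class name BinaryFloatReader is hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper; the "count, then floats" layout is an assumption.
final class BinaryFloatReader {

    /** Reads an int count followed by that many big-endian IEEE 754 floats. */
    static float[] readFloats(InputStream in) throws IOException {
        // Buffering matters: readFloat() issues several small reads per value.
        DataInputStream data = new DataInputStream(new BufferedInputStream(in));
        int count = data.readInt();
        float[] values = new float[count];
        for (int i = 0; i < count; i++) {
            values[i] = data.readFloat();
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        // Build a tiny in-memory "file" so the example runs without assets.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(2);
        out.writeFloat(1.5f);
        out.writeFloat(-2.0f);
        out.flush();

        float[] v = readFloats(new ByteArrayInputStream(bytes.toByteArray()));
        System.out.println(v.length + " floats, first = " + v[0]);
    }
}
```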
0

Yes, I'm not seeing anything like this. I can read about 10M floats this way in 4 seconds on the desktop; the platform surely can't be that different.

I'm trying to think of other explanations -- is it perhaps blocking in reading the input stream from getAssets()? I might try reading that resource fully, timing that, then seeing how much additional time is taken to scan.
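
The experiment suggested above (read the resource fully, time that, then time the scan on its own) could be sketched like this. The class ReadVsScanTimer and its methods are hypothetical names, the in-memory input stands in for the asset stream, and Locale.US is set only so that '.' is accepted as the decimal separator.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Scanner;

// Hypothetical timing harness for separating I/O cost from parsing cost.
final class ReadVsScanTimer {

    /** Drains the stream into a byte array. */
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return buffer.toByteArray();
    }

    /** Scans floats from memory with nextFloat() and returns how many were read. */
    static int scanFloats(byte[] data) {
        int count = 0;
        try (Scanner scanner = new Scanner(new ByteArrayInputStream(data), "UTF-8")) {
            scanner.useLocale(Locale.US);
            while (scanner.hasNextFloat()) {
                scanner.nextFloat();
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        InputStream source = new ByteArrayInputStream(
                "1.5 2.5 3".getBytes(StandardCharsets.UTF_8));
        long t0 = System.nanoTime();
        byte[] data = readFully(source);
        long t1 = System.nanoTime();
        int count = scanFloats(data);
        long t2 = System.nanoTime();
        System.out.println("read: " + (t1 - t0) / 1_000_000.0 + " ms, scan of "
                + count + " floats: " + (t2 - t1) / 1_000_000.0 + " ms");
    }
}
```

If the first interval dominates, the asset stream is the bottleneck; if the second does, the time is going into Scanner itself.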

Sean Owen
  • It is that different. You forget that even if it's the same Java code, Android has its own implementation of the runtime environment. There are different implementations for everything, including basic things like charset encoding and memory allocation. Newer implementations are possibly better, but this was the case when I posted. – Cristian Vrabie Jun 21 '14 at 19:22
0

Scanner may be part of the problem, but you need to profile your code to know. Alternatives may be faster. Here is a simple benchmark comparing Scanner and StreamTokenizer.
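
For reference, reading numbers with StreamTokenizer might look like the sketch below. It is not the benchmark mentioned above, only an illustration; StreamTokenizerFloats and readFloats are hypothetical names. One caveat worth knowing: StreamTokenizer's built-in number parsing does not understand exponent notation such as 1e5, so this only suits plain decimal input.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;

// Hypothetical helper showing StreamTokenizer-based number reading.
final class StreamTokenizerFloats {

    /** Reads exactly {@code count} plain decimal numbers from {@code reader}. */
    static float[] readFloats(Reader reader, int count) throws IOException {
        // Number parsing is enabled by default; values arrive as double in nval.
        StreamTokenizer tokenizer = new StreamTokenizer(new BufferedReader(reader));
        float[] values = new float[count];
        for (int i = 0; i < count; i++) {
            if (tokenizer.nextToken() != StreamTokenizer.TT_NUMBER) {
                throw new IOException("expected a number at token " + i);
            }
            values[i] = (float) tokenizer.nval;
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        float[] v = readFloats(new StringReader("1.5 -2.25\n3"), 3);
        System.out.println(v[0] + " " + v[1] + " " + v[2]);
    }
}
```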

Community
trashgod
0

I had exactly the same problem. It took 10 minutes to read my 18 KB file. In the end I wrote a desktop application that converts those human-readable numbers into a machine-readable format, using DataOutputStream.

The result was astonishing.
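
A desktop-side converter of the kind described here might look roughly like the following sketch. It is not the application mentioned in this answer; the class TextToBinaryConverter is hypothetical, and the output layout (an int count followed by big-endian floats) is an assumption chosen so that DataInputStream.readInt/readFloat can read it back on the device.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical converter; the "count, then floats" layout is an assumption.
final class TextToBinaryConverter {

    /** Converts whitespace-separated decimal text into count + binary floats. */
    static void convert(Reader text, OutputStream binary) throws IOException {
        // Collect everything first so the count can be written up front.
        List<Float> values = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNext()) {
                values.add(Float.parseFloat(scanner.next()));
            }
        }
        DataOutputStream out = new DataOutputStream(new BufferedOutputStream(binary));
        out.writeInt(values.size());
        for (float value : values) {
            out.writeFloat(value);
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        convert(new StringReader("1.5 -2 0.25"), bytes);
        System.out.println("wrote " + bytes.size() + " bytes");
    }
}
```

In a real build this would read the text file and write a FileOutputStream; the in-memory streams just keep the example runnable on its own.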

By the way, when I traced it, most of the Scanner method calls involved regular expressions, whose implementation is provided by the com.ibm.icu.** packages (the IBM ICU project). It's really overkill.

The same goes for String.format. Avoid it in Android!

Randy Sugianto 'Yuku'