2

Thanks in advance.

I just solved Project Euler #22, a problem involving reading about 5,000 lines of text out of a file and determining the value of a specific name, based on the sum of that String's characters and its position alphabetically.

However, the code takes about 5-10 seconds to run, which is a bit annoying. What is the best way to optimize this code? I'm currently using a Scanner to read the file into a String. Is there another, more efficient way to do this? (I tried using a BufferedReader, but that was even slower)

public static int P22(){


    String s = "";

    try{
        //create a new Scanner to read file
        Scanner in = new Scanner(new File("names.txt"));
        while(in.hasNext()){
            //add the next line to the string
            s+=in.next();
        }

    }catch(Exception e){
        e.printStackTrace();
    }
    //this just filters out the quotation marks surrounding all the names
    String r = "";
    for(int i = 0;i<s.length();i++){
        if(s.charAt(i) != '"'){
            r += s.charAt(i);
        }
    }
    //splits the string into an array, using the commas separating each name
    String text[] = r.split(",");
    Arrays.sort(text);



    int solution = 0;
    //go through each string in the array, summing its characters
    for(int i = 0;i<text.length;i++){
        int sum = 0;
        String name = text[i];
        for(int j = 0;j<name.length();j++){
            sum += (int)name.charAt(j)-64;
        }
        solution += sum*(i+1);
    }
    return solution;


}
Jack K

6 Answers

5

If you're going to use Scanner, why not use it for what it's supposed to do (tokenisation)?

  Scanner in = new Scanner(new File("names.txt")).useDelimiter("[\",]+");
  ArrayList<String> text = new ArrayList<String>();
  while (in.hasNext()) {
    text.add(in.next());
  }
  Collections.sort(text);

You do not need to strip quotes, or split on commas - Scanner does it all for you.

This snippet, including java startup time, executes in 0.625s (user time) on my machine. I suspect it should be a bit faster than what you were doing.
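
To finish the problem from there, a scoring loop over the sorted list works the same way as in the question; a minimal sketch, using the usual A=1, B=2, ... letter values:

  int solution = 0;
  for (int i = 0; i < text.size(); i++) {
    String name = text.get(i);
    int sum = 0;
    for (int j = 0; j < name.length(); j++) {
      sum += name.charAt(j) - 'A' + 1; // A=1, B=2, ...
    }
    solution += sum * (i + 1);         // weight by 1-based alphabetical position
  }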

EDIT: OP asked what the string passed to useDelimiter was. It's a regular expression. When you strip out the escaping Java requires to put a quote character inside a string literal, it's [",]+, and the meaning is:

[...]   character class: match any of these characters, so
[",]    match a quote or a comma
...+    one or more occurrence modifier, so
[",]+   match one or more of quotes or commas

Sequences that would match this pattern include:

"
,
,,,,
""",,,",","

and indeed ",", which is what we were going after here.

Amadan
  • Thank You! Just a quick question: the execution time you stated, was that with it reading the file in the problem? – Jack K Dec 29 '11 at 20:29
  • What is the syntax for the [\" and the ]+? – Jack K Dec 29 '11 at 20:31
  • @JackK: It was for the whole chunk of code I wrote up there - from reading the file down to sorting the collection. I'll edit in an explanation of the pattern in the answer. – Amadan Dec 29 '11 at 20:31
  • 2
    I implemented this solution, and now it runs in about 0.3 seconds! Thanks! – Jack K Dec 29 '11 at 20:54
1

An obtuse solution which you may find interesting.

long start = System.nanoTime();
long sum = 0;
int runs = 10000;
for (int r = 0; r < runs; r++) {
    FileChannel channel = new FileInputStream("names.txt").getChannel();
    ByteBuffer bb = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
    TLongArrayList values = new TLongArrayList(); // primitive long list from the GNU Trove library

    long wordId = 0; // encodes a name: letters packed into the high bits (so numeric order matches alphabetical order), letter sum accumulated in the low bits
    int shift = 63;  // each letter is packed 5 bits lower than the previous one
    while (true) {
        int b = bb.remaining() < 1 ? ',' : bb.get();
        if (b == ',') {
            values.add(wordId);
            wordId = 0;
            shift = 63;
            if (bb.remaining() < 1) break;

        } else if (b >= 'A' && b <= 'Z') {
            shift -= 5;
            long n = b - 'A' + 1;
            wordId = (wordId | (n << shift)) + n; // pack the letter into the high bits; the "+ n" builds the letter sum in the low bits

        } else if (b != '"') {
            throw new AssertionError("Unexpected ch '" + (char) b + "'");
        }
    }

    values.sort();

    sum = 0;
    for (int i = 0; i < values.size(); i++) {
        long wordSum = values.get(i) & ((1 << 8) - 1); // the low 8 bits hold the name's letter sum
        sum += (i + 1) * wordSum;
    }
}
long time = System.nanoTime() - start;
System.out.printf("%d took %.3f ms%n", sum, time / 1e6);

prints

XXXXXXX took 27.817 ms.
Peter Lawrey
1

Appending strings in a loop with '+', like you do here:

/* That's actually not the problem since there is only one line. */
while(in.hasNext()){
    //add the next line to the string
    s+=in.next();
}

is slow, because it has to create a new string and copy everything around in each iteration. Try using a StringBuilder,

StringBuilder sb = new StringBuilder();
while(in.hasNext()){
    sb.append(in.next());
}
s = sb.toString();

But, you shouldn't really read the file contents into a String, you should create a String[] or an ArrayList<String> from the file contents directly,

int names = 5000; // use the correct number of lines in the file!
String[] sa = new String[names];
for(int i = 0; i < names; ++i){
    sa[i] = in.next();
}

However, upon checking, it turns out that the file does not contain about 5,000 lines; rather, it is all on a single line, so your big problem is actually

/* This one is the problem! */
String r = "";
for(int i = 0;i<s.length();i++){
    if(s.charAt(i) != '"'){
        r += s.charAt(i);
    }
}

Use a StringBuilder for that. Or, make your Scanner read until the next ',' and read directly into an ArrayList<String> and just remove the double quotes from each single name in the ArrayList.
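
A rough sketch of that second suggestion, assuming names.txt is in the working directory and the surrounding method handles the FileNotFoundException:

Scanner in = new Scanner(new File("names.txt")).useDelimiter(",");
ArrayList<String> names = new ArrayList<String>();
while (in.hasNext()) {
    // next() now returns one quoted name; strip the double quotes as we go
    names.add(in.next().replace("\"", ""));
}
Collections.sort(names);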

Daniel Fischer
1

I suggest you run your code with a profiler. It lets you see which part is really slow (I/O, computation, etc.). If I/O is the bottleneck, look into NIO: http://docs.oracle.com/javase/1.4.2/docs/guide/nio/.
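
For example, a minimal NIO sketch for the reading part might look like this (assuming names.txt is in the working directory and the calling method declares throws IOException):

FileChannel channel = new FileInputStream("names.txt").getChannel();
ByteBuffer buffer = ByteBuffer.allocate((int) channel.size());
channel.read(buffer);   // a single read is normally enough for a file this small
channel.close();
String contents = new String(buffer.array(), "US-ASCII");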

dbf
1

5+ seconds is quite slow for this problem. My entire web application (600 Java classes) compiles in four seconds. The root of your problem is probably the allocation of a new String for every character in the file: r += s.charAt(i)

To really speed this up, you should not use Strings at all. Get the file size, and read the whole thing into a byte array in a single I/O call:

public class Names {
  private byte[] data;
  private class Name implements Comparable<Name> {
    private int start; // index into data
    private int length;
    public Name(int start, int length) { ...; }
    public int compareTo(Name arg0) {
      ...
    }
    public int score() { ... }
  }
  public Names(File file) throws Exception {
    data = new byte[(int) file.length()];
    new FileInputStream(file).read(data, 0, data.length);
  }
  public int score() {
    SortedSet<Name> names = new ...
    for (int i = 0; i < data.length; ++i) {
      // find limits of each name, add to the set
    }
    // Calculate total score...
  }
}
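
One hypothetical way to fill in compareTo and score for that sketch, assuming the names are plain uppercase ASCII and Name is the inner class above with access to data (both methods only touch the shared byte array, so no per-name Strings are created):

  public int compareTo(Name other) {
    int n = Math.min(length, other.length);
    for (int i = 0; i < n; i++) {
      int diff = data[start + i] - data[other.start + i];
      if (diff != 0) return diff;
    }
    return length - other.length;       // shorter name sorts first on a tie
  }
  public int score() {
    int sum = 0;
    for (int i = 0; i < length; i++) {
      sum += data[start + i] - 'A' + 1; // A=1, B=2, ...
    }
    return sum;
  }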
kevin cline
  • I would suggest using memory-mapped files - that will be even faster. Also, using quicksort (Arrays.sort) on a mostly sorted collection may cause n^2 complexity. And I believe that s+='..' will be optimized to use StringBuilder internally, though I haven't looked at javap. – jdevelop Dec 29 '11 at 21:29
1

Depending on the application, StreamTokenizer is often measurably faster than Scanner. Examples comparing the two may be found here and here.

Addendum: Project Euler Problem 22 involves deriving a kind of checksum from the characters in each token encountered. Rather than traversing each token twice, a custom analyzer could combine the recognition and the calculation, as sketched below. The result would be stored in a SortedMap<String, Integer> for later iteration when finding the grand total.
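
A minimal sketch of that idea, assuming names.txt is in the working directory, the names in the file are unique (which a SortedMap keyed by name relies on), and the surrounding method declares throws IOException:

StreamTokenizer st = new StreamTokenizer(
        new BufferedReader(new FileReader("names.txt")));
st.resetSyntax();
st.whitespaceChars(',', ',');   // commas just separate the quoted names
st.quoteChar('"');              // quoted tokens arrive in st.sval with the quotes already stripped
SortedMap<String, Integer> scores = new TreeMap<String, Integer>();
while (st.nextToken() != StreamTokenizer.TT_EOF) {
    if (st.ttype == '"') {
        int sum = 0;
        for (int i = 0; i < st.sval.length(); i++) {
            sum += st.sval.charAt(i) - 'A' + 1;  // compute the checksum while recognising the token
        }
        scores.put(st.sval, sum);
    }
}
long total = 0;
int rank = 1;
for (int value : scores.values()) {   // TreeMap iterates in alphabetical order
    total += (long) rank++ * value;
}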

trashgod