Reduce runtime for manipulating string multiple times using Hashmap / other methods

Question

The problem statement is that we have a string and we want to rotate each character to its next character i.e a to b, b to c and z to a. Also, the conversion is based on the number x ( x is <= size of string). The number represents the length of the characters to be converted. Let's say the number is 3 and string is stack which means only first 3 characters need to be converted, so output will be tubck. if the number is 5 then the output will be tubdl.

I am running the solution for 100000000 times and generating the variable number randomly. I have solved this problem using 3 approaches mentioned below:

Convert from character to integer, then increment the integer by 1, then convert the integer back to character.
Using the hashmap to get the updated character value in O(1) time.
Using C, but the approach followed is same as of step # 1

The runtime of the hashmap technique (approach 2) is coming to be more. I don't know why?

Approach #1 is

public static void main(String[] args) {
    Long startTime = System.currentTimeMillis();

    String name = "abcdefghijklmnopqrstuvwxyz";
    char[] stringCharArray = name.toCharArray();
    Random random = new Random();
    for (Integer i = 0; i <100000000; i++) {
        {
            for (int j = 0; j < random.nextInt(26) + 1; j++) {
                if (stringCharArray[j] == 'z') {
                    stringCharArray[j] = 'a';
                } else {
                    stringCharArray[j] = (char) (((int) (stringCharArray[j])) + 1);
                }
            }
        }
    }
    Long endtime = System.currentTimeMillis();
    System.out.println(endtime-startTime+" ms");

}

Approach # 2 using hashmap is

public static void main(String[] args) {
    HashMap hashMap = new HashMap();
    hashMap.put('a', 'b');
    hashMap.put('b', 'c');
    hashMap.put('c', 'd');
    hashMap.put('d', 'e');
    hashMap.put('e', 'f');
    hashMap.put('f', 'g');
    hashMap.put('g', 'h');
    hashMap.put('h', 'i');
    hashMap.put('i', 'j');
    hashMap.put('j', 'k');
    hashMap.put('k', 'l');
    hashMap.put('l', 'm');
    hashMap.put('m', 'n');
    hashMap.put('n', 'o');
    hashMap.put('o', 'p');
    hashMap.put('p', 'q');
    hashMap.put('q', 'r');
    hashMap.put('r', 's');
    hashMap.put('s', 't');
    hashMap.put('t', 'u');
    hashMap.put('u', 'v');
    hashMap.put('v', 'w');
    hashMap.put('w', 'x');
    hashMap.put('x', 'y');
    hashMap.put('y', 'z');
    hashMap.put('z', 'a');
    Long startTime = System.currentTimeMillis();
    String name = "abcdefghijklmnopqrstuvwxyz";
    char[] stringCharArray = name.toCharArray();
    Random random = new Random();
    for (Integer i = 0; i <100000000; i++) {
        {
            for (Integer j = 0; j < random.nextInt(26) + 1; j++) {
                    stringCharArray[j] = (char) hashMap.get(stringCharArray[j]);
            }
        }
    }
    Long endtime = System.currentTimeMillis();
    System.out.println(endtime-startTime+" ms");
}

Approach #3

#include <stdio.h>
#include <time.h>
#include <zconf.h>
#include <stdlib.h>

int main() {
    long start = clock();
    char name[] = "abcdefghijklmnopqrstuvwxyz";
    for (int i = 0; i <100000000; i++) {
        {
            for (int j = 0; j < rand() % 25; j++) {
                if (name[j] == 'z') {
                    name[j] = 'a';
                } else {
                    name[j] = (char) (((int) (name[j])) + 1);
                }
            }
        }
    }
    long stop = clock();
    printf("time taken = %ld sec \n",( stop-start)/1000);
}

The runtime of approach 1 is ~ 8150 m, approach 2 is ~ 9700 ms and approach 3 is ~ 5400 ms. I am using MacBook Pro 2.3 GHz Intel Core i5 with 8 GB 2133 MHz LPDDR3.

How do I reduce this runtime, I used stream.parallelize() in java for apprach#2 to make the changes but the runtime was increasing. What am I doing wrong?

The goal is to get the runtime less than approach #3. Is there a way in C to parallelize this and solve in lesser time using pthreads?

Did you check how much time the programme spent in random functions ? Note also that all approaches are O(1) — Damien, Jan 03 '19 at 08:38
A better approach than the `HashMap` approach (which necessarily requires boxing) is to populate an array (or even a `String`): `"bcde...xyza".charAt(name[0] - 'a')`. — Andy Turner, Jan 03 '19 at 08:43
`for (int j = 0; j < random.nextInt(26) + 1; j++)` will call `nextint` upon each iteation. I don't think you want that, so have a line before the for loop, e.g. `int n=random.nextInt(26)+1;` and then `for (int j = 0; j < n; j++)` This also optimizes the code because it eliminates unnecessary function calls. — Paul Ogilvie, Jan 03 '19 at 09:09
Notice than `for (int j = 0; j < rand() % 25; j++)` would call `rand()` at each iteration, you probably want: `const int end = rand() % 25; for (int j = 0; j != end; j++)`. — Jarod42, Jan 03 '19 at 09:10
Instead of using `stringCharArray[j]`, use a pointer variable. This optimizes the code by eliminating indexing instructions. — Paul Ogilvie, Jan 03 '19 at 09:12
@Jarod42, defensive programming requires to use `j < end`, not `j != end`. — Paul Ogilvie, Jan 03 '19 at 09:19
@PaulOgilvie: using `!=` instead of `<` would probably make infinite loop to spot the bug, instead of hiding it. — Jarod42, Jan 03 '19 at 09:24
@Jarod42, see discussion at https://stackoverflow.com/questions/31455629/is-there-a-technical-reason-to-use-instead-of-when-incrementing-by-1-in/31456812#31456812 — Paul Ogilvie, Jan 03 '19 at 09:31

Naghaveer R · Answer 1 · 2019-01-03T09:00:45.473

Define a character array like

char[] nextChar = {'b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z','a'};

And you can get next either by directly accessing same index or by difference in ascii of characters

                stringCharArray[j] = nextChar[j];//by index
                stringCharArray[j] = nextChar[stringCharArray[j]-'a'];// using ascii

Complete code is below

    Long startTime = System.currentTimeMillis();

    char[] nextChar = {'b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z','a'};
    String name = "abcdefghijklmnopqrstuvwxyz";
    char[] stringCharArray = name.toCharArray();
    Random random = new Random();
    for (int i = 0; i <100000000; i++) {
        {
            for (int j = 0; j < random.nextInt(27); j++) {
                stringCharArray[j] = nextChar[j];//by index
                //stringCharArray[j] = nextChar[stringCharArray[j]-'a'];// using ascii
            }
        }
    }
    Long endtime = System.currentTimeMillis();
    System.out.println(endtime-startTime+" ms");

Thanks a lot. I used this in C. The time reduced from ~5400 to ~5350. Definitely, there is some improvement. — Amarjit Dhillon, Jan 03 '19 at 09:01

Soonts · Accepted Answer · 2019-01-03T09:48:58.357

Here's a version that is 8x faster on my PC than your C example.

#include <immintrin.h>

int main()
{
    const __m256i compareConstant = _mm256_setr_epi8( 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
        13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31 );

    long start = clock();
    char name[ 32 ] = "abcdefghijklmnopqrstuvwxyz";
    // __m256i is a C name for AVX register.
    // AVX registers are 32 bytes in size, so your 26 bytes string fits in just a single one.
    // The following line loads your string from memory to that register.
    __m256i n = _mm256_loadu_si256( ( const __m256i* )name );
    for( int i = 0; i < 100000000; i++ )
    {
        // Increment the letters, all 32 of them.
        // `_mm256_set1_epi8` creates a value with all 32 bytes set to the same value.
        // `_mm256_add_epi8` adds one set of 32 signed bytes to another set of 32 signed bytes.
        // It's not a loop i.e. it's very fast, CPUs can actually run 2-3 such instructions per cycle. 
        __m256i n2 = _mm256_add_epi8( n, _mm256_set1_epi8( 1 ) );

        // Wrap any > 'z' letters back to 'a'
        // _mm256_cmpgt_epi8 compares one set of bytes to another set, for `>`.
        // When it's `>` the result byte is set to 0xFF, when it's `<=` the result byte is 0.
        // _mm256_blendv_epi8 combines bytes from 2 registers based on the bytes from the third one.
        // In this case, the third one is the result of the comparison.
        n2 = _mm256_blendv_epi8( n2, _mm256_set1_epi8( 'a' ), _mm256_cmpgt_epi8( n2, _mm256_set1_epi8( 'z' ) ) );

        // Combine incremented ones with old, using random number of first characters
        const int r = rand() % 25;
        // This sets all 32 bytes in rv to the random number r
        __m256i rv = _mm256_broadcastb_epi8( _mm_cvtsi32_si128( r ) );
        // Compares all bytes in rv with the constant value [0, 1, 2, 3, ...]
        // For bytes where r>cc, blendv_epi8 will select a byte from n2.
        // For bytes where r<=cc, n will not change because blendv_epi8 will select an old value.
        n = _mm256_blendv_epi8( n, n2, _mm256_cmpgt_epi8( rv, compareConstant ) );
    }
    long stop = clock();
    printf( "time taken = %ld sec \n", ( stop - start ) / 1000 );
}

Can you please explain what you are doing and where these instructions come from (compiler, library,...)? — Paul Ogilvie, Jan 03 '19 at 09:17
@PaulOgilvie Re-implemented OP's code with AVX2 SIMD. The instructions are built-in in CPUs made after 2013, and are supported by all modern C and C++ compilers. As you see, performance difference is huge. — Soonts, Jan 03 '19 at 09:22
Thanks a lot for this code. I think I need a guide to understand this. U seem to be very well versed coder in C. — Amarjit Dhillon, Jan 03 '19 at 09:23
Notice that you call `rand()` only once by loop which should also help a lot :-) — Jarod42, Jan 03 '19 at 09:28
@AmarjitSingh Added more comments, hope it's simpler to understand now. — Soonts, Jan 03 '19 at 09:56
@Jarod42 Indeed, but still this code is 3x faster than scalar C code with a single rand() per iteration. — Soonts, Jan 03 '19 at 09:58
@soonts : That is so awesome. I got errors like always_inline function '_mm256_setr_epi8' requires target feature 'xsave', but would be inlined into function 'main' that is compiled without support for 'xsave'. Also, for avx2. M solving them. Thanks a lot again for additional comments. — Amarjit Dhillon, Jan 03 '19 at 10:00
@soonts: Compiling using `-mavx2` flag in `gcc` solved the issue in MacBook. Otherwise, settings for Xcode needs to be changed. You can add this is the original solution you posted. — Amarjit Dhillon, Jan 03 '19 at 10:07

Reduce runtime for manipulating string multiple times using Hashmap / other methods

2 Answers2