1

As a learning exercise I 'm working on a URL shortener that takes a URL and shortens it, similar to https://tinyurl.com/

This code:

import java.util.HashMap;
import java.util.Random;

/*
 * URL Shortener
 *
 * https://gist.github.com/rakeshsingh/64918583972dd5a08012
 *
 */
public class URLShortener {

    // storage for generated keys
    private HashMap<String, String> keyMap; // key-url map

    private char myChars[]; // This array is used for character to number
    // mapping
    private Random myRand; // Random object used to generate random integers
    private int keyLength;

    // Constructor which enables you to define tiny URL key length and base URL
    // name
    public URLShortener(int keyLength) {
        this.setup();
        this.keyLength = keyLength;
    }

    private void setup(){
        keyMap = new HashMap<String, String>();
        myRand = new Random();
        myChars = new char[62];
        for (int i = 0; i < 62; i++) {
            int j = 0;
            if (i < 10) {
                j = i + 48;
            } else if (i > 9 && i <= 35) {
                j = i + 55;
            } else {
                j = i + 61;
            }
            myChars[i] = (char) j;
        }
    }

    // shortenURL
    // the public method which can be called to shorten a given URL
    public String shortenURL(String longURL) {
        String key = "";

        for (int i = 0; i <= keyLength; i++) {
            key += myChars[myRand.nextInt(62)];
        }

        return key;
    }

    // generateKey
    //Why is upper bound 62?
    private String generateKey() {
        String key = "";
        boolean flag = true;
        for (int i = 0; i <= keyLength; i++) {
            key += myChars[myRand.nextInt(62)];
        }
        return key;
    }

    // test the code
    public static void main(String args[]) {
        URLShortener u = new URLShortener(5);
        String urls[] = { "www.google.com/", "www.google.com",
                "http://www.yahoo.com", "www.yahoo.com/",
                "www.icicibank.com"
                , "www.icicibank.com"};

        for (int i = 0; i < urls.length; i++) {
            System.out.println("URL:" + urls[i] + "\tTiny: "
                    + u.shortenURL(urls[i]));
        }
    }
}

prints:

URL:www.google.com/ Tiny: wx4oPv
URL:www.google.com  Tiny: S2JT01
URL:http://www.yahoo.com    Tiny: jFOksR
URL:www.yahoo.com/  Tiny: EYcQ8o
URL:www.icicibank.com   Tiny: V5JOn3
URL:www.icicibank.com   Tiny: u4xQzn

The above code is refactored from https://gist.github.com/rakeshsingh/64918583972dd5a08012

I'm trying to understand this code block in particular:

private void setup(){
    keyMap = new HashMap<String, String>();
    myRand = new Random();
    myChars = new char[62];
    for (int i = 0; i < 62; i++) {
        int j = 0;
        if (i < 10) {
            j = i + 48;
        } else if (i > 9 && i <= 35) {
            j = i + 55;
        } else {
            j = i + 61;
        }
        myChars[i] = (char) j;
    }
}

Why iterate over a char array of 62 elements and at location i < 10 add 48 to the current char position, at location (i > 9 && i <= 35) and 55 to the current char position and for all other positions add 61 ?

blue-sky
  • 51,962
  • 152
  • 427
  • 752

2 Answers2

1

You might want to look at the ASCII table. The digit 0 start at index 48, the letter A at 65 (which is 55+10) and the letter a at 97 (which is 61+36). This is just a fancy way to initialize the myChars array with the digits and letters to use.

Progman
  • 16,827
  • 6
  • 33
  • 48
  • thanks, this makes more sense now. Isn't there a risk that a URL shortened name collision will occur? The risk is small, but the risk increases as the keyLength value is decreased? – blue-sky Aug 29 '20 at 21:15
  • @blue-sky Yes, you can have collisions/duplicates. But you can have collisions/duplicates almost everywhere. – Progman Aug 29 '20 at 21:18
  • if I execute setup() after each generation of a shortened URL instead of executing setup() once the probability of collisions also reduces. – blue-sky Aug 29 '20 at 21:40
1

Seems like the myChars array is just a map from indices to alphanumeric characters.

26 latin lower case characters plus 26 upper case characters plus 10 digits equals 62. Hence the length of the array.

Now, index 0 to 9 corresponds to a digit from '0' to '9'. But, printing '0' as an integer yields 48. That's why you add 48 to 0 (up to 9) to get the character '0' (up to '9').

Same thing as for the others:

Printing the character 'A' as an integer yields 65, hence you need to add 55 at the index 10 to 35 (remember 35 is 10+26-1, ie all the latin upper case characters).

Similarly, printing the character 'a' as an integer yields 97, hence you need to add 61 at the index 36 to 61 (remember 61 is 36+26-1, ie all the latin lower case characters).

You may take a look also at: In what encoding is a Java char stored in?

gthanop
  • 3,035
  • 2
  • 10
  • 27
  • if i < 10 add 48, but why not add 50,51 or some other number ? Is 48 an arbitrary choice ? – blue-sky Aug 29 '20 at 21:47
  • 1
    If you print 48 as a `char` value (ie for example `System.out.println((char) 48);`) you will get the character `'0'` to get printed and that's because according to the encoding of the `char`s in Java (ie UTF16) the characters `'0'` to `'9'` (ie all the digits) are encoded from number 48 up to 57 (both inclusive). The only arbitary choice was probably to map the digit characters (ie `'0'` up to `'9'`) to the first 10 indices of the `myChars` array. – gthanop Aug 29 '20 at 21:58
  • Note that everything in memory is encoded to bits, so to print the character `'A'` for example, the UTF16 encoding's creators decided to map it to the binary representation which corresponds to the number 65. Similarly they chose to map the digit character `'0'` to 48. And so on. It happens also that all ASCII table's characters (all 128 of them) are mapped to the first 128 values of UTF16 (they coincide) and that is why [Progman prompted you](https://stackoverflow.com/a/63651770) to take a look at ASCII table. Or at least that's what I understand it happens to be. – gthanop Aug 29 '20 at 21:58