
I am trying to decode a Base64-encoded byte string into a valid HTTP URL. I have tried appending the necessary padding (=), but it still does not seem to work.

I have tried the following code.

import base64
encoded = b"aHR0cHM6Ly9mb3Jtcy5nbGUvWU5ZXQ0d2NRWHVLNnNwdjU="
decoded = base64.b64decode(encoded)
print(decoded)

The string encoded is missing one character because of noise. Is there a way to detect the missing character and then perform the decode?

Saimon

2 Answers


So you have aHR0cHM6Ly9mb3Jtcy5nbGUvWU5ZXQ0d2NRWHVLNnNwdjU=, a Base64 encoding of a URL with exactly one character missing.

For the missing character you have 64 choices, the characters of the standard Base64 alphabet abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789+/. Because the string has 47 characters, there are 48 possible positions for it: -a-H-R-0-c-H-M-6-L-y-9-m-b-3-J-t-c-y-5-n-b-G-U-v-W-U-5-Z-X-Q-0-d-2-N-R-W-H-V-L-N-n-N-w-d-j-U-=- (each - marks a possible position).

That gives 64 * 48 = 3072 candidate strings. You could write them out by hand, but it is far easier to generate them with a few lines of code.
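Generating the candidates takes only a few lines. The sketch below is in Python, the language of the question, rather than Go. candidates is a hypothetical helper name, and the alphabet is the same standard Base64 alphabet listed above, written in its usual order. The sketch also shows that a few of the 3072 strings are duplicates: inserting a character immediately before or after an identical character produces the same string.

```python
# Standard Base64 alphabet (RFC 4648), 64 characters.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def candidates(encoded):
    """Every string obtained by inserting one alphabet character at one position."""
    return [
        encoded[:i] + c + encoded[i:]
        for i in range(len(encoded) + 1)  # 48 positions for a 47-character input
        for c in ALPHABET                 # 64 characters per position
    ]


encoded = "aHR0cHM6Ly9mb3Jtcy5nbGUvWU5ZXQ0d2NRWHVLNnNwdjU="
all_candidates = candidates(encoded)
print(len(all_candidates))       # 3072
print(len(set(all_candidates)))  # 3026 distinct strings
```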

Once you have generated them, decode each candidate with the standard library and discard any that fail to decode or do not form a well-formed URL. Several candidates can still pass that test. To find out which URL actually exists, send an HTTP request to each remaining candidate and check the response status code.
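Here is an offline version of the filtering step as a Python sketch. plausible_urls is a hypothetical helper. It assumes the real URL consists only of printable ASCII and has an https scheme and a host. It does not perform the HTTP check, so it narrows the list rather than producing a final answer.

```python
import base64
import binascii
from urllib.parse import urlsplit

# Standard Base64 alphabet (RFC 4648).
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def plausible_urls(encoded):
    """Decode every one-character insertion and keep those that look like https URLs."""
    found = set()
    for i in range(len(encoded) + 1):
        for c in ALPHABET:
            candidate = encoded[:i] + c + encoded[i:]
            try:
                raw = base64.b64decode(candidate, validate=True)
            except binascii.Error:
                continue  # not valid Base64, e.g. a character inserted after the '='
            if not all(32 < b < 127 for b in raw):
                continue  # control or non-ASCII bytes: assumed to be a wrong guess
            text = raw.decode("ascii")
            try:
                parts = urlsplit(text)
            except ValueError:
                continue  # e.g. an unbalanced '[' in the host part
            if parts.scheme == "https" and parts.netloc:
                found.add(text)
    return sorted(found)


encoded = "aHR0cHM6Ly9mb3Jtcy5nbGUvWU5ZXQ0d2NRWHVLNnNwdjU="
for url in plausible_urls(encoded):
    print(url)
```

For this input the list contains more than one well-formed URL, for example https://forms.gle/YNYat4wcQXuK6spv5 and https://forms.gle/YNYet4wcQXuK6spv5. That is why an HTTP request is still needed to pick the right one.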

Code:

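package main

import (
    "encoding/base64"
    "fmt"
    "net/http"
    "net/url"
    "strings"
    "time"
)

func main() {
    encodedURL := "aHR0cHM6Ly9mb3Jtcy5nbGUvWU5ZXQ0d2NRWHVLNnNwdjU="
    // Standard Base64 alphabet, so decode with base64.StdEncoding (URLEncoding uses "-_" instead of "+/").
    options := "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789+/"

    client := &http.Client{Timeout: 10 * time.Second}
    tried := make(map[string]bool) // the same string can come from two neighbouring positions

    // Insert every alphabet character at every position 0..len(encodedURL).
    for i := 0; i <= len(encodedURL); i++ {
        for _, c := range options {
            tempEncoded := encodedURL[:i] + string(c) + encodedURL[i:]
            if tried[tempEncoded] {
                continue
            }
            tried[tempEncoded] = true

            decodedURL, err := base64.StdEncoding.DecodeString(tempEncoded)
            if err != nil {
                continue // not valid Base64, e.g. a character inserted after the '='
            }
            s := string(decodedURL)
            // Only printable ASCII that parses as an absolute URL is worth an HTTP request.
            u, err := url.ParseRequestURI(s)
            if err != nil || u.Host == "" || strings.IndexFunc(s, func(r rune) bool { return r <= ' ' || r > '~' }) >= 0 {
                continue
            }

            resp, err := client.Get(s)
            if err != nil {
                continue
            }
            resp.Body.Close()
            if resp.StatusCode == http.StatusOK {
                fmt.Println("this URL is valid & exists:", s)
            }
        }
    }
}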
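You can also narrow the search before making any requests. Each 4-character Base64 group decodes to 3 bytes on its own, so decoding the string group by group shows where the output stops looking like a URL. Below is a minimal Python sketch. It assumes that a non-printable byte marks the first damaged group, and first_bad_group is a hypothetical helper.

```python
import base64


def first_bad_group(encoded):
    """Index of the first full 4-character group that decodes to non-printable
    bytes, or -1 if every full group decodes to printable ASCII."""
    for k in range(len(encoded) // 4):
        group = encoded[4 * k : 4 * k + 4]
        decoded = base64.b64decode(group)
        if not all(32 < b < 127 for b in decoded):
            return k
    return -1


encoded = "aHR0cHM6Ly9mb3Jtcy5nbGUvWU5ZXQ0d2NRWHVLNnNwdjU="
k = first_bad_group(encoded)
print(k, encoded[4 * k : 4 * k + 4])  # 7 XQ0d
```

Here groups 0 to 6 decode to https://forms.gle/YNY, and group 7 (XQ0d, characters 28 to 31) is the first bad one. The missing character therefore belongs at or before index 31, most likely in or just before that group, so those positions are worth trying first. This is only a heuristic, because a shifted group can occasionally still decode to printable bytes.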

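When the length of the unencoded input is not a multiple of three, the encoded output must be padded so that its length is a multiple of four.

len(encoded) is 47, but it should be 48, so append another =. The string then decodes without a padding error. However, everything after YNY is still garbage, because the missing character shifts every 6-bit group that follows it: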

import base64
encoded = b"aHR0cHM6Ly9mb3Jtcy5nbGUvWU5ZXQ0d2NRWHVLNnNwdjU=="
decoded = base64.b64decode(encoded)
print(decoded)
# b'https://forms.gle/YNY]\r\x1d\xd8\xd4V\x1dR\xcd\x9c\xdc\x1d\x8d'
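The padding rule can be applied mechanically: -len(data) % 4 is the number of = characters needed to reach the next multiple of four. Below is a minimal Python sketch, where add_padding is a hypothetical helper. Padding only fixes the length. It cannot restore a missing data character, so the bytes after YNY stay wrong.

```python
import base64


def add_padding(data: bytes) -> bytes:
    """Append '=' until len(data) is a multiple of four.

    Inputs whose length is 1 more than a multiple of four cannot be repaired
    this way: a lone trailing character does not carry a whole byte.
    """
    return data + b"=" * (-len(data) % 4)


encoded = b"aHR0cHM6Ly9mb3Jtcy5nbGUvWU5ZXQ0d2NRWHVLNnNwdjU="
padded = add_padding(encoded)
print(len(encoded), len(padded))  # 47 48
print(base64.b64decode(padded))
```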
Sam Daniel