I am currently working on a website where I have mixed values stored in my DB, and I'd like to find a way to detect whether a string is base64-encoded or not. I've come up with this piece of code so far, with help from Abhinav bhardwaj on this post (Detect base64 encoding in PHP?):

function IsBase64($s)
{
    // Check if there are valid base64 characters
    if (!preg_match('/^[a-zA-Z0-9\/\r\n+]*={0,2}$/', $s)) return false;
    // Decode the string in strict mode and check the results
    $decoded = base64_decode($s, true);
    if(false === $decoded) return false;
    // Reject the string if the decoded result contains non-printable characters
    if (0 < preg_match('/((?![[:graph:]])(?!\s)(?!\p{L}))./', $decoded)) return false;
    // Encode the string again; it must match the input exactly
    if (base64_encode($decoded) !== $s) return false;
    return true;
}

It only works halfway: for example, values like 1234, 7000, reno and other four-character inputs made of letters and digits come back as true even though they were never encoded. Now my question: is there any reliable way of detecting base64, or do I have to keep encoded and unencoded values in separate tables and treat them differently?

My plan would have been to merge the data (some of which needs decoding and some of which doesn't) into one PHP result object at the end and return it as JSON text.

Any help on this is much appreciated!

Thank you in advance!

EDIT: After Yoshi's answer I want to pin my conclusion at the top for others who are looking for an easy solution to encode/decode specific data:

I think the best way is rather to keep the encoded data under a specific key in the DB and check whether that key is present in the query result set, so you can keep track of the content that needs to be decoded.

FYI: I have updated my website to this behaviour and I have to admit it works like a charm!
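
A minimal sketch of that approach, assuming each row comes back from the query as an associative array and that encoded values live under one dedicated key. The key name `payload_b64`, the helper `decodeRow` and the sample rows are made up for illustration and are not taken from the actual schema:

```php
<?php
// Hypothetical key under which base64-encoded values are stored.
const ENCODED_KEY = 'payload_b64';

// Decode only the value stored under the dedicated key; leave everything else untouched.
function decodeRow(array $row): array
{
    if (array_key_exists(ENCODED_KEY, $row) && is_string($row[ENCODED_KEY])) {
        $decoded = base64_decode($row[ENCODED_KEY], true);
        // Keep the raw value if it turns out not to be valid base64.
        if ($decoded !== false) {
            $row[ENCODED_KEY] = $decoded;
        }
    }
    return $row;
}

// Sample rows standing in for a query result set.
$rows = [
    ['id' => 1, 'name' => 'reno', 'payload_b64' => base64_encode('hello world')],
    ['id' => 2, 'name' => '1234'],
];

// Merge everything into one result and return it as JSON text.
echo json_encode(array_map('decodeRow', $rows));
```

Plain values such as reno or 1234 are never passed to base64_decode here, so they cannot be mistaken for encoded data.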

FyZ1K
  • I think you're out of luck. The *false positives* you mention are still valid base64 encodings. You'd need to judge whether the decoded version makes any sense, but that will probably be a never-ending story, and ultimately would probably also result in false positives. – Yoshi Oct 26 '17 at 06:37
  • That's what I thought... It seems I need to rethink my data storage logic and clearly separate encoded from non-encoded values... Thank you for giving me a concrete answer! – FyZ1K Oct 26 '17 at 06:42
  • Alright guys... I have finally found a bulletproof solution for this problem. Use the function below to check whether a string is base64-encoded or not - private function is_base64_encoded($str) { $decoded_str = base64_decode($str); $Str1 = preg_replace('/[\x00-\x1F\x7F-\xFF]/', '', $decoded_str); if ($Str1!=$decoded_str || $Str1 == '') { return false; } return true; } – bilal Feb 03 '20 at 12:27

3 Answers

I will post Yoshi's comment as the final conclusion:

I think you're out of luck. The false positives you mention are still valid base64 encodings. You'd need to judge whether the decoded version makes any sense, but that will probably be a never-ending story, and ultimately would probably also result in false positives. – Yoshi
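
To illustrate the point, here is a minimal sketch (the helper name `roundTripsAsBase64` is made up for this example) showing that the inputs from the question decode in strict mode and re-encode to exactly the same string, so a purely syntactic check cannot tell them apart from intentionally encoded data:

```php
<?php
// Hypothetical helper: true when $s decodes in strict mode and re-encodes to itself.
function roundTripsAsBase64(string $s): bool
{
    $decoded = base64_decode($s, true);
    return $decoded !== false && base64_encode($decoded) === $s;
}

// Plain values from the question that were never encoded on purpose.
foreach (['1234', '7000', 'reno'] as $value) {
    var_dump(roundTripsAsBase64($value)); // bool(true): syntactically valid base64
}

// A value containing characters outside the base64 alphabet is rejected.
var_dump(roundTripsAsBase64('not base64!')); // bool(false)
```

Every four-character string made only of letters and digits maps to exactly three bytes and back, which is why such values always pass.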

Ian
FyZ1K

I found a function that checks whether a string is valid base64. It returns boolean true or false:

function is_base64($s) {
    // Check if there are valid base64 characters
    if (!preg_match('/^[a-zA-Z0-9\/\r\n+]*={0,2}$/', $s)) return false;

    // Decode the string in strict mode and check the results
    $decoded = base64_decode($s, true);
    if(false === $decoded) return false;

    // Encode the string again; it must match the input exactly
    if (base64_encode($decoded) !== $s) return false;

    return true;
}

Thanks #merlucin, Link

Irshad Khan

A base64-encoded string basically consists of (A-Z), (a-z), (0-9), + and /, padded with = so that its length is a multiple of 4. So any combination of letters and digits whose length is a multiple of 4 is valid base64.

function IsBase64($str) {
    // Only checks that the length is a multiple of 4
    return strlen($str) % 4 === 0;
}
Robin Rai
  • This doesn't work. Values with 4 letters or digits are returned as true even if they are not base64-encoded... – FyZ1K Oct 26 '17 at 06:33
  • My previous answer was too straightforward. I have updated my answer; please do try it. – Robin Rai Oct 26 '17 at 06:56
  • This now only checks whether the length of the entered string is divisible by 4 with a remainder of 0. Besides that, there would also be numeric values whose length you would have to check, so I cannot accept this answer as a bulletproof approach. I think the best way is rather to keep the encoded data under a specific key in the DB and check whether that key is present in the query result set, so you can keep track of the content that needs to be decoded... – FyZ1K Oct 26 '17 at 07:38