
I am trying to encode a string in PHP using an algorithm similar to Rot13, then decode the string in JavaScript and do a search and replace. It works just fine with ASCII characters but it doesn't work with Unicode.

I have messed around with the code attached, but can't get it to work.

<?php

function strRot($str, $n) {
    $len = mb_strlen($str);
    $min = 0;
    $max = 99999999;
    $final = '';

    for ($i = 0; $i < $len; $i++) {
        $current = mb_ord($str[$i]);
        $val = $current+$n;

        if ($val >= $max) {
            $val = $val - $max;
        }

        if ($val <= $min) {
            $val = $val + $min;
        }

        $final .= mb_chr($val);
    }

    return $final;
}

?><!doctype html>
<html lang="en">
<head>
    <!-- Required meta tags -->
    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">

    <!-- Bootstrap CSS -->
    <link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.2.1/css/bootstrap.min.css" integrity="sha384-GJzZqFGwb1QTTN6wy59ffF1BuGJpLSa9DkKMp0DgiMDm4iYMj70gZWKYbI706tWS" crossorigin="anonymous">

    <title>Hello, world!</title>
</head>
<body>
    <h1>Hello, world!</h1>
    <h2>Ü and . 棕色的狐狸跳了起来.</h2>

    <p>The Hello, world! expression will be replaced.</p>
    <p>Ü and . 棕色的狐狸跳了起来. Should be replaced too.</p>

    <!-- Optional JavaScript -->
    <!-- jQuery first, then Popper.js, then Bootstrap JS -->
    <script src="https://code.jquery.com/jquery-3.3.1.slim.min.js" integrity="sha384-q8i/X+965DzO0rT7abK41JStQIAqVgRVzpbzo5smXKp4YfRvH+8abtTE1Pi6jizo" crossorigin="anonymous"></script>
    <script src="https://cdnjs.cloudflare.com/ajax/libs/popper.js/1.14.6/umd/popper.min.js" integrity="sha384-wHAiFfRlMFy6i5SRaxvfOCifBUQy1xHdJ/yoi7FRNXMRBu5WHdZYu1hA6ZOblgut" crossorigin="anonymous"></script>
    <script src="https://stackpath.bootstrapcdn.com/bootstrap/4.2.1/js/bootstrap.min.js" integrity="sha384-B0UglyR+jN6CkvvICOB2joaf5I4l3gm9GU6Hc1og6Ls7i6U/mkkaduKaBhlAXv9k" crossorigin="anonymous"></script>

    <script id="scriptId" type="text/javascript">
        var data = [
            ["Hello, world!", "<?php echo base64_encode(strRot('I got replaced.', 1000)); ?>"],
            ["Ü and . 棕色的狐狸跳了起来.", "<?php echo base64_encode(strRot(' before Ü and 棕色的.', 1000)); ?>"]
        ];

        function b64DecodeUnicode(str) {
            // Going backwards: from bytestream, to percent-encoding, to original string.
            return decodeURIComponent(atob(str).split('').map(function(c) {
                return '%' + ('00' + c.charCodeAt(0).toString(16)).slice(-2);
            }).join(''));
        }

        function strRot(str, n)
        {
            var min = 0;
            var max = 99999999;
            var final = '';

            for (var i in str) {
                var current = str.charCodeAt(i);
                var val = current+n;

                if (val >= max) {
                    val = val - max;
                }

                if (val <= min) {
                    val = val + min;
                }

                final += String.fromCharCode(val);
            }

            return final;
        }

        function replace() {
            for (index in data) {
                //var regex = new RegExp(data[index][0], "ug");
                jQuery("html *:not(script[id=scriptId])").children().each(function () {
                    jQuery(this).html(jQuery(this).html().replace(
                        data[index][0],
                        strRot(b64DecodeUnicode(data[index][1]), -1000)
                    ));
                });
            }
        }

        replace();
    </script>

</body>
</html>

Once the JS runs, it should replace each data[index][0] with the decoded data[index][1].

Augustin
  • Why not just base64 encode the strings? – mlewis54 Jun 09 '19 at 15:56
  • @mlewis54 Because the purpose of the script is to avoid footprints in content and markup. I need to randomize all data on every domain this plugin is installed on. – Augustin Jun 09 '19 at 18:04

2 Answers


(I don't have enough reputation to comment, so am resorting to using an answer...)

Not sure if it makes a difference, but in the HTML "h2" header your Unicode expression is...

Ü an . 棕色的狐狸跳了起来.

...and in data[], it is...

Ü and . 棕色的狐狸跳了起来.

Presume that "an" and "and" should be the same?

Trentium
  • Thank you for noticing the typo, but that's not the problem. I fixed it and now it just deletes the Unicode characters. The replacement is made up of just the ASCII characters. – Augustin Jun 09 '19 at 18:02
  • @Augustin, in your b64DecodeUnicode function, I notice that you're employing ".slice(-2)". JavaScript strings, as I understand it, are stored as UTF-16, which means two bytes per code unit. Plus, there are cases where two UTF-16 code units are used for one character. (See http://jonisalonen.com/2012/from-utf-16-to-utf-8-in-javascript/). Plus, if your goal is to pass this data back and forth with a server, you might have to be mindful of UTF-8 and UTF-16 conversions. (See https://stackoverflow.com/questions/51507610/how-to-convert-utf8-arraybuffer-to-utf16-javascript-string). – Trentium Jun 09 '19 at 18:47
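
To make the UTF-16 point in the comment above concrete, here is a minimal sketch comparing rotation by code unit (what charCodeAt/String.fromCharCode do in the question's JavaScript) with rotation by code point (what PHP's mb_ord/mb_chr operate on). The helper names rotByCodeUnit and rotByCodePoint are made up for illustration. The sample strings in the question are all inside the Basic Multilingual Plane, so surrogate pairs are probably not what breaks that particular case, but the two approaches stop agreeing as soon as a character outside it (an emoji, for example) appears.

```javascript
// Hypothetical helpers for illustration; neither appears in the original post.

// Rotate each UTF-16 code unit, as the question's JavaScript strRot does.
function rotByCodeUnit(str, n) {
    var out = '';
    for (var i = 0; i < str.length; i++) {
        out += String.fromCharCode(str.charCodeAt(i) + n);
    }
    return out;
}

// Rotate each Unicode code point. Array.from splits a string by code point,
// so a surrogate pair is handled as one character.
function rotByCodePoint(str, n) {
    var out = '';
    Array.from(str).forEach(function (ch) {
        out += String.fromCodePoint(ch.codePointAt(0) + n);
    });
    return out;
}

var bmp = '\u00DC';        // "Ü", a single UTF-16 code unit
var astral = '\u{1F600}';  // an emoji, stored as a surrogate pair

console.log(astral.length);             // 2 (code units)
console.log(Array.from(astral).length); // 1 (code point)

// Inside the BMP both rotations agree; outside it they do not.
console.log(rotByCodeUnit(bmp, 1000) === rotByCodePoint(bmp, 1000));       // true
console.log(rotByCodeUnit(astral, 1000) === rotByCodePoint(astral, 1000)); // false
```

This sketch ignores edge cases such as a rotated code point landing in the surrogate range or above U+10FFFF (where String.fromCodePoint throws a RangeError), so treat it as a demonstration of the mismatch rather than a drop-in replacement.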

One solution I found:

var data = [
    ["Hello, world!", "<?php echo base64_encode(strRot(rawurlencode('I got replaced.'), 1000)); ?>"],
    ["Ü and . 棕色的狐狸跳了起来.", "<?php echo base64_encode(strRot(rawurlencode(' before Ü and 棕色的.'), 1000)); ?>"]
];

// Then, in replace():

decodeURIComponent(strRot(b64DecodeUnicode(data[index][1]), -1000))

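This works because rawurlencode() percent-encodes every non-ASCII byte before the rotation, so strRot only ever sees single-byte ASCII characters. The only drawback is size: each non-ASCII byte becomes three characters (for example, %C3), so the encoded string grows noticeably when the text is mostly non-ASCII.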
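
The round trip can be checked entirely in JavaScript with a rough sketch like the one below. encodeLikeServer is a hypothetical stand-in for the PHP side: it uses encodeURIComponent in place of rawurlencode (the two escape slightly different sets of ASCII punctuation, but decodeURIComponent reads both) and TextEncoder plus btoa in place of PHP's UTF-8 output and base64_encode. decodeOnClient is also a made-up name that wraps the expression shown above. It assumes an environment where atob, btoa and TextEncoder are globals (browsers, recent Node.js).

```javascript
// Rotation by UTF-16 code unit, equivalent to the question's JavaScript strRot
// for the range of values used here.
function strRot(str, n) {
    var out = '';
    for (var i = 0; i < str.length; i++) {
        out += String.fromCharCode(str.charCodeAt(i) + n);
    }
    return out;
}

// From the question: base64 -> byte string -> percent-encoding -> UTF-8 decoded text.
function b64DecodeUnicode(str) {
    return decodeURIComponent(atob(str).split('').map(function (c) {
        return '%' + ('00' + c.charCodeAt(0).toString(16)).slice(-2);
    }).join(''));
}

// Hypothetical stand-in for the PHP side (assumption, see the paragraph above).
function encodeLikeServer(text, n) {
    var rotated = strRot(encodeURIComponent(text), n); // ASCII in, rotated out
    var bytes = new TextEncoder().encode(rotated);     // UTF-8 bytes
    var binary = '';
    for (var i = 0; i < bytes.length; i++) {
        binary += String.fromCharCode(bytes[i]);
    }
    return btoa(binary);
}

// Client side: undo base64, undo the rotation, then undo the percent-encoding.
function decodeOnClient(payload, n) {
    return decodeURIComponent(strRot(b64DecodeUnicode(payload), -n));
}

var original = ' before Ü and 棕色的.';
var payload = encodeLikeServer(original, 1000);

console.log(decodeOnClient(payload, 1000) === original); // true
console.log(encodeURIComponent('棕').length);            // 9: three UTF-8 bytes, three characters each
```

The last line illustrates the overhead mentioned above: a single CJK character turns into nine ASCII characters before rotation and base64 are even applied.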

Augustin