
I have a string that looks like this:

{\x22documentReferer\x22:\x22http:\x5C/\x5C/pikabu.ru\x5C/freshitems.php\x22}

How can I convert this into readable JSON?

I've found several solutions, such as the regex-based one here, but they are all slow.

I have already tried:

URL.decode
StringEscapeUtils
JSON.parse // from different libraries 

For example, Python has a simple solution: decode with 'string_escape'.

The linked possible duplicate applies to Python, but my question is about Java or Scala.

The solution I'm using now works but is also very slow. It comes from here:

def unescape(oldstr: String): String = {
  val newstr = new StringBuilder(oldstr.length)
  var saw_backslash = false
  var i = 0
  while (i < oldstr.length) {
    val c = oldstr.charAt(i)
    if (!saw_backslash) {
      // Remember a backslash and decide what to do once the next character is known.
      if (c == '\\') saw_backslash = true
      else newstr.append(c)
    } else {
      if (c == '\\') {
        // Escaped backslash: keep both characters so the output stays valid JSON.
        newstr.append('\\').append('\\')
      } else if (c == 'x') {
        // \xNN: both hex digits must fit inside the string.
        if (i + 3 > oldstr.length) die("string too short for \\x escape")
        val value =
          try Integer.parseInt(oldstr.substring(i + 1, i + 3), 16)
          catch {
            case _: NumberFormatException => die("invalid hex value for \\x escape")
          }
        newstr.append(value.toChar)
        i += 2 // skip the two hex digits
      } else {
        // Any other escape is passed through unchanged.
        newstr.append('\\').append(c)
      }
      saw_backslash = false
    }
    i += 1
  }
  // A trailing lone backslash is kept as-is.
  if (saw_backslash) newstr.append('\\')
  newstr.toString
}

private def die(msg: String): Nothing =
  throw new IllegalArgumentException(msg)

Artem
  • Have you tried anything? Looks like **no research** to me! – Prashant Oct 31 '17 at 09:39
  • @Prashant looks like *to you*. I've tried the slow one linked in the description, and others – Artem Oct 31 '17 at 09:43
  • You linked a question about Python, and I asked about a solution in Java or Scala. Did you read the question? – Artem Oct 31 '17 at 09:54
  • @Prashant explain linked duplicate, please – Artem Oct 31 '17 at 10:09
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/157878/discussion-between-soloveiko-and-prashant). – Artem Oct 31 '17 at 10:19
  • The possible duplicate linked question asks about Python and OP shows that he or she has tried to use guidance from that answer to no avail. Should be kept open. – Ben Reich Oct 31 '17 at 17:56

1 Answer


\x followed by two hex digits escapes a single character in Python and some other languages. Scala and Java have no \x escape, but they do support \u followed by four hex digits for Unicode characters. Since ASCII is a subset of Unicode (as explained here), we can replace each \x with \u00 (the \u escape prefix plus two leading zeros) and then call the unescapeJava method (in StringEscapeUtils):

import org.apache.commons.lang3.StringEscapeUtils
StringEscapeUtils.unescapeJava(x.replaceAll("""\\x""", """\\u00"""))
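
To see what the replacement step does on its own, here is a small sketch that uses only the standard library. The names XToUnicodeEscapeDemo, raw and asUnicodeEscapes are illustrative and not part of the original post. It deliberately stops before the unescapeJava call, which needs Apache Commons Lang on the classpath.

```scala
object XToUnicodeEscapeDemo {
  // Sample input from the question; a triple-quoted literal keeps the backslashes as-is.
  val raw: String =
    """{\x22documentReferer\x22:\x22http:\x5C/\x5C/pikabu.ru\x5C/freshitems.php\x22}"""

  // Rewrite every literal backslash-x as backslash-u-0-0, so each two-digit hex
  // escape becomes a four-digit Java-style escape that unescapeJava understands.
  val asUnicodeEscapes: String = raw.replaceAll("""\\x""", """\\u00""")

  def main(args: Array[String]): Unit = println(asUnicodeEscapes)
}
```

After this step no \x sequences remain, and passing asUnicodeEscapes to unescapeJava should yield the readable JSON.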

You can also use regex to find the escape sequences and replace them with the appropriate ASCII character:

val pattern = """\\x([0-9A-F]{2})""".r

pattern.replaceAllIn(x, m => m.group(1) match {
  case "5C" => """\\""" //special case for backslash
  case hex => Integer.parseInt(hex, 16).toChar.toString
})

This appears to be faster and does not require an external library, although it may still be too slow for your needs. It also does not cover some edge cases (for example, the pattern matches only uppercase hex digits), but it might cover simple needs.
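
As one possible way to close a couple of those gaps, here is a hedged sketch of the same regex idea. The names HexEscapeDecoder and decode are hypothetical. It accepts lowercase hex digits and passes every decoded character through Regex.quoteReplacement, so that a decoded backslash or dollar sign is not interpreted as replacement syntax. It still assumes the input contains no escaped backslash directly in front of an x, and it makes no claim about speed.

```scala
import scala.util.matching.Regex

object HexEscapeDecoder {
  // A backslash, the letter x, then exactly two hex digits in either case.
  private val HexEscape: Regex = """\\x([0-9A-Fa-f]{2})""".r

  def decode(s: String): String =
    HexEscape.replaceAllIn(s, m => {
      val ch = Integer.parseInt(m.group(1), 16).toChar
      // quoteReplacement escapes '\' and '$', which replaceAllIn would otherwise
      // treat as an escape character and a group reference.
      Regex.quoteReplacement(ch.toString)
    })
}
```

Calling HexEscapeDecoder.decode on the string from the question should give {"documentReferer":"http:\/\/pikabu.ru\/freshitems.php"}, where \/ is a legal JSON escape for a forward slash.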

I am definitely not an expert on this so there might be a better way to handle this.

Ben Reich
  • Thanks for your answer! It works, but it is still slower than what I'm using now. Also, replaceAll uses regex, so I tried replace() instead, but it was just as slow. – Artem Nov 01 '17 at 09:14
  • @Soloveiko I have updated the answer with a simpler solution that is faster, but not as robust. Might give you some new ideas. – Ben Reich Nov 01 '17 at 16:29