1

I am trying to read an escaped character from a file. The process is long and complicated because of a lot of cleansing, but that is irrelevant here. The end product is this property of an object:

props.inputSeperator: String type

Note that this is a String. In this specific case, however, its value is \u0001.

When I print this, the output is \u0001, and the length of the string props.inputSeperator is 6. How do I convert this string into a string of a single character, namely the special character represented by \u0001? The resulting string should have length 1 and, when printed, should print a single special character (\u0001).

val x: String = "\u0001"
val s = Array("\\", "u", "0", "0", "0", "1").mkString("")
println(x) //prints "?"   this is a SINGLE special character
println(s) //prints "\u0001"

Essentially, I want to take s and turn it into the value of x.

Andrey Tyukin
test acc
  • Didn’t get your question. Please give an input example string and what is the output – Chandan Ray Sep 12 '18 at 20:32
  • @ChandanRay I have a string value of `\u0001` which has a length of 6 (for some reason, it is not being stored as a single character, which it should be). I want to convert this string to a single character, which should be the special escaped character `\u0001`. Does that make sense? – test acc Sep 12 '18 at 20:34
  • @ChandanRay Please note, if you do `val x: String = "\u0001"` it will correctly be stored as a single character; however, the way I am reading values into the string, this is not the case, and this part of the program cannot be changed. We have to take the string value `x = "\u0001"` as a 6-character string and convert it to the correct 1-character string. – test acc Sep 12 '18 at 20:36
  • @ChandanRay Here is a testable example `val s = Array("\\", "u", "0", "0", "0", "1").mkString("")` I want `s` to be converted to a single character. – test acc Sep 12 '18 at 20:42

4 Answers

3

Just use the method unescapeJava from commons.text.StringEscapeUtils:

libraryDependencies += "org.apache.commons" % "commons-text" % "1.4"

Example:

println(org.apache.commons.text.StringEscapeUtils.unescapeJava("\\u046C"))

prints:

Ѭ
Andrey Tyukin
3

Strip the unwanted characters, parse the remaining hex string, and turn the result into a Char.

Integer.parseInt("\\u0A6E".drop(2), 16).toChar
res0: Char = ੮
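If the input may contain more than one escape, or ordinary text around it, the same strip-and-parse idea can be applied per match. The following is only a minimal sketch under that assumption; `UnescapeSketch` and `unescapeUnicode` are hypothetical names, not part of any library, and only the backslash-u form with exactly four hex digits is handled.

```scala
import scala.util.matching.Regex

object UnescapeSketch {
  // Matches a literal backslash, the letter u, and exactly four hex digits.
  private val UnicodeEscape: Regex = """\\u([0-9a-fA-F]{4})""".r

  // Hypothetical helper: replaces each matched escape with the character it names.
  // quoteReplacement keeps decoded characters such as '$' or a backslash from being
  // interpreted as replacement syntax.
  def unescapeUnicode(raw: String): String =
    UnicodeEscape.replaceAllIn(raw, m =>
      Regex.quoteReplacement(Integer.parseInt(m.group(1), 16).toChar.toString))
}
```

With the string from the question, `UnescapeSketch.unescapeUnicode(s)` should have length 1. Other escapes (such as a backslash followed by n) are left untouched by this sketch.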
jwvh
  • No dependencies is better than some dependencies in this case. – erip Sep 12 '18 at 22:23
  • @erip If you can absolutely promise that there is just *this one* single isolated encoding problem with this particular string that contains a single character... Maybe. Unfortunately, encoding problems usually do not come alone... – Andrey Tyukin Sep 12 '18 at 23:54
  • Sure. I'm a linguist -- I understand encoding issues. :^) My point is, given the strange requirements OP has presented, this is a cute and much less invasive solution. – erip Sep 13 '18 at 12:19
0

You have the Unicode value spelled out in ASCII characters. To get the actual character, drop the leading "\" and "u", read the rest of the string as hex bytes two digits at a time using sliding(2,2), and pass the resulting byte array to "new String", specifying the encoding that you need, i.e. UNICODE.

scala> val ar = Array("\\", "u", "0", "0", "0", "1").mkString("")
ar: String = \u0001

scala> val x = new String( ar.drop(2).sliding(2,2).toArray.map(Integer.parseInt(_, 16).toByte) , "UNICODE")
x: String = ?

scala>  x.length
res53: Int = 1

scala>  x.toArray.map(_.toByte)
res54: Array[Byte] = Array(1)

scala>

Verification:

scala> val x1: String = "\u0001"
x1: String = ?

scala> x==x1
res55: Boolean = true

scala>
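As far as I know, the JDK accepts "UNICODE" as an alias for UTF-16, which defaults to big-endian when no byte-order mark is present. A variant that names the charset explicitly avoids depending on that alias. This is a sketch only; `HexBytesSketch` and `decodeEscape` are hypothetical names, and the input is assumed to be exactly one six-character escape.

```scala
import java.nio.charset.StandardCharsets

object HexBytesSketch {
  // Hypothetical helper: expects exactly a backslash, the letter u, and four hex digits.
  def decodeEscape(raw: String): String = {
    require(raw.length == 6 && raw.startsWith("\\u"), s"not a single escape: $raw")
    // Two hex digits per byte, high byte first, so the pair forms one UTF-16BE code unit.
    val bytes = raw.drop(2).grouped(2).map(Integer.parseInt(_, 16).toByte).toArray
    new String(bytes, StandardCharsets.UTF_16BE)
  }
}
```

Calling `HexBytesSketch.decodeEscape(ar)` on the string built above should give the same single-character result as `x`.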
stack0114106
0

val delim :Byte = "\u0007".codePointAt(0).toByte

We can use the codePointAt() method and then call toByte on the result.

  • 2
    Your answer uses a string of 1 character as input which is what the OP wants as output. This doesn't answer the question ;) – Gaël J Aug 26 '21 at 12:43