
I want to be able to encode any string into a valid Java class name and then decode that class name back into the provided string. I want to be able to do this in a lossless manner, i.e., no two distinct strings can be encoded to the same Java class name.

Is this possible?

Ogen
  • Can two equal strings be converted to equal class names? What is the purpose of this? What do you mean by lossless? – Samuel French Dec 14 '16 at 23:00
  • @SamuelFrench Yes, because they are the same strings. Also, if you don't know what lossless is, I don't think you're fit to answer this question. – Ogen Dec 14 '16 at 23:00
  • You are (probably) looking for a bidirectional map. See here for more info: http://stackoverflow.com/questions/9783020/bidirectional-map Also, lossless implies compression which you don't specify at all and it is most certainly not needed here. – mascoj Dec 14 '16 at 23:07
  • @mascoj Lossless is applicable here. For example, an encoding/decoding algorithm where I just use the ASCII value of every character would obviously be lossy, not lossless. I need a lossless algorithm. – Ogen Dec 14 '16 at 23:10
  • Sounds like an XY problem to me. Why a class name? Why a Java class name? Can it include a package? If so, why? If not, why not? – user207421 Dec 14 '16 at 23:14
  • @EJP You don't need to know why a class name. That's not part of the question. My question is very clear, these are my requirements. – Ogen Dec 14 '16 at 23:17
  • @Ogen What? I don't think you understand what lossless means... Lossless is only valid when talking about compression; if you have a 1-to-1 mapping then you can't have loss, so talking about it is pointless. If you are talking about an encoding that encodes any string to a Java class name via lossless compression, then no, it doesn't exist, based on David's answer. – mascoj Dec 14 '16 at 23:17
  • @mascoj okie dokie – Ogen Dec 14 '16 at 23:18
  • And the answer is very clear too. Given the precise statement of requirements you have given - No it is not possible. – Stephen C Dec 15 '16 at 03:06
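mascoj's bidirectional-map suggestion can be sketched without any library by pairing two `HashMap`s. Note that this is not a pure encoding in the sense the question asks for: the generated names (`C0`, `C1`, ...) depend on the order in which strings are registered, and decoding only works while the registry exists (it would have to be persisted to decode later). `NameRegistry`, `nameFor`, and `sourceFor` are hypothetical names chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry: hands out short names C0, C1, ... and remembers both directions.
public final class NameRegistry {
    private final Map<String, String> toName = new HashMap<>();
    private final Map<String, String> toSource = new HashMap<>();

    // Returns the existing name for source, or assigns the next unused one.
    public String nameFor(String source) {
        String name = toName.get(source);
        if (name == null) {
            name = "C" + toName.size();   // each new string gets a fresh index
            toName.put(source, name);
            toSource.put(name, source);
        }
        return name;
    }

    // Reverse lookup; returns null for names this registry never issued.
    public String sourceFor(String name) {
        return toSource.get(name);
    }

    public static void main(String[] args) {
        NameRegistry registry = new NameRegistry();
        String name = registry.nameFor("hello world");
        System.out.println(name + " -> " + registry.sourceFor(name));
    }
}
```

Because every issued name is short, this sidesteps the class-name length limit discussed further down, at the cost of needing shared state.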

2 Answers


The answer to this is clearly no. There are only a finite number of possible Java strings (a String's length is bounded), and not all of them are valid class names. Therefore, you're asking for an injective mapping from a finite set into a strictly smaller subset of itself, which, by the pigeonhole principle, doesn't exist.
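The counting step can be written out explicitly. This is a sketch that models a String as a sequence of UTF-16 code units and relies only on the fact that `String.length()` returns an `int`:

```latex
Let $\Sigma$ be the set of UTF-16 code units, so $|\Sigma| = 2^{16}$, and let
$N \le 2^{31} - 1$ be the maximum length of a String. The set of all strings is
$$S = \bigcup_{k=0}^{N} \Sigma^{k}, \qquad |S| = \sum_{k=0}^{N} 2^{16k} < \infty .$$
Let $C \subseteq S$ be the set of valid class names. Since $C \neq S$
(for example, the string "1" lies in $S \setminus C$), finiteness gives $|C| < |S|$.
A lossless encoding is an injection $f : S \to C$, which would require
$|S| \le |C|$, a contradiction.
```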

Dawood ibn Kareem
  • @SLaks appears to disagree – Ogen Dec 14 '16 at 23:11
  • That's HIS problem, not mine. I have given you a mathematically valid proof that the answer is "no". There's no need for me to ALSO point out the flaw in @SLaks' answer. In any case, his answer may or may not be "good enough" for your purposes - but that's something that I am unable to tell. – Dawood ibn Kareem Dec 14 '16 at 23:13
  • I am still struggling to understand your answer. For example, if we take the binary system which is based on the set {0, 1} with cardinality 2. Any word made from the set of letters with cardinality 26 can be uniquely represented in binary because the 0 and 1 can be used any number of times. Isn't this a contradiction to your answer? Because now we have two sets, with differing cardinalities, where we have a bijection between them no? – Ogen Dec 14 '16 at 23:22
  • There is a limit on the length of a String in Java, so there are plenty of Strings that are valid, but will become "too long" if you convert them to binary. So no, your example is not a counterexample to my proof. – Dawood ibn Kareem Dec 14 '16 at 23:25
  • Ah, I was confused, perhaps you should make it clear that there is a length limit on Java strings in your answer? – Max Dec 14 '16 at 23:26
  • See http://stackoverflow.com/q/1179983 – Dawood ibn Kareem Dec 14 '16 at 23:26
  • Yes, I realised that instantly and deleted the comment. Sorry about that. But I'd still suggest making it clear that there is a length limit in your answer. – Max Dec 14 '16 at 23:27
  • @SLaks' answer is a very good answer that will probably be useful to you. The flaw in it is that if you have a very long string - close to the maximum possible length - with a few characters that need to be escaped, then the act of escaping them may make the resulting string too long. If you're not dealing with very long strings, this may not be an issue for you. So the answer is _technically_ wrong, but _probably_ useful, which is why I haven't downvoted it. – Dawood ibn Kareem Dec 14 '16 at 23:28
  • In fact, the practical limit for a class name would be 65535 characters ... or fewer. http://stackoverflow.com/a/1039066/139985 – Stephen C Dec 15 '16 at 02:58

This is certainly possible.

In any situation where you need to convert arbitrary strings to use a limited set of characters, you simply need to invent an escape sequence.

For example, pick _ as your escape character, then replace any invalid character, or any underscore, in the source string with an underscore followed by 8 hex digits of the character's Unicode codepoint.
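A minimal Java sketch of this escape scheme, assuming the target is a single unqualified class name. `ClassNameCodec`, `encode`, and `decode` are hypothetical names. `Character.isJavaIdentifierStart` and `Character.isJavaIdentifierPart` decide what may stay unescaped; identifier-ignorable characters are escaped too, to be conservative; and `_` is always escaped, so the decoder can read exactly 8 hex digits after every `_`. The sketch does not handle reserved words (`encode("class")` returns `class`) or the empty string, and it ignores the length limit raised in the comments.

```java
// Hypothetical codec illustrating the "_ + 8 hex digits" escape scheme.
public final class ClassNameCodec {

    private static final char ESCAPE = '_';

    // True if code point cp may appear unescaped at this position.
    private static boolean isSafe(int cp, boolean first) {
        if (cp == ESCAPE || Character.isIdentifierIgnorable(cp)) {
            return false; // the escape character itself must always be escaped
        }
        return first ? Character.isJavaIdentifierStart(cp)
                     : Character.isJavaIdentifierPart(cp);
    }

    public static String encode(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i); // walks by code point, not by char
            if (isSafe(cp, i == 0)) {
                out.appendCodePoint(cp);
            } else {
                // '_' followed by exactly 8 lowercase hex digits of the code point
                out.append(ESCAPE).append(String.format("%08x", cp));
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static String decode(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == ESCAPE) {
                // Fixed width: always consume exactly 8 hex digits after '_'.
                int cp = Integer.parseInt(s.substring(i + 1, i + 9), 16);
                out.appendCodePoint(cp);
                i += 9;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] inputs = {"my.String", "my_0000002eString", "hello world", "1abc"};
        for (String s : inputs) {
            String enc = encode(s);
            System.out.println(s + " -> " + enc + " -> " + decode(enc));
        }
    }
}
```

Because an underscore in the input is itself escaped, `my.String` encodes to `my_0000002eString`, while the literal input `my_0000002eString` encodes to `my_0000005f0000002eString`, so the two cannot collide. The fixed width is also why literal hex digits in the input do not confuse the decoder.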

SLaks
  • OK, but what about the following two inputs: `my$String` being converted into `my_12345678String`. What happens when I try to encode another input called `my_12345678String`? Won't I get the same encoded value? – Ogen Dec 14 '16 at 23:03
  • That's why you also need to encode the underscores. – SLaks Dec 14 '16 at 23:13
  • So then what's the point of converting invalid characters to underscores in the first place if you're just going to encode them? On the contrary, wouldn't converting them all to underscores be a bad thing because all invalid characters will be encoded to the same thing? – Ogen Dec 14 '16 at 23:16
  • No; I mean encode underscores that appear in the source string, to avoid exactly that problem. – SLaks Dec 14 '16 at 23:17
  • So this is a two-step solution? 1. Encode underscores to their Unicode representation. 2. Replace any invalid characters with an underscore and their Unicode representation. Am I correct in saying this? – Ogen Dec 14 '16 at 23:32
  • Just count underscore as an invalid character and you have only 1 step. – SLaks Dec 14 '16 at 23:44
  • What if my input string contains 8 hex digits? Won't that mess up the decoding stage? – Ogen Dec 14 '16 at 23:52
  • That's why you need to add an underscore, just like other invalid characters. – SLaks Dec 15 '16 at 00:06
  • But digits are allowed in Java class names. So that means I may as well convert every single character into its Unicode code point. – Ogen Dec 15 '16 at 00:08
  • The flaw in this approach is that it only works if the >>encoded<< string is 65535 characters or less. (This link explains that there is an implementation limit on the length of class names if you are using a spec compliant JVM - http://stackoverflow.com/a/1039066/139985) Thus, it will not work for >>arbitrary<< strings. – Stephen C Dec 15 '16 at 03:01
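To see how that limit bites, here is a rough sketch, assuming the escape scheme from the answer above (each escaped character becomes `_` plus 8 hex digits, i.e. 9 characters). In the class file a name is stored in a `CONSTANT_Utf8_info` entry whose length field is 2 bytes, so the ceiling is 65535 bytes of the JVM's modified UTF-8 rather than 65535 characters. `NameLengthCheck` and `modifiedUtf8Length` are hypothetical names.

```java
// Sketch: estimates whether a (hypothetical) encoded class name fits in a class file.
public final class NameLengthCheck {

    static final int MAX_UTF8_BYTES = 65535; // u2 length field of CONSTANT_Utf8_info

    // Byte length of s in modified UTF-8 (surrogates are encoded one char at a time).
    public static int modifiedUtf8Length(String s) {
        int bytes = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x0001 && c <= 0x007F) {
                bytes += 1;
            } else if (c <= 0x07FF) { // also covers U+0000, which takes 2 bytes
                bytes += 2;
            } else {
                bytes += 3;
            }
        }
        return bytes;
    }

    public static void main(String[] args) {
        // 7282 spaces: a perfectly ordinary String...
        StringBuilder escaped = new StringBuilder();
        for (int i = 0; i < 7282; i++) {
            escaped.append("_00000020"); // ...but each space escapes to 9 chars
        }
        int bytes = modifiedUtf8Length(escaped.toString());
        System.out.println(bytes + " bytes, fits: " + (bytes <= MAX_UTF8_BYTES));
    }
}
```

With 7282 spaces the escaped name needs 7282 × 9 = 65538 bytes, just over the limit, even though the source string is short by `String` standards.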