
I need to convert Java Strings to ISO/IEC 8859-1 in order to save space, i.e. so that every character uses only 1 byte.

However, when using `getBytes(StandardCharsets.ISO_8859_1)`, some characters like š and ž are later printed as `?`. They are not part of ISO/IEC 8859-1, but I would like an automatic way to replace such letters with a suitable equivalent (š → s, ž → z) for every UTF-16 character that is not part of ISO/IEC 8859-1.

Is such a thing possible?
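To make the behaviour described above concrete, here is a small sketch (the class and method names are made up for illustration). `String.getBytes(Charset)` always substitutes the charset's default replacement for characters it cannot map, and for ISO-8859-1 that replacement is the single byte `'?'` (0x3F). A `CharsetEncoder` can tell beforehand whether a string fits.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1Demo {
    // One byte per char; characters outside ISO-8859-1 silently become '?' (0x3F)
    static byte[] toLatin1(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Reports up front whether every character of the text exists in ISO-8859-1
    static boolean fitsLatin1(String text) {
        CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder();
        return encoder.canEncode(text);
    }

    public static void main(String[] args) {
        // "šž é" written with escapes so the source file encoding does not matter
        String text = "\u0161\u017E \u00E9";

        byte[] latin1 = toLatin1(text);
        System.out.println(Arrays.toString(latin1)); // [63, 63, 32, -23]
        System.out.println(new String(latin1, StandardCharsets.ISO_8859_1));
        System.out.println(fitsLatin1(text)); // false
    }
}
```

The `-23` is just `é` (0xE9) shown as a signed Java `byte`; the two `63`s are where š and ž were lost.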

EDIT: I used a solution by Erick Robertson.

https://stackoverflow.com/a/3322174/10197944

noob13
  • Yes, such a thing is certainly possible. What might _not_ be possible is to find the solution ready-made for you out there in internet-land: you might have to program it for yourself. – Kevin Anderson Sep 24 '18 at 10:08
  • It should be possible; in fact, we're working on a similar problem atm. Something based on [this](https://stackoverflow.com/questions/4122170/java-change-%C3%A1%C3%A9%C5%91%C5%B1%C3%BA-to-aeouu) should work for most cases. There might be cases where this doesn't work (we've experienced difficulties with ligatures such as œ, but there shouldn't be too many of those and we're using a mapping table in that case). – Thomas Sep 24 '18 at 10:08
  • Thanks Thomas for the link! – noob13 Sep 24 '18 at 15:52
  • Don't quite agree with "This question already has answers", as the OP only wants to remove non-latin1 characters. Posted my answer here: https://stackoverflow.com/questions/11232201/api-or-method-to-replace-all-non-latin-1-characters/69926231#69926231 – Simon Nov 11 '21 at 09:51
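A minimal sketch of the approach Thomas describes, assuming an accent-stripped letter is an acceptable replacement. Characters already in ISO-8859-1 are kept. Any other character is decomposed with `java.text.Normalizer` (NFD) and its combining marks are removed. A small hand-written table covers letters such as œ that have no decomposition, and anything that still does not fit becomes `?`. The class, the method names and the table entries are hypothetical, and `removeNonLatin1` covers the narrower "just remove them" case from Simon's comment. `Map.of` needs Java 9 or later.

```java
import java.text.Normalizer;
import java.util.Map;

public class Latin1Folder {
    // Hypothetical fallback table for letters that have no canonical decomposition;
    // the entries are only examples and would have to be extended by hand
    private static final Map<Integer, String> EXTRA = Map.of(
            0x0152, "OE", // Œ
            0x0153, "oe", // œ
            0x0141, "L",  // Ł
            0x0142, "l"); // ł

    // Replaces every character outside ISO-8859-1 with a Latin-1 approximation, or '?'
    static String foldToLatin1(String input) {
        // Compose first, so that "e" + combining acute becomes the Latin-1 letter é
        String composed = Normalizer.normalize(input, Normalizer.Form.NFC);
        StringBuilder out = new StringBuilder(composed.length());
        composed.codePoints().forEach(cp -> {
            if (cp <= 0xFF) {
                out.appendCodePoint(cp); // already representable in ISO-8859-1
                return;
            }
            String mapped = EXTRA.get(cp);
            if (mapped == null) {
                // Decompose (š -> s + U+030C) and drop all combining marks
                String decomposed = Normalizer.normalize(
                        new String(Character.toChars(cp)), Normalizer.Form.NFD);
                mapped = decomposed.replaceAll("\\p{M}", "");
            }
            // Keep the approximation only if it now fits; a lone combining mark yields ""
            if (mapped.chars().allMatch(c -> c <= 0xFF)) {
                out.append(mapped);
            } else {
                out.append('?');
            }
        });
        return out.toString();
    }

    // Narrower variant: simply drop everything ISO-8859-1 cannot encode
    static String removeNonLatin1(String input) {
        return input.replaceAll("[^\\x00-\\xFF]", "");
    }

    public static void main(String[] args) {
        String text = "\u0161\u017E \u0153uvre \u00E9t\u00E9"; // "šž œuvre été"
        System.out.println(foldToLatin1(text));
        System.out.println(removeNonLatin1(text));
    }
}
```

With this sketch, `foldToLatin1("\u0161\u017E")` returns `"sz"` and `foldToLatin1("\u0153uvre")` returns `"oeuvre"`, so the result can be passed to `getBytes(StandardCharsets.ISO_8859_1)` without those letters turning into `?`.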

1 Answer


There is the `String.replaceAll()` method; however, if you want to retain rather precise control over which tokens get replaced with which other ones, you will have to make up the precise list yourself and code all the invocations. Doing that for every UTF-16 token that is not part of ISO/IEC 8859-1 will be hard (and on top of that, it might take an unwieldy long time to run).
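A minimal sketch of the explicit-list approach described here; the class name and the entries are only examples. `String.replace` is used instead of `replaceAll` because the keys are literal text, not regular expressions. Each entry scans the whole string once, so the cost grows with the length of the list.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ExplicitReplacements {
    // Hand-maintained list of replacements; these entries are only examples
    private static final Map<String, String> REPLACEMENTS = new LinkedHashMap<>();
    static {
        REPLACEMENTS.put("\u0161", "s"); // š
        REPLACEMENTS.put("\u0160", "S"); // Š
        REPLACEMENTS.put("\u017E", "z"); // ž
        REPLACEMENTS.put("\u017D", "Z"); // Ž
    }

    static String applyReplacements(String input) {
        String result = input;
        // String.replace works on literal text; replaceAll would treat the key as a regex
        for (Map.Entry<String, String> e : REPLACEMENTS.entrySet()) {
            result = result.replace(e.getKey(), e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(applyReplacements("\u017Dele\u017Enica \u0161uma")); // Zeleznica suma
    }
}
```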

A generic String method that does "the replacement I happen to have in mind" has not been prepared for you, alas.

Erwin Smout