I have a list of strings, and I want to render the suffixes '_Octa', '_Tet' and so on as subscripts.

original = ['VO₆_Octa', 'FeO₄_Tet', 'FeO₆_Oct', 'BaO₉_Tsf', 'PrO₆_Oct', 'CaO₆_Oct',
       'HgO₂_Lin', 'CrO₆_Oct', 'AgO₄_Tet', 'EuO₉_Tsf']

What I want is posted in the screenshot

I have hundreds of such strings in a list. For numbers, I have found many answers and I am able to apply them to my case. Is there a better way to do this for strings like these? Any help or pointers to similar problems would be great.

nucsit026
hemanta
  • replace with what ? – Sina Farhadi Jan 18 '20 at 07:07
  • Just need to place the string after '_' as the subscript, please see the attached screenshot as the reference. – hemanta Jan 18 '20 at 07:09
  • Your question is not clear: the original list and the screenshot look the same. Please clarify what your first list is and what you want it to become. – Sina Farhadi Jan 18 '20 at 07:16
  • it's not the same... he wants Octa, ... in subscript. – Boendal Jan 18 '20 at 07:17
  • Yes, @Boendal is right. Sina might get confused at first sight. – hemanta Jan 18 '20 at 07:18
  • You can see here: https://stackoverflow.com/questions/8651361/how-do-you-print-superscript-in-python there is a table, but not every character is available. To print this kind of text, most people use markup, so I think you are out of luck with a few characters. – Boendal Jan 18 '20 at 07:19
  • Thanks Boendal, I went through that explanation before I post the question. I will go through the details again, with your suggestion. – hemanta Jan 18 '20 at 07:46
  • If you want rich formatting options, you're going to need a typesetting system like LaTeX or something else that goes beyond simply representing sequences of Unicode code points. Strings just represent sequences of Unicode code points; they're not designed to handle presentation issues like this. – user2357112 Jan 18 '20 at 08:30

1 Answer

Use these!

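# Letters that have a subscript form in Unicode; all other letters are left unchanged.
SUB_LETTERS = "aehijklmnoprstuvx"
SUB_GLYPHS = "ₐₑₕᵢⱼₖₗₘₙₒₚᵣₛₜᵤᵥₓ"
# Uppercase letters reuse the lowercase glyphs, so case is lost in the output.
SUB = str.maketrans(
    "0123456789" + SUB_LETTERS + SUB_LETTERS.upper() + "_",
    "₀₁₂₃₄₅₆₇₈₉" + SUB_GLYPHS + SUB_GLYPHS + "₋")

SUP = str.maketrans(
    "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_",
    "⁰¹²³⁴⁵⁶⁷⁸⁹ᵃᵇᶜᵈᵉᶠᵍʰⁱʲᵏˡᵐⁿᵒᵖᵠʳˢᵗᵘᵛʷˣʸᶻᵃᵇᶜᵈᵉᶠᵍʰⁱʲᵏˡᵐⁿᵒᵖᵠʳˢᵗᵘᵛʷˣʸᶻ‾")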

Here's the code:

original = ['VO₆_Octa', 'FeO₄_Tet', 'FeO₆_Oct', 'BaO₉_Tsf', 'PrO₆_Oct', 'CaO₆_Oct',
   'HgO₂_Lin', 'CrO₆_Oct', 'AgO₄_Tet', 'EuO₉_Tsf']

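new = []

for item in original:
    # Split 'VO₆_Octa' into the formula and the geometry label.
    x = item.split('_')
    # Rejoin with a subscript minus and convert the label where a glyph exists.
    new.append(x[0] + "₋" + x[1].translate(SUB))

print(new)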

As you might have noticed, some letters are not converted to subscript. This is because Unicode does not define a complete subscript or superscript alphabet. I've tried various online converters and could only get conversions for the letters you see above (that is, everything except lowercase b, c, d, f, g, q, w, y and z).
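
To check which letters a table leaves untouched, each letter can be compared with its own translation. This is only a minimal sketch: the names SUB_LETTERS, SUB_GLYPHS and unconverted are illustrative, and the table is assumed to cover just the lowercase letters that have a subscript code point. It also shows why printing the table gives numbers: str.maketrans returns a dictionary keyed by integer code points.

```python
import string

# Assumed list of lowercase letters with a dedicated subscript code point.
SUB_LETTERS = "aehijklmnoprstuvx"
SUB_GLYPHS = "ₐₑₕᵢⱼₖₗₘₙₒₚᵣₛₜᵤᵥₓ"

# str.maketrans returns a dict mapping integer code points to code points,
# so print(SUB) shows numbers rather than characters.
SUB = str.maketrans(SUB_LETTERS, SUB_GLYPHS)


def unconverted(text, table):
    """Return the characters of `text` that `table` leaves unchanged."""
    return "".join(ch for ch in text if ch.translate(table) == ch)


missing = unconverted(string.ascii_lowercase, SUB)
print(missing)                  # letters with no entry in the table
print("Octa".translate(SUB))    # 'O' and 'c' pass through unchanged
```

Characters without an entry in the table pass through str.translate unchanged, which is why the output ends up as a mix of subscript and ordinary letters.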

However, in my opinion the better way to do this is to format the string in a markup language (HTML, LaTeX, etc.). In HTML, all you need are the simple <sub></sub> and <sup></sup> tags.
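
As a rough sketch of the markup route, the part after the underscore can be wrapped in tags instead of being translated character by character. The helper names to_html and to_mathtext are hypothetical, and the sketch assumes the result is displayed by something that renders HTML or LaTeX-style math; a plain terminal would show the markup literally.

```python
original = ['VO₆_Octa', 'FeO₄_Tet', 'FeO₆_Oct', 'BaO₉_Tsf', 'PrO₆_Oct',
            'CaO₆_Oct', 'HgO₂_Lin', 'CrO₆_Oct', 'AgO₄_Tet', 'EuO₉_Tsf']


def to_html(label):
    """Wrap the text after the first '_' in HTML <sub> tags."""
    base, sep, suffix = label.partition("_")
    # Labels without an underscore are returned unchanged.
    return base + "<sub>" + suffix + "</sub>" if sep else label


def to_mathtext(label):
    """Wrap the text after the first '_' in a LaTeX-style math subscript."""
    base, sep, suffix = label.partition("_")
    # \mathrm keeps the subscript upright instead of math italic.
    return base + "$_{\\mathrm{" + suffix + "}}$" if sep else label


html_labels = [to_html(s) for s in original]
tex_labels = [to_mathtext(s) for s in original]
print(html_labels[0])
print(tex_labels[0])
```

This keeps the original letters and their case intact; how the subscript finally looks is left to whatever renders the markup.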

Suyash
  • I will try your suggestion. Thanks. – hemanta Jan 18 '20 at 07:48
  • @hemanta May I know the reason you(or anyone else) downvoted the answer? – Suyash Jan 18 '20 at 08:29
  • While these substitutions will look kind of vaguely like subscript or superscript versions of the "ordinary" glyphs, the result will be a confusing mess. It'll be hard to read, it'll lose case distinctions present in the original text, and using it will give readers the impression that you have no idea what you're doing. – user2357112 Jan 18 '20 at 08:35
  • @user2357112supportsMonica I understand that. But after over an hour of looking around the web, that's the best I could find. – Suyash Jan 18 '20 at 08:38
  • @user2357112supportsMonica And that's why people use Latex/ HTML or any other markup languages... – Suyash Jan 18 '20 at 08:39
  • @Suyash, I did not downvote the answer. I am going to try it first and let you know. – hemanta Jan 19 '20 at 03:54
  • Hi Suyash, could you please show me a simple example that you run. When I run these and print SUB and SUP, I got the dictionary with keys and values both numbers. – hemanta Jan 19 '20 at 17:55
  • As you mentioned earlier, some of the characters are missing in the output. Octa appears like this 'ₜₐ'. Is it the same in your case? I am trying to find a way to represent each letter in such string by using Unicode literal and then join them finally. If you have any suggestions, please let me know. Thanks again. – hemanta Jan 21 '20 at 18:18
  • @hemanta yes, It's the same for me. And you can't represent them using unicode as the alphabets for subscript and superscript don't actually exist as a proper alphabet in Unicode! – Suyash Jan 22 '20 at 03:47
  • Thank you. I am not much familiar with Unicode and HTML, it will take some time to identify a proper way to get this done then. – hemanta Jan 22 '20 at 03:49
  • @hemanta As I've mentioned, the best way would be to format them in some markup language. – Suyash Jan 22 '20 at 03:54