
I may be wrong in assessing whether this string is ANSI or something else, but it comes from RTF documents with the following header:

{\rtf1\ansi\ansicpg1252

The string of interest from the document is:

ansi_string = r'3 \u176? \u177? 0.2\u176? (2\u952?)'

When I open the document in Word, it shows: 3° ± 0.2° 2θ

My questions are: 1) What are these escape codes? 2) Is it possible to convert this string to UTF-8 using Python's built-in methods?

Thomas Dickey
Rahul

1 Answer


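I don't think this is the best answer, but to illustrate what I want, here is working code. It relies on the .NET RichTextBox control through the clr module (IronPython or pythonnet), so it is not pure Python.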

import clr
clr.AddReference("System")
clr.AddReference("System.Windows.Forms")
import System.Windows.Forms as WinForms

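def rtf_to_text(rtf_str):
    # Wrap the fragment in a minimal RTF document with the same header as the source files
    rtf = r"{\rtf1\ansi\ansicpg1252" + '\n' + rtf_str + '\n' + '}'
    # Let the .NET RichTextBox parse the RTF and hand back the plain text
    richTextBox = WinForms.RichTextBox()
    richTextBox.Rtf = rtf
    return richTextBox.Text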

print(rtf_to_text(r'3 \u176? \u177? 0.2\u176? (2\u952?)'))
# --> '3 ° ± 0.2° (2θ)'
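
As for what the escapes are: in RTF, \uN is a Unicode escape where N is the code point written as a signed 16-bit decimal number, and the character right after it (the "?" here) is a fallback for readers that do not understand Unicode. A pure-Python alternative could look like the sketch below. It is only a minimal sketch, not a full RTF parser: the name decode_rtf_unicode is made up for illustration, it assumes the RTF default of exactly one fallback character per escape (\uc1), and it leaves every other control word untouched.

```python
import re

# Matches \uN, an optional delimiter space, and ONE fallback character
# (either a \'hh hex escape or a single plain character).
# Assumption: the RTF default \uc1 is in effect (one fallback per escape).
_RTF_UNICODE = re.compile(r"\\u(-?\d+) ?(?:\\'[0-9a-fA-F]{2}|.)", re.DOTALL)


def decode_rtf_unicode(fragment):
    """Replace RTF \\uN escapes in `fragment` with the characters they encode."""

    def _replace(match):
        code = int(match.group(1))
        if code < 0:
            # N is a signed 16-bit value, so code units above 32767 are negative
            code += 65536
        return chr(code)

    text = _RTF_UNICODE.sub(_replace, fragment)
    # Characters outside the BMP arrive as two escapes (a surrogate pair);
    # a UTF-16 round trip joins such pairs into one character.
    return text.encode("utf-16-le", "surrogatepass").decode("utf-16-le", "surrogatepass")


decoded = decode_rtf_unicode(r'3 \u176? \u177? 0.2\u176? (2\u952?)')
print(decoded)                       # 3 ° ± 0.2° (2θ)
utf8_bytes = decoded.encode("utf-8")  # bytes, if UTF-8 output is what you need
```

The result is an ordinary Python str, so encoding it to UTF-8 is just a call to .encode("utf-8"). For whole documents with fonts, groups, and other control words, a real RTF parser would be the safer choice.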
Rahul