How can I decode special characters created using Vim digraphs using Python 3.7?

Question

As part of my studies I want to create flash-cards using online software.

To do this, I use Python 3.7 to parse the files and to generate the fronts and backs of my flash cards (which I then use pyautogui to add to the flash-card software). I want to be able to parse the special characters created using VIM digraphs.

I structure my notes so that paragraphs alternate between a question and its answer, which makes them easy to split along paragraph boundaries. I am currently studying Chemistry, where many symbols are very useful to have, allowing me to write things like:

What is the general formula for acid-base equilibria where the acid dominates?

* HA + H₂O ↔ H₃O⁺ + A⁻
* HA ↔ H⁺ + A⁻
* HA = Arbitrary acid
* A⁻ = The acid's conjugate base

How large is the acid constant (ionization constant)?

* K_a = {H₃O⁺}{A⁻} / {HA} (since {H₂O} = 1)

How are changes in internal energy calculated?

* dU = dq + dw ⇒ ΔU = q + w
  * dU, ΔU = The change in internal energy
  * dq, q = Heat (added/removed)
  * dw, w = Work (added/removed)

How do polyprotic acids protolyze?

* H₃A ↔ H⁺ + H₂A⁻ (K_a1)
* H₂A⁻ ↔ H⁺ + HA²⁻ (K_a2)
* HA²⁻ ↔ H⁺ + A³⁻ (K_a3)
* K_a1 > K_a2 > K_a3

This is a tiny sample of the notes I am parsing. Copying these directly from Vim into the browser yields the desired symbols, but I want to automate the process of creating the cards.

When I try to parse it with Python, all these special characters disappear, resulting in a lot of manual labor to correct all the small symbols.

I am using pyautogui.typewrite(...) to output the questions. The code I am currently using to parse the notes is along the lines of the following:

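file_name = "..."  # The file with the notes

# Read the whole file as UTF-8 and split it into blank-line-separated paragraphs.
with open(file_name, encoding="utf-8") as f:
    paragraphs = iter(f.read().split("\n\n"))

# Pair each question with the paragraph that follows it. zip() over a single
# iterator avoids the RuntimeError that next() inside a generator expression
# raises in Python 3.7 when the last question has no answer paragraph.
questions = zip(paragraphs, paragraphs)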
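To narrow down where the symbols are lost, it may help to confirm that they survive the parsing step itself. The following is a minimal sketch, not the actual script: `pair_paragraphs`, `non_ascii_names`, and the `sample` string are hypothetical names and data introduced for illustration. It pairs the paragraphs and lists every non-ASCII character in each answer by its Unicode name. If the names print as expected, the characters are intact after reading, and the loss most likely happens at the typing step.

```python
import unicodedata


def pair_paragraphs(text):
    """Split text on blank lines and pair each question with the next paragraph."""
    paragraphs = iter(p for p in text.split("\n\n") if p.strip())
    # zip() over the same iterator twice yields (question, answer) pairs and
    # drops an unpaired trailing paragraph instead of raising.
    return list(zip(paragraphs, paragraphs))


def non_ascii_names(text):
    """Map each non-ASCII character in text to its Unicode name."""
    return {ch: unicodedata.name(ch, "UNKNOWN") for ch in text if ord(ch) > 127}


# Hypothetical stand-in for the contents of the notes file.
sample = (
    "Question one?\n\n"
    "* HA ↔ H⁺ + A⁻\n\n"
    "Question two?\n\n"
    "* dU = dq + dw ⇒ ΔU = q + w\n"
)

cards = pair_paragraphs(sample)
for question, answer in cards:
    print(question, "->", non_ascii_names(answer))
```

Any character reported here lies outside ASCII, so it is a candidate for being dropped by a function that simulates individual key presses.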

Is there a way to input and output the symbols in my notes using Python 3.7?

Cheers!

I dropped the Vim tag; there's nothing special about how they got created (and you could have used any other editor for that). The important thing is the _character encoding_ of the file you store those in. I presume it's UTF-8 (the most common general encoding in use today). You can check in Vim with `:setlocal fileencoding?` — Ingo Karkat, Oct 22 '18 at 16:05
Related: https://stackoverflow.com/questions/33151865/input-unicode-string-with-pyautogui — Patrick Haugh, Oct 22 '18 at 18:04
Using the same workaround (copy-pasting using pyperclip) as in the question shared by @PatrickHaugh resolved the issue. It has the side effect of cluttering the clipboard, but it works! — JRasmusBm, Oct 23 '18 at 05:52
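The clipboard workaround mentioned in the last comment could look roughly like the sketch below. It is an illustration under stated assumptions rather than the code that was actually used: `paste_text` and its parameters are hypothetical, Ctrl+V is assumed to be the paste shortcut of the target window, and the short `settle` delay before restoring the clipboard is a guess that may need tuning. Saving and restoring the previous clipboard contents is one way to reduce the clutter side effect, though it only handles text contents.

```python
import time


def paste_text(text, copy=None, paste=None, press_paste=None, settle=0.1):
    """Output `text` through the clipboard instead of simulated key presses.

    The callables are injectable so the function can be exercised without a
    display; by default they come from pyperclip and pyautogui.
    """
    if copy is None or paste is None:
        import pyperclip  # third-party: pip install pyperclip
        copy = copy or pyperclip.copy
        paste = paste or pyperclip.paste
    if press_paste is None:
        import pyautogui  # third-party: pip install pyautogui

        def press_paste():
            # Assumption: Ctrl+V pastes in the focused window
            # (macOS would use "command" instead of "ctrl").
            pyautogui.hotkey("ctrl", "v")

    previous = paste()  # remember the current clipboard text
    copy(text)
    try:
        press_paste()
        time.sleep(settle)  # give the target application time to read the clipboard
    finally:
        copy(previous)  # restore the earlier clipboard text


# Example call, with the flash-card input field focused:
# paste_text("HA + H₂O ↔ H₃O⁺ + A⁻")
```

Because the text is pasted as a whole, it does not depend on which characters the keyboard layout can produce.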

0 Answers