The project I'm working on takes XML files and input streams and converts them to PDFs and text. In the unit tests I compare this generated text with a .txt file that contains the expected output.
I'm now facing the issue that these .txt files are not encoded in UTF-8 and were saved without persisting their encoding information, which breaks non-ASCII characters (namely umlauts).
I have read a few articles on persisting and encoding .txt files, including how to correct the encoding and how to save and open files with a specific encoding in Visual Studio, among others.
I was wondering if there is a text file format that supports meta information about its encoding, the way XML or HTML does, for example.
I'm looking for a solution that:
- Is easy for any coworker on the team to adopt
- Is persistent and does not depend on me choosing an encoding in an editor
- Does not require any additional exotic program
- Can be read with little or no modification of the File class and its input reading in C#
- Supports at least UTF-8 encoding
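To make the last two requirements concrete, here is a minimal sketch of the kind of round trip I have in mind. The file name and the sample string are hypothetical, and it assumes the file is written as UTF-8 with a byte order mark so that File.ReadAllText can detect the encoding on its own. I have not confirmed that this fits our existing files or tooling.

```csharp
using System;
using System.IO;
using System.Text;

class EncodingRoundTrip
{
    static void Main()
    {
        // Hypothetical file name, used only for this illustration.
        string path = Path.Combine(Path.GetTempPath(), "expected_output.txt");

        // "Größe" written with escapes so the sketch does not depend on
        // the encoding of the source file itself.
        string expected = "Gr\u00F6\u00DFe";

        // Assumption: the expected-output file is saved as UTF-8 with a
        // byte order mark (EF BB BF), so the encoding travels with the file.
        File.WriteAllText(path, expected,
            new UTF8Encoding(encoderShouldEmitUTF8Identifier: true));

        // File.ReadAllText detects the encoding from the byte order mark
        // and otherwise falls back to UTF-8, so no encoding is passed here.
        string actual = File.ReadAllText(path);

        Console.WriteLine(actual == expected); // prints "True"

        File.Delete(path);
    }
}
```

The point of the sketch is only that the read side stays a plain File.ReadAllText call. Whether a byte order mark counts as "meta information about encoding" in the sense I'm asking about is part of the question.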