Step #1 is very difficult to accomplish if the file is not already using a UTF encoding, like UTF-8 or UTF-16 (UTF-8 is very easy to detect, and UTF-16 to a lesser extent, even if a BOM is not present).
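To illustrate, here is a minimal sketch of what that first detection step might look like in C++ (the function name and return convention are just placeholders, not part of any particular API):

```cpp
#include <cstddef>
#include <string>

// Sniff the first bytes of a file for a Unicode BOM.  Returns an encoding
// name, or an empty string when nothing can be concluded from the BOM alone.
std::string sniffEncoding(const unsigned char *data, std::size_t size)
{
    if (size >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
        return "UTF-8";      // UTF-8 BOM
    if (size >= 2 && data[0] == 0xFF && data[1] == 0xFE)
        return "UTF-16LE";   // UTF-16 little-endian BOM
    if (size >= 2 && data[0] == 0xFE && data[1] == 0xFF)
        return "UTF-16BE";   // UTF-16 big-endian BOM

    // No BOM: UTF-8 can still be recognized fairly reliably, because its
    // multi-byte sequences follow a strict bit pattern that other encodings
    // rarely match; validating the whole buffer against that pattern would
    // go here.  UTF-16 without a BOM can only be inferred (e.g. many 0x00
    // bytes in alternating positions for Latin-heavy text), which is already
    // a guess.
    return "";
}
```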
There are MANY encodings used in the world (Unicode was designed to replace them all, but that goal has not been achieved 100% globally yet), and many non-ASCII encodings cannot be detected accurately without context or prior knowledge of the encoding that was used to create the file. Unless you can ask the user for the specific encoding, you will have to resort to heuristic analysis of the data (there are some 3rd-party charset detection libraries around if you search), and that is error-prone without context information.
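ICU (mentioned again below for the conversion step) happens to ship one such detector. A rough sketch of how it might be used, assuming ICU is installed and linked; the sample text is only a stand-in for your real file bytes:

```cpp
#include <unicode/ucsdet.h>
#include <iostream>

int main()
{
    // Placeholder input; in practice this would be the raw bytes of the file.
    const char data[] = "Some legacy text read from an old file...";

    UErrorCode status = U_ZERO_ERROR;
    UCharsetDetector *detector = ucsdet_open(&status);
    ucsdet_setText(detector, data, -1, &status);   // -1 = NUL-terminated

    const UCharsetMatch *match = ucsdet_detect(detector, &status);
    if (U_SUCCESS(status) && match != nullptr) {
        // This is a GUESS with a confidence score, not a guarantee.
        std::cout << "Best guess: " << ucsdet_getName(match, &status)
                  << " (confidence " << ucsdet_getConfidence(match, &status)
                  << "/100)" << std::endl;
    } else {
        std::cout << "No plausible encoding found" << std::endl;
    }

    ucsdet_close(detector);
    return 0;
}
```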
See this:
The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
Without context, the same data can be interpreted in different ways, producing different results. Such an issue affects something as "simple" as Notepad in Windows when a file's encoding has to be guessed. These are good examples of how that guessing can go wrong:
Notepad bug? Encoding issue?
Some files come up strange in Notepad
The Notepad file encoding problem, redux
Bush hid the facts
No matter how good your heuristics may be, you are still guessing, and guessing is not 100% reliable. So do yourself a favor and don't guess at all.
As for Step #2, once you have determined the source encoding, you should use a portable Unicode library, such as libiconv or ICU, to convert from that encoding to UTF-8.
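For the libiconv route, a sketch of such a conversion might look like this (a hand-rolled wrapper, not part of libiconv itself; note that on some platforms iconv() declares its input pointer as const char**, so a small adjustment may be needed):

```cpp
#include <iconv.h>
#include <cerrno>
#include <stdexcept>
#include <string>

// Convert `input`, encoded in `fromEncoding` (e.g. "WINDOWS-1252"), to UTF-8.
std::string toUtf8(const std::string &input, const char *fromEncoding)
{
    iconv_t cd = iconv_open("UTF-8", fromEncoding);   // (to, from)
    if (cd == (iconv_t)-1)
        throw std::runtime_error("unsupported source encoding");

    std::string output;
    char buffer[4096];

    // iconv() advances these pointers/counters as it consumes input.
    char *inPtr = const_cast<char*>(input.data());
    size_t inLeft = input.size();

    while (inLeft > 0) {
        char *outPtr = buffer;
        size_t outLeft = sizeof(buffer);
        if (iconv(cd, &inPtr, &inLeft, &outPtr, &outLeft) == (size_t)-1
            && errno != E2BIG) {           // E2BIG just means "output buffer full"
            iconv_close(cd);
            throw std::runtime_error("invalid byte sequence for this encoding");
        }
        output.append(buffer, sizeof(buffer) - outLeft);
    }

    iconv_close(cd);
    return output;
}
```

Usage would be something like toUtf8(fileBytes, "WINDOWS-1252"), where the second argument is whatever encoding Step #1 (or the user) told you the file is in.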