I'll take a crack at fleshing out how you would accomplish these.
Capitalisation
This is fairly close to Named Entity Recognition and is an example of a 'sequence tagging' problem. Proper nouns should have their first letter capitalised, organisation names that are acronyms should be all capitalised, and there are other cases that fall outside those two categories. It seems to me that it would therefore be harder than NER, so a straightforward dictionary-based approach probably isn't an ideal solution.
If you were to use a Hidden Markov Model, this would amount to letting the 'hidden' state of the HMM be [lowerCase, initCaps, allCaps] and training on some data that you assume is correctly capitalised (e.g. Wikipedia, though there are many other sources too). You then infer the hidden state for words that you aren't sure are correctly capitalised. There are plenty of HMM libraries out there, and I'm sure you can find one to suit your needs. I'd say trying an HMM is a good initial choice.
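To make that concrete, here's a minimal sketch using NLTK's HMM tagger (just one of those libraries; any supervised HMM implementation would do). The training sentences and the three-state label scheme are placeholder assumptions of mine, not a tested recipe:

```python
# Minimal truecasing-as-sequence-tagging sketch with NLTK's HMM tagger.
# The training data below is a hypothetical stand-in; in practice you'd
# use millions of tokenised sentences from a well-edited corpus.
from nltk.tag.hmm import HiddenMarkovModelTrainer
from nltk.probability import LidstoneProbDist


def case_label(word):
    """Map a word's surface form to one of the three hidden states."""
    if word.isupper() and len(word) > 1:
        return "ALL_CAPS"
    if word[0].isupper():
        return "INIT_CAPS"
    return "LOWER"


def to_training_pairs(sentence):
    """Observations are lowercased words; labels encode the original casing."""
    return [(w.lower(), case_label(w)) for w in sentence]


train_sents = [
    ["Alice", "works", "at", "NASA", "in", "Houston"],
    ["the", "report", "was", "sent", "to", "Alice"],
]
labelled = [to_training_pairs(s) for s in train_sents]

# Lidstone smoothing so words unseen in training don't get zero probability.
trainer = HiddenMarkovModelTrainer()
tagger = trainer.train_supervised(
    labelled,
    estimator=lambda fd, bins: LidstoneProbDist(fd, 0.1, bins),
)

# Infer the most likely casing for a sentence of unknown capitalisation.
print(tagger.tag(["alice", "visited", "nasa"]))
# e.g. [('alice', 'INIT_CAPS'), ('visited', 'LOWER'), ('nasa', 'ALL_CAPS')]
```

Mapping the predicted tags back onto the surface forms (`str.capitalize()` for INIT_CAPS, `str.upper()` for ALL_CAPS) then gives you the restored sentence. The Lidstone estimator is there because the default maximum-likelihood estimate assigns zero probability to any word that never appeared in training.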
Non-ASCII characters
As you guessed, a tougher problem. If you tried to do this with an HMM at the word level, you would have an enormous number of hidden states, one for each accented word form, which would probably be impossible to train. The problem is more tractable at the character level, but you lose a tremendous amount of context if you only consider the previous character, and if you start using character n-grams as states instead, your scaling problems come back. In short, I don't think this problem is like the previous one, because the number of labels is too large to treat it as a sequence labelling problem (you can in principle, it's just not practical).
I haven't heard of research in this area, but then again I'm no expert. My best guess would be to use a general language model for the language you are interested in, which gives you the probability of a sentence in that language. You could then try replacing each possibly-accented character with its accented variants, score the probability of each resulting sentence, and take the most likely one, or apply some threshold on the difference, or something along those lines. An n-gram language model is fairly easy to train on a large corpus of the language in question.
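Here's a rough, self-contained sketch of that idea using a character-level trigram model with add-one smoothing. The toy corpus and the accent map are placeholder assumptions; a real system would need a large corpus and a complete accent table for the target language:

```python
# Score candidate accent restorations with a character trigram model
# trained on correctly accented text; pick the highest-probability one.
import itertools
import math
from collections import Counter

ACCENT_MAP = {"e": "eé", "a": "aà", "u": "uù"}  # hypothetical, French-ish


def trigrams(text):
    padded = "##" + text + "#"  # '#' marks sentence boundaries
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


class CharTrigramLM:
    def __init__(self, corpus):
        self.tri = Counter()
        self.bi = Counter()
        self.vocab = set("#")
        for line in corpus:
            self.vocab.update(line)
            for g in trigrams(line):
                self.tri[g] += 1
                self.bi[g[:2]] += 1

    def logprob(self, text):
        """Add-one smoothed log-probability of a string."""
        v = len(self.vocab)
        return sum(
            math.log((self.tri[g] + 1) / (self.bi[g[:2]] + v))
            for g in trigrams(text)
        )


def candidates(text):
    """Every way of swapping in accented variants of ambiguous characters."""
    options = [ACCENT_MAP.get(c, c) for c in text]
    return ("".join(choice) for choice in itertools.product(*options))


def restore(text, lm):
    return max(candidates(text), key=lm.logprob)


# Toy corpus standing in for a large collection of accented text.
lm = CharTrigramLM(["le café est là", "il a été à paris", "très bien"])
print(restore("le cafe est la", lm))  # hopefully "le café est là"
```

Note that generating every combination blows up exponentially with the number of ambiguous characters, so for real sentences you'd want something smarter, e.g. a beam search over candidates or only trying replacements on words that aren't in a dictionary.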
I have no idea if this would actually work, either in terms of accuracy or efficiency. I don't have direct experience of this particular problem.
Transliteration
No idea, to be honest. I don't know where you would find data to build a system of your own, either. After a brief search, I found the Google Transliteration service (with an API), which may do what you're after. I don't have enough experience with languages that use other scripts to really know what it's doing.