I've scraped Japanese content from the web to conduct a content analysis. Now I am preparing the text data, starting with building a term-document matrix. The package I am using to clean and parse the text is "RMeCab". I've been told that this package requires the text data to be in ANSI encoding, but my data is in UTF-8, as are the RMeCab settings and R's own global encoding setting.
Is it necessary to change the encoding of my text files in order to run RMeCab? If so, how do I convert the encoding of tens of thousands of separate text files quickly?
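For what it's worth, this is the kind of batch conversion I have in mind, as a sketch. I'm assuming here that the "ANSI" encoding RMeCab wants is CP932 (the Windows code page for Japanese); the file names and directory layout are just illustrative:

```shell
# Create one stand-in file for illustration; in reality there would be
# tens of thousands of scraped .txt files in this directory.
printf 'これはテストです\n' > sample.txt

# Convert every .txt file from UTF-8 to CP932, writing the converted
# copies into a separate directory so the originals are untouched.
mkdir -p converted
for f in *.txt; do
  iconv -f UTF-8 -t CP932 "$f" > "converted/$f"
done
```

Would pointing RMeCab at a directory converted this way be the right approach, or is there a way to make it accept the UTF-8 files directly?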
I tried online encoding-conversion websites, which gave me gibberish as ANSI output. I don't understand how feeding something that looks like a bunch of question marks into RMeCab could possibly work. If I successfully convert the encoding to ANSI and my text data then looks like a bunch of symbols, will RMeCab still be able to read it as Japanese text?
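One check I considered (again assuming CP932 is the target encoding): a correct UTF-8 to CP932 conversion of Japanese text should be lossless, so converting the result back to UTF-8 should reproduce the original bytes exactly. My understanding is that literal `?` characters in the output usually mean the converter substituted characters it could not map, i.e. the conversion itself went wrong, which may be what those websites did:

```shell
# Round-trip test: UTF-8 -> CP932 -> UTF-8 should be byte-identical
# for ordinary Japanese text. File names are illustrative.
printf '日本語のテキスト\n' > orig.txt
iconv -f UTF-8 -t CP932 orig.txt > ansi.txt
iconv -f CP932 -t UTF-8 ansi.txt > roundtrip.txt
cmp orig.txt roundtrip.txt && echo "conversion is lossless"
```

Is that a valid way to tell a correct conversion (which merely *displays* as garbage in a UTF-8 viewer) apart from an actually corrupted one?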