2

How can I import a CSV file that contains some non-UTF-8 characters into MongoDB? I tried the commonly recommended import command:

mongoimport --db dbname --collection colname --type csv --headerline --file D:/fastfood.xls

Error Message

exception: Invalid UTF8  character detected

I would remove the invalid characters manually, but the file is too large for that.

I searched Google with no success.

PS: MongoDB version is 2.4.6

Thanks.

Edit: I'm on Windows 7.

Zafar

3 Answers

5

In Linux you could use the iconv command as suggested in: How to remove non UTF-8 characters from text file

iconv -f utf8 -t utf8 -c file.txt

I'm not familiar with MongoDB, so I have no insight into how to preserve the invalid characters during import.

tderensis
  • Is that possible on Windows too? If I type 'iconv' in CMD, it doesn't work (iconv is not recognized as an internal or external command). – Zafar Oct 09 '13 at 07:43
  • I think the iconv command can be downloaded for Windows here: http://sourceforge.net/projects/gettext/ – tderensis Oct 09 '13 at 21:54
  • How can we remove invalid UTF-8 characters from a .csv file on Windows? – PAA Sep 16 '15 at 18:02
  • iconv -f UTF-8 -t UTF-8 files_with_non_utf8_chars.csv > out.csv – Lekhnath Jan 19 '16 at 08:05
1

For Emacs users: open the CSV file in Emacs, change the encoding with ‘C-x C-m f’, choose utf-8 as the coding system, and save the file. For more information, see ChangingEncodings.
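For anyone without Emacs, the same kind of re-encoding can be sketched in Python. This is only a rough sketch and rests on an assumption the question does not confirm: that the file was saved in a legacy Windows encoding such as Windows-1252 (cp1252). The function name and the file names in the usage comment are made up for illustration.

```python
def transcode_to_utf8(src_path, dst_path, src_encoding="cp1252"):
    """Rewrite src_path as UTF-8, decoding it as src_encoding.

    src_encoding="cp1252" is an assumption, not something known about
    the original file. A byte that is undefined in the chosen encoding
    raises UnicodeDecodeError instead of being silently dropped.
    """
    # newline="" keeps the original line endings untouched on both sides.
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:  # stream line by line so large files fit in memory
            dst.write(line)

# Hypothetical usage:
# transcode_to_utf8("fastfood.csv", "fastfood_utf8.csv")
```

Unlike simply deleting the offending bytes, this keeps accented characters, provided the guessed source encoding is the right one. The resulting file is what you would then pass to mongoimport.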

Sam
0

You're trying to import an xls file as a csv file. Save the file as csv first, then try again.

SuperAce99
  • Did you get the desired result (successful import) or the same UTF-8 error? How did you convert the file from xls to csv? – SuperAce99 Oct 09 '13 at 12:29
  • I mean I got the same result: the error. I just saved the file as "csv"; I didn't use any tool to convert it. – Zafar Oct 09 '13 at 12:55
  • Assuming it's now a valid CSV file (e.g. you can see the data when you open the file in something like Notepad++), then you do have some weird Unicode problems in there. You'll need to pre-process the file before you can load it into Mongo. I would do this using Python; I'm not aware of a direct way to do it with PowerShell. – SuperAce99 Oct 09 '13 at 14:15
  • Just because it has the .xls suffix doesn't mean it's an Excel file. Plenty of apps output CSV/TSV and give it that suffix. Look at it in Notepad. – tom Oct 09 '13 at 14:16
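A minimal sketch of the Python pre-processing suggested in the comments above, which also works on Windows. It behaves roughly like `iconv -c`: bytes that are not valid UTF-8 are dropped and everything else is copied through. The function name and the file names in the usage comment are hypothetical, and this is one possible approach rather than the commenter's own script.

```python
def strip_invalid_utf8(src_path, dst_path):
    """Copy src_path to dst_path, silently dropping bytes that are not valid UTF-8."""
    # errors="ignore" tells the decoder to skip undecodable bytes;
    # newline="" keeps the original line endings untouched on both sides.
    with open(src_path, "r", encoding="utf-8", errors="ignore", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:  # stream line by line so large files fit in memory
            dst.write(line)

# Hypothetical usage:
# strip_invalid_utf8("fastfood.csv", "fastfood_clean.csv")
```

Note that this loses the offending characters entirely, so a name such as "Café" stored in a non-UTF-8 encoding would come out as "Caf". If that matters, converting from the file's real encoding is the safer route.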