CSV import produces encoding problem in Stata

Question

I have a problem importing a CSV file. The following code produces an encoding problem with the "*" sign (asterisk), even though the asterisk looks fine in the preview of the data sample.

import delimited using `file', case(preserve) stringcol(_all) encoding(utf8) clear

I first tried it without the encoding(utf8) option, thinking that it is no longer necessary in Stata 16. However, in both cases, with or without the option, I get little question marks instead of the asterisk.

Does anybody have an idea what could cause the problem and how I can fix it?

Tools I use in my workflow: Stata 16 and UltraEdit (default encoding set to ANSI Latin I).

Without access to the file it is hard to be more specific, but I wonder whether what is actually in the CSV is a non-standard asterisk: https://unicode-search.net/unicode-namesearch.pl?term=ASTERISK. How was this CSV file generated? Do you know that the file is indeed UTF-8 encoded? https://stackoverflow.com/questions/37177069/how-to-check-encoding-of-a-csv-file — TheIceBear, Apr 06 '22 at 08:24
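One way to test the non-standard-asterisk hypothesis without sharing the file is to inspect the raw bytes. The sketch below uses Python rather than Stata and assumes the CSV fits in memory. It reports whether the bytes decode as UTF-8 (and where decoding first fails if they do not), whether a UTF-8 byte-order mark is present, and which non-ASCII code points occur. The names `inspect_csv_bytes` and `SUSPECTS` are hypothetical, and the set of asterisk-like characters is illustrative, not exhaustive.

```python
import unicodedata
from collections import Counter

# Hypothetical, non-exhaustive set of characters that can be mistaken for
# a plain ASCII asterisk (U+002A) or used by formula generators instead of it.
SUSPECTS = {
    "\u2217",  # ASTERISK OPERATOR
    "\u204e",  # LOW ASTERISK
    "\u2731",  # HEAVY ASTERISK
    "\u22c6",  # STAR OPERATOR
    "\uff0a",  # FULLWIDTH ASTERISK
    "\u00d7",  # MULTIPLICATION SIGN
}


def inspect_csv_bytes(raw: bytes) -> dict:
    """Summarize encoding-relevant facts about the raw bytes of a CSV file."""
    report = {"bom": raw.startswith(b"\xef\xbb\xbf")}
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as err:
        # Not valid UTF-8: record the byte offset of the first bad sequence.
        report["valid_utf8"] = False
        report["first_bad_offset"] = err.start
        return report
    report["valid_utf8"] = True
    # Count every non-ASCII character, labelled by code point and Unicode name.
    counts = Counter(ch for ch in text if ord(ch) > 127)
    report["non_ascii"] = {
        f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}": n
        for ch, n in counts.items()
    }
    # List only the suspicious asterisk-like characters that actually occur.
    report["asterisk_lookalikes"] = sorted(
        f"U+{ord(ch):04X}" for ch in counts if ch in SUSPECTS
    )
    return report
```

For example, calling `inspect_csv_bytes` on the bytes of the file (read in binary mode) would list an entry such as `U+2217 ASTERISK OPERATOR` if the generating software writes the mathematical asterisk operator instead of a plain `*`. A character that is valid UTF-8 can still display as a placeholder if the display font has no glyph for it, so a clean UTF-8 result does not rule this explanation out.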
Thanks for your comment. I had read that thread before and followed the recommended answer to determine the encoding, and it is indeed UTF-8. The CSV is created by software that I don't have access to. Basically, it creates a math formula like 2 * 5 = _, and the asterisk then causes the encoding problem. — InPanic, Apr 06 '22 at 08:44
How common is this problem in the file? Are the occurrences few enough that you could manually replace them with a `*` typed on your keyboard? Also, can you open the CSV file in a text editor and share part of it here? — TheIceBear, Apr 06 '22 at 09:08
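If the occurrences turn out to be a look-alike character, the manual replacement can be automated before running `import delimited`. This is a minimal sketch in Python, assuming the file is valid UTF-8 (possibly with a BOM) and that every character in the hypothetical `REPLACEMENTS` table really should become a plain `*`. The function names `normalize_asterisks` and `clean_csv` are illustrative, and the table should be adjusted to whatever code points the file actually contains.

```python
# Hypothetical mapping of asterisk look-alikes to ASCII "*" (U+002A).
# Assumption: these characters only ever stand for multiplication in this file.
REPLACEMENTS = {
    "\u2217": "*",  # ASTERISK OPERATOR
    "\u204e": "*",  # LOW ASTERISK
    "\u2731": "*",  # HEAVY ASTERISK
    "\uff0a": "*",  # FULLWIDTH ASTERISK
}
_TABLE = str.maketrans(REPLACEMENTS)


def normalize_asterisks(text: str) -> tuple[str, int]:
    """Return the text with look-alikes replaced, plus the number replaced."""
    count = sum(text.count(ch) for ch in REPLACEMENTS)
    return text.translate(_TABLE), count


def clean_csv(src: str, dst: str) -> int:
    """Write a cleaned UTF-8 copy of src to dst; return the replacement count."""
    # "utf-8-sig" drops a leading BOM if present; newline="" keeps line endings as-is.
    with open(src, encoding="utf-8-sig", newline="") as f:
        text = f.read()
    cleaned, n = normalize_asterisks(text)
    with open(dst, "w", encoding="utf-8", newline="") as f:
        f.write(cleaned)
    return n
```

The cleaned copy could then be read with the same `import delimited` call from the question, with the `file' local pointing at the new path. Writing a copy rather than overwriting the original keeps the generated file available for comparison.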