
I have a problem importing a CSV file. The following code produces an encoding problem with the "*" sign (asterisk), even though the variable looks fine in the provided data sample.

import delimited using `file', case(preserve) stringcol(_all) encoding(utf8) clear

I first tried it without the encoding(utf8) option, thinking that it is no longer necessary with Stata 16. However, in both cases, with or without encoding(utf8), I get little question marks instead of the asterisk.

Does anybody have an idea what could cause the problem and how I can fix it?

Tools I use in the workflow: Stata 16 and UltraEdit (default encoding set to ANSI Latin I)

Nick Cox
InPanic
  • Without access to the file it is hard to be more detailed, but I wonder if what is actually in the CSV is a non-standard asterisk: https://unicode-search.net/unicode-namesearch.pl?term=ASTERISK. How was this CSV file generated? Do you know that the file is indeed UTF-8 encoded? https://stackoverflow.com/questions/37177069/how-to-check-encoding-of-a-csv-file – TheIceBear Apr 06 '22 at 08:24
  • Thanks for your comment. I read that thread before and followed the recommended answer to figure out the encoding, and it is indeed UTF-8. The CSV is created by software that I don't have access to. Basically, it creates a math formula like: 2 * 5 = _ and the asterisk then causes the encoding problem. – InPanic Apr 06 '22 at 08:44
  • How common is this problem in the file? Are the occurrences few enough that you can manually replace them with a `*` typed on your keyboard? Also, can you open the CSV file in a text editor, copy part of it, and share it here? – TheIceBear Apr 06 '22 at 09:08
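One way to follow up on the two questions raised in the comments (is the file really valid UTF-8, and is the character really a plain `*`) is to inspect the raw bytes outside Stata. The following is a minimal Python sketch, not a confirmed diagnosis: the helper names `is_valid_utf8` and `report_non_ascii` and the sample data are hypothetical, and the sample assumes the formula contains U+2217 (ASTERISK OPERATOR), which is only a guess about what the real file holds.

```python
import unicodedata


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def report_non_ascii(raw: bytes, encoding: str = "utf-8"):
    """List every non-ASCII character as (line number, char, code point, name).

    Bytes that are invalid in the given encoding are decoded to U+FFFD
    (REPLACEMENT CHARACTER), so they show up in the report as well.
    """
    text = raw.decode(encoding, errors="replace")
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for ch in line:
            if ord(ch) > 127:
                name = unicodedata.name(ch, "UNKNOWN")
                findings.append((line_no, ch, f"U+{ord(ch):04X}", name))
    return findings


# Hypothetical sample: row 1 uses U+2217 instead of the keyboard asterisk.
sample = "id,formula\n1,2 \u2217 5 = _\n2,3 * 4 = _\n".encode("utf-8")

print(is_valid_utf8(sample))
for finding in report_non_ascii(sample):
    print(finding)
```

For a real file, the bytes could be read with `open(path, "rb").read()`, where `path` is a placeholder for the CSV location. If `is_valid_utf8` returns False, the file is probably in a different encoding (for example Windows-1252), and passing that encoding name as the second argument to `report_non_ascii` shows how the suspicious bytes would be interpreted.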

0 Answers