
I have a Russian-language dataset in .dta format. Stata displays the labels incorrectly, as a string of meaningless symbols. The issue seems to be that the file was created with Windows-1251 encoding, while Stata uses a different encoding to display it.
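The suspected mismatch can be illustrated outside Stata with a minimal Python sketch. The label text is a hypothetical example, not taken from the dataset, and Latin-1 stands in for whatever encoding the display side assumes. It only shows why Windows-1251 bytes look like garbage when read under another encoding; it is not a fix for the .dta file.

```python
# Illustration only: "Возраст" ("age") is a made-up label, not one from the dataset.
label = "Возраст"

# Bytes as a Windows-1251 (cp1251) writer would store the label.
raw = label.encode("cp1251")

# Reading those bytes under a different single-byte encoding (Latin-1 is an
# assumption here) yields accented Latin letters instead of Cyrillic.
garbled = raw.decode("latin-1")

# Decoding with the correct code page recovers the original text.
restored = raw.decode("cp1251")

# The same bytes are not valid UTF-8, so a UTF-8 reader cannot display them as is.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

print(garbled)
print(restored)
print(valid_utf8)
```

If the labels in the file behave like `raw` above, that would be consistent with the Windows-1251 hypothesis, though it does not explain the errors shown below.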

Please let me know if you have any ideas.

I tried to solve it by running

clear all
set more off 
unicode encoding set Windows-1251
unicode translate file_name.dta

and obtained the following r(198) error:

(using Windows-1251 encoding)

File summary (before starting):
    1  file(s) specified
    1  file(s) to be examined ...

File file_name.dta (Stata dataset)
  234 variable names okay, ASCII
    1 variable name okay, already UTF-8
  all data labels okay, ASCII
    0 variable labels okay, ASCII
  144 variable labels okay, already UTF-8
   91 variable labels translated
  r(198);

If I try:

unicode analyze file_name.dta

I also get an r(3300) error:

 91 variable labels need translation
        1 value-label name needs translation
             st_vlload():  3300  argument out of range
     examine_dta_vallab_content():     -  function returned error
     examine_dta_vallabs_content():     -  function returned error
      examine_dta_file():     -  function returned error
          examine_file():     -  function returned error
      do_examine_files():     -  function returned error
            unicode_do():     -  function returned error
       unicode_analyze():     -  function returned error
                 <istmt>:     -  function returned error
      r(3300);
https://www.stata.com/help.cgi?dta#strings says **Strings are encoded UTF-8 in Stata**… Related: [Stata Data File Format (.dta)](https://www.loc.gov/preservation/digital/formats/fdd/fdd000471.shtml) – JosefZ May 09 '23 at 17:47
