0

I have already created a table in SSMS v18 and loaded data from a txt file. The column headers are in English, and the values are a mixture of English text, numbers and Russian text. The English text and numbers display fine, but the Russian text displays as ??????. I will need to load files into this table on a monthly basis. How do I get the Russian text to display in Russian and not ????????

Thanks

  • 1
    What are encoding (collation) settings in your DB? – dimirsen Z Nov 27 '21 at 13:32
  • How do I find that out? – David Goldstone Nov 27 '21 at 13:33
  • 1
    This might be helpful: https://techcommunity.microsoft.com/t5/sql-server-blog/introducing-utf-8-support-for-sql-server/ba-p/734928. Your text file is probably UTF-8 encoded and your DB is probably using ASCII. – tromgy Nov 27 '21 at 13:35
  • I followed the article above with this code and I get an error saying: The definition for column 'VARCHAR' must include a data type. CREATE TABLE SalesBirchPlyNew (VARCHAR(50) COLLATE Latin1_General_100_CI_AI_SC_UTF8); INSERT INTO SalesBirchPlyNew SELECT * FROM dbo.SalesBirchPly; DROP TABLE dbo.SalesBirchPly; EXEC sp_rename 'dbo.SalesBirchPlyNew', 'dbo.SalesBirchPly'; – David Goldstone Nov 27 '21 at 14:05
  • what if i save the csv as UTF-8 and then create a new table in the db? – David Goldstone Nov 27 '21 at 14:13
  • you might try TCODE utility to make a guess at Russian language charset http://alexboiko.narod.ru/prod.html however the better option would to make some contract, specification about charset used for export with whichever company is genereting those txt files for you. Any guesswork can make wrong results (like Notepad++ ofte make wrong guesses about Russian texts), it should be their responsibility to generate data in the charset you both agreed upon and fixed in the contract/specs. Notice that XML files even make specification of charset mandatory. – Arioch 'The Nov 27 '21 at 14:31
  • I am going to take a different direction. I am going to create a new database so it handle English and Russian text. Pleasae can anyone advise which collation to use in Azure? – David Goldstone Nov 27 '21 at 14:51
  • @DavidGoldstone While Azure SQL does support UTF-8 (as does on-prem SQL Server 2019) it involves a lot of hoops to jump through to get working ( https://techcommunity.microsoft.com/t5/sql-server-blog/introducing-utf-8-support-for-sql-server/ba-p/734928 ) - just use `nvarchar` and you'll be fine. – Dai Nov 27 '21 at 15:28
  • How did you import the data into your database? Also, you need to show us your `CREATE TABLE` statements. – Dai Nov 27 '21 at 15:28
  • @Dai thanks. I didnt create the table with a CREATE TABLE statement. I used TASKS in SSMS, and imported the csv which is saved as a UTF-8. Should I recreate the table using a CREATE TABLE statement and define the datatypes? if so, what datatype? The csv is made up of English column headers and the values are a mixture of Russian text, English text and numbers. – David Goldstone Nov 27 '21 at 15:56
  • @DavidGoldstone SSMS has at least 3 or 4 completely different ways to import CSV files into SQL Server - you didn't say exactly which one you used. You say the file is saved as UTF-8, but the columns were defined as `varchar`, not `nvarchar`, so the file was _not_ actually read as UTF-8 (probably as `Windows-1252` instead). You'll need to reimport the data and ensure whichever CSV import wizard or process you use uses `nvarchar` for the columns. – Dai Nov 27 '21 at 16:02
  • Thanks @dai So If I convert ALL columns to nvarchar first, then truncate the table, then reimport the csv using tasks ---> import data, this should work? Or shall I create the table again using a create table statement and define the nvarchar datatypes, then import, OR, shall I recreate the database using a different collation? – David Goldstone Nov 27 '21 at 16:08
  • @DavidGoldstone There is no point in changing the column types and then truncating the table, just `DROP TABLE` and recreate it from scratch. Also, the `COLLATION` alone won't fix anything (because `COLLATION` only controls sorting and textual comparisons, not storage and encoding... _unless_ you're using SQL Server 2019 with UTF-8... and then there's more hoops to jump through, but you aren't, so it's off-topic). But if the text is primarily Russian then use `Cyrillic_General_CI_AS`. – Dai Nov 27 '21 at 16:12
  • @dai got it. when i recreate the table using the Cyrillic_General_CI_AS as the collation. then I can import the csv which is using utf-8 ? – David Goldstone Nov 27 '21 at 16:14
  • @DavidGoldstone You still need to ensure the columns are using `nvarchar` and **not** `varchar`. – Dai Nov 27 '21 at 16:15
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/239633/discussion-between-david-goldstone-and-dai). – David Goldstone Nov 27 '21 at 16:15
  • @DavidGoldstone No, sorry. I don't do chats unless you agree to my extortionate hourly consulting rate. – Dai Nov 27 '21 at 16:30
  • Thank you so much for your help. I will do this tomorrow. Where do I assign the Cyrillic_General_CI_AS collation? do i need to recreate the db ? – David Goldstone Nov 27 '21 at 17:55
  • Does this answer your question? [What is the meaning of the prefix N in T-SQL statements and when should I use it?](https://stackoverflow.com/questions/10025032/what-is-the-meaning-of-the-prefix-n-in-t-sql-statements-and-when-should-i-use-it) – Charlieface Nov 27 '21 at 22:29

0 Answers0