Searching Persian characters and words in SQL server with different encoding

Question

I have a text file that contains Persian words and is saved using ANSI encoding. When I try to read the Persian words from the text file, I get some characters like '?'. To solve the problem, I changed the file encoding to UTF8 and re-wrote the text file. Here's the method for changing file encoding:

    public void Convert2UTF8(string filePath)
    {
        //first, read the text file with "ANSI" endocing
        StreamReader fileStream = new StreamReader(filePath, Encoding.Default);
        string fileContent = fileStream.ReadToEnd();
        fileStream.Close();
        //Now change the file encoding and replace it with the UTF8
        StreamWriter utf8Writer = new StreamWriter(filePath.Replace(".txt", ".txt"), false, Encoding.UTF8);
        utf8Writer.Write(fileContent);
        utf8Writer.Close();
    }

Now the first problem is solved; However, there is another issue here: every time that I want to search a Persian word from the SQL server database table, the result is null while the record does exist in the database table.

What's the solution to find my Persian words that exist in the table? The code that I currently use is simply like the following:

SELECT * FROM [dbo].[WordDirectory] 
WHERE Word = N'کلمه'

Word is the field that Persian words are saved in. The type of the field is NVARCHAR. My SQL server version is 2012. Should I change the collation?

M.Ali · Answer 1 · 2014-02-23T20:54:03.603

6

DECLARE @Table TABLE(Field NVARCHAR(4000) COLLATE Frisian_100_CI_AI)

INSERT INTO @Table (Field) VALUES
(N'همهٔ افراد بش'),
(N'می‌آیند و حیثیت '),
(N'ميشه آهسته تر صحبت کنيد؟'),
(N'روح'),
(N' رفتار')   

SELECT * FROM @Table
WHERE Field LIKE N'%آهسته%'

The both Queries return the same result

RESULT Set:  ميشه آهسته تر صحبت کنيد؟

You have to make sure that when you are inserting the values you prefix then witn N thats to tell sql server there can be unicode character in the passed string. Same is true when you are searching for them strings in Select statement.

edited Feb 23 '14 at 20:54

answered Feb 22 '14 at 17:21

M.Ali

67,945
13
101
127

I fully am familiar with the usage of N and inserting Persian characters into the database. I should have mentioned in the question that I used SqlBulkCopy because I had over 9 million records to insert. Is there something in SqlBulkCopy to make it clear that the type of the field in sql table is NVARCHAR? – Pedram Feb 22 '14 at 17:57
Try using the Like Predicate with `%` see what you get then – M.Ali Feb 23 '14 at 20:33
I tried all kinds of `Like` with using `%` and the result is still nothing! I can upload the minimized version of the database if you want to try by yourself! – Pedram Feb 23 '14 at 20:34
@Pedram Try using a Farsi Collation see my update. it works fine here. – M.Ali Feb 23 '14 at 20:54
by changing the Collation, now it works for the case that you mentioned. but for example for the case `SELECT * FROM @Table WHERE FIELD LIKE N'آهسته'` or `SELECT * FROM @Table WHERE FIELD = N'آهسته'` it does not return any result. I want to find the exact match not the fields that contain the word. – Pedram Feb 23 '14 at 21:22

score 3 · Answer 2 · answered Feb 22 '14 at 21:32

3

Probably you have problem with Persian and Arabic versions of the 'ي' and 'ك' during search. These characters even look the same, have different Unicode numbers:

select NCHAR(1740),  -- Persian ى
       NCHAR(1610),  -- Arabic ي
       NCHAR(1705), -- Persian ك
       NCHAR(1603) -- Arabic ك

more info: http://www.dotnettips.info/post/90

answered Feb 22 '14 at 21:32

VahidN

18,457
8
73
117

It may also need to use `like` operator. – Afshar Mohebi Feb 23 '14 at 14:32
@afsharm: I`ve already tried `like` too. but it has still the problem. – Pedram Feb 23 '14 at 20:29
@VahidN: unfortunately it`s not for the case you`ve mentioned. for example, take a look at the picture at this address and you get what I`m saying: http://www.4shared.com/photo/LQd-2vOAba/1__1_.html – Pedram Feb 23 '14 at 20:32
@VahidN tnx for great scrip but it throw error if we have duplicate value :Cannot insert duplicate key row in object – AminM Oct 13 '20 at 06:45

Searching Persian characters and words in SQL server with different encoding

2 Answers2

Linked