4

I have an Arabic text and English text in my DB. I am trying to search from my DB based on the text user enters. It can be English or Arabic.

I am using SQL and Entity Framework and below is the code portion in which I am trying to fetch the results matching the text.

  results = results.Where(c => c.ResourceValue.Contains(AnyText));

When it's English, I am getting proper results but when it's Arabic, it returns results with "???s" in DB. Those results are garbage.

I assume the problem is that the Arabic text is being compared as "???s" and Linq to Entity returns me results with '???s'.

How can I handle it?

Update:

I have used the SQL profiler to see what query is run against the DB. As suspected, Arabic input [Search text] is not going thorough but it's gone as '???s'

exec sp_executesql N'SELECT  
[Extent1].[ResourceValue] AS [ResourceValue]  
FROM (SELECT      
[Table].[ResourceValue] AS [ResourceValue]   
FROM [dbo].[Table] AS [VpStringResource])
AS [Extent1]  WHERE ([Extent1].[ChannelID] IN (@p__linq__0,0)) AND (( CAST(LEN([Extent1].
[ResourceValue]) AS int)) <> 0) AND ([Extent1].[ResourceValue] LIKE @p__linq__1 ESCAPE 
''~'')',N'@p__linq__0 smallint,@p__linq__1 varchar(8000)',@p__linq__0=2,
@p__linq__1='%??????%'

To Reproduce the issue,

As I mentioned, you can use any sql table. Add a column of type nVarchar(max). Add some Arabic text to the table. Use Entity Framework in C# and try to use the line in the question. Like

Context.Table.Where(c => c.ResourceValue.contains("اختبار")) 
Sanish Joseph
  • 2,140
  • 3
  • 15
  • 28
  • 3
    Here's a heuristic that's crude but remarkably effective: Write up your problems, hit Ctrl+F, type "assume", hit return. That's usually where you went wrong. – 15ee8f99-57ff-4f92-890c-b56153 Nov 02 '16 at 14:06
  • 2
    It may be your database doesn't support Arabic text. You'll want to check the collation on the column holding the text. See [What is the difference between NVarchar, Bin collation, Bin2 collation?](http://stackoverflow.com/q/35583412/215552), for SQL Server, for instance. – Heretic Monkey Nov 02 '16 at 14:07
  • DB is totally fine. Data is stored in Arabic and in the "results" variable when try to see the results, I can see Arabic text also in the list [ResourceValue]. But Contains/Where function not returning expected result. – Sanish Joseph Nov 02 '16 at 14:10
  • 1
    You're saying it does return results but they show as ??? - so the comparison itself it working? If so, are the results showing ??? in the debugger? – KMoussa Nov 02 '16 at 14:16
  • Let me try to be more clear. There are some garbage already present in the DB, in place for some Arabic text. In debugger, before this particular filter is applied, I can see all Arabic, English and garbage text with "???s". After this filter results are shown only with garbage text. Meaning Linq converted my input text 'AnyText' into ?? and compared, returning results only with '???s' – Sanish Joseph Nov 02 '16 at 14:21
  • I doubt your explanation is correct, since the underlying representation of the string wouldn't be converted to ???, I think this is only how it's shown on screen if there is no corresponding character to show .. can you test with a hardcoded value for `AnyText` to rule out any conversion happening to the variable – KMoussa Nov 02 '16 at 14:32
  • Please provide more information on what database you use, the database scheme of the table and how you have setup LINQ to SQL – NineBerry Nov 02 '16 at 14:33
  • I could see Arabic text properly going to that variable "AnyText". I was even able to update "AnyText" with few Arabic text copied from the ResultView just before applying this condition. – Sanish Joseph Nov 02 '16 at 14:36
  • This is MS SQL DB. The column with Arabic text is of type nvarchar(max). I am using Entity framework and using Db first model. – Sanish Joseph Nov 02 '16 at 14:38
  • Have you ruled out the possibility that the text is fine and it's being displayed in a font that doesn't have Arabic characters? – 15ee8f99-57ff-4f92-890c-b56153 Nov 02 '16 at 15:03
  • While debugging, it's clearly showing in Arabic. – Sanish Joseph Nov 02 '16 at 16:04
  • Adding some info to the question which I got using SQL profiler. – Sanish Joseph Nov 02 '16 at 16:28
  • Please post the string that you are searching for, and the SQL script used to generate the table (see [this image](http://1.bp.blogspot.com/-_d3-gEgmg2w/VquywdxlrSI/AAAAAAAAGgs/7LdsS5bAtx0/s1600/Untitled.png) for how to get that in SSMS), along with some example data. Basically, provide a [mcve] so that others can reproduce your situation. – Heretic Monkey Nov 02 '16 at 16:48
  • As I mentioned, you can use any sql table. Add a column of type nVarchar(max). Add some Arabic text to the table. Use Entity Framework in C# and try to use the line in the question. Like Context.Table.Where(c => c.ResourceValue.contains("اختبار")) – Sanish Joseph Nov 02 '16 at 16:53
  • Do you have unicode:true in your model builder? – Wiktor Zychla Nov 02 '16 at 17:08
  • Thank you so much for asking. I have a solution now. All comments were very helpful. – Sanish Joseph Nov 02 '16 at 17:14

1 Answers1

3

After a good research, I found solution for my own problem.

results = results.Where(c => c.ResourceValue.Contains(EntityFunctions.AsUnicode(AnyText)));

I had to import using System.Data.Entity.Core.Objects; also.

Hope this will be helpful for someone. Thanks for all the comments and hints to search for the solution.

Sanish Joseph
  • 2,140
  • 3
  • 15
  • 28
  • 1
    In other words, you had ASCII text that you tried to match against a Unicode string. You could have specified a collation, but you used a cast instead. That's not a problem of EF or LINQ. – Panagiotis Kanavos Nov 02 '16 at 17:14
  • The real bug is that the Arabic text was stored as `VARCHAR` in the first place. If the column type was `nvarchar` there would be no problem – Panagiotis Kanavos Nov 02 '16 at 17:17
  • The `???` characters mean that the database tried to read ASCII text using the wrong code page. You still haven't fixed the real problem, and you may also have bad data, ie Arabic text stored with the wrong collation ,eg Latin instead of Arabic. – Panagiotis Kanavos Nov 02 '16 at 17:19
  • I had mentioned, the table column was nvarchar(max). – Sanish Joseph Nov 02 '16 at 17:20
  • Not according to the profiler. Also not according to the `???` characters - these are substitution characters used when Windows tries to read ASCII data with the wrong page. They replace any character that doesn't exist in the target codepage. Post the schema. Right now, you only have problems reading the data. If you try to save to these rows, you may actually replace Arabic text with `?` and lose data – Panagiotis Kanavos Nov 02 '16 at 17:22
  • 1
    Which SQL Server version are you using? Full UTF-16 support was added in SQL Server 2012. Before that SQL Server supported the UCS-2 subset – Panagiotis Kanavos Nov 02 '16 at 17:27
  • Hope you are looking for this, [Value is the one]. DB is SQL 2014. By the way, update and insert are just working fine. GO CREATE TABLE [dbo].[**ResourceTransla****]( [ID] [bigint] IDENTITY(1,1) NOT NULL, [ResourceID] [bigint] NOT NULL, [CultureCode] [varchar](5) NOT NULL, [Value] [nvarchar](max) NOT NULL, [CreateDate] [datetime] NULL, CONSTRAINT [PK_VpStringResource_1] PRIMARY KEY CLUSTERED ( [ID] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY] – Sanish Joseph Nov 02 '16 at 17:39
  • varchar(8000) in profiler query is not the column type but type of the parameter that Linq query made. – Sanish Joseph Nov 02 '16 at 18:02