0

In C#, I am inserting data into a SQL table. The SQL column is of type varchar(4000), so in C# I need to truncate the string if it is too long, so that it can be stored in SQL. To perform the truncation, I need to know the weight of the string when it is added to SQL.

So far I have tried to use the length of the C# string with myString.Lentgh * sizeof(char) but this solution does not give the right answer, because the encoding may not be the same, so the weight of a character in C# is not the same as in SQL.

I also tried to do a SQL query SELECT LEN(myString) but this involves doing several queries to get the right truncation, so this solution is very slow.

So is there any way to calculate it directly in C#?

gunr2171
  • 16,104
  • 25
  • 61
  • 88
AoooR
  • 370
  • 1
  • 9
  • [This answer](https://stackoverflow.com/a/21259797/1043380) seems to suggest that `varchar(4000)` would accept 4000 _characters_ (and seeing that you're not allowing unicode that make it easier). Also, I would have thought that SQL Server would truncate the data for you when you insert the row. – gunr2171 Mar 10 '23 at 14:37
  • 2
    if you know what encoding you're using in that table then you can use `Encoding.GetByteCount()` to know the length in bytes BUT to truncate the string isn't simply to truncate it at that length (think about Unicode sequences and combining characters). Note that, as gunr said, SQL Server does this truncation for you – Adriano Repetti Mar 10 '23 at 14:38
  • @gunr2171 According to the Microsoft [documentation](https://learn.microsoft.com/en-us/sql/t-sql/data-types/char-and-varchar-transact-sql?view=sql-server-ver16#varchar---n--max--), in `varchar(n)`, `n` is the size in bytes, not the number of characters. SQL Server throws an error when the string is too long. Maybe there is an option to truncate directly in SQL, I will look for that. – AoooR Mar 10 '23 at 14:45
  • @AoooR yes, that's why `Encoding.GetByteCount()` (which returns the number of bytes needed to encode a specified string). Aboout SQL Server: check `ANSI_WARNINGS` – Adriano Repetti Mar 10 '23 at 15:10
  • For most collations `varchar` is just as simple as it accepts 256 possible characters and each character takes one byte. The exceptions are "double byte" collations and UTF8 collations. Are you working against a column with a known specific collation? – Martin Smith Mar 10 '23 at 18:09
  • 1
    By the way are you sure that it is actually `varchar(4000)` and not `nvarchar(4000)`? `nvarchar(4000)` is quite common as that is the largest non `max` value possible for `nvarchar`. With `varchar` the limit is `8000` – Martin Smith Mar 10 '23 at 18:24

1 Answers1

0

In my DB, string are UTF8. So in my C# code I used if (Encoding.UTF8.GetByteCount(myString) > 4000) {...}.

Thanks @Adriano Repetti for you awnser in comments.

AoooR
  • 370
  • 1
  • 9