1

I know the code which can restrict a cell value to certain lengths as follows, but how do I restrict it to certain bytes for example let's say 240 bytes as well as truncate it within the limit? Also, the Charset would be "shift-jis" i.e., Japanese + English.

If Not Intersect(Target, Range("A2:B200")) Is Nothing Then
    For Each cell In Intersect(Target, Columns("A:B"))
        If (cell.Value) > 20 Then
            cell.Value = VBA.Left(cell.Value, 20)
            cell.Select
            MsgBox "Character limit for the cell is 20." & vbNewLine & "Truncated to 20 characters."
        End If
    Next cell
End If

Help is appreciated!

Community
  • 1
  • 1
SumeetN
  • 128
  • 1
  • 9
  • You will have to determine whether a DBCS language is being used. Once that has been determined, one of the worksheet [LEN, LENB functions](https://support.office.com/en-us/article/LEN-LENB-functions-29236f94-cedc-429d-affd-b5e33d2c67cb) can correctly determine the length. –  Mar 26 '16 at 11:21
  • @Jeeped - I'm not sure this works. LENB will return 2 bytes per character even for characters in the ANSI range if the default is a DBCS language. – Comintern Mar 26 '16 at 14:21
  • So you want a count of the length of non-DBCS characters and the count of the length of DBCS characters. Maybe AscW will help. –  Mar 26 '16 at 14:25
  • 2
    What is the goal of limiting the byte count? VBA's internal representation of strings is Unicode - do you just need to check to see if a string uses any characters outside of the ANSI range? – Comintern Mar 26 '16 at 14:26
  • Do you want to limit the length of the string represented in [UTF-8](https://en.wikipedia.org/wiki/UTF-8)? – omegastripes Mar 26 '16 at 15:05
  • Hi, I want Charset "shift-jis" limited to 240 bytes. The text would be in Japanese and English language. The combined size shouldn't exceed 240 bytes in a cell. – SumeetN Mar 28 '16 at 06:13

2 Answers2

1

If I understand the question correctly, what you ultimately need to do is determine if a character falls into the ANSI character range (can be represented by one byte - 0 to 255). Excel doesn't make this easy as noted in the comments. Neither does VBA, which represents all strings internally as UTF-16. This is compounded by the problem that the behaviour of the VBA Len and LenB changed after VB4. Prior to that change, they would have returned different results for Unicode or ANSI input. Now, they will both return the same result because LenB returns the in-memory length of the string which is always 2 bytes per character. What distinguishes the ANSI range is that the 2nd byte will always be zero.

The StrConv function does provide a way to check whether a string contains non-ANSI characters - you can convert to a Byte array and check to see if any of the high bytes are set. For example, the string "ABCD" is stored in VBA's memory as:

65  0   66  0   67  0   68  0

You can use a quirk of VBA's "Unicode conversion" to expand these to 2 bytes again with StrConv("ABCD", vbUnicode), which results in this:

65  0   0   0   66  0   0   0   67  0   0   0   68  0   0   0

For comparison, if you picked up the string "ΑΒΓΔ" from somewhere (which you would have to because there's no way to type it in the IDE), it could result in this depending on the encoding:

24  32  3   0   25  32  3   0   28  32  3   0   29  32  3   0

So, once you have a byte array, all you need to do is check every other byte - if you find a non-zero value, it can't be narrowed to ANSI:

Private Function IsANSI(test As String) As Boolean
    Dim bytes() As Byte, i As Long
    bytes = StrConv(test, vbUnicode)
    For i = 1 To UBound(bytes) Step 2
        If bytes(i) <> 0 Then
            IsANSI = False
            Exit Function
        End If
    Next i
    IsANSI = True
End Function

If all you care about is UTF-16 v ANSI, you can grab the "byte length" easily after you determine it can be narrowed to 8 bits:

Private Function ByteLength(test As String) As Long
    If IsANSI(test) Then
        ByteLength = Len(test)
    Else
        ByteLength = LenB(test)
    End If
End Function

Note that byte length always depends on the encoding. If you need the length of the string in a specific encoding, VBA isn't going to help you much natively unless it's a fixed width encoding (i.e. UTF-32, which VBA likely already butchered), you'll have to reach into the Windows API and explicitly convert it with WideCharToMultiByte and see what you get back. You can find a VBA example here.

Community
  • 1
  • 1
Comintern
  • 21,855
  • 5
  • 33
  • 80
0

You can use ADODB.Stream ActiveX to get a string of limited length of bytes represented in certain charset. Since you didn't specified any charset, I chose UTF-8 for an example. I modified your code and added LeftUTF8() function, the below code placed in Worksheet module:

Private Sub Worksheet_Change(ByVal Target As Range)

    Application.EnableEvents = False
    If Not Intersect(Target, Range("A2:B200")) Is Nothing Then
        For Each cell In Intersect(Target, Columns("A:B"))
            cell.Value = LeftUTF8(cell.Value, 240)
        Next cell
    End If
    Application.EnableEvents = True

End Sub

Private Function LeftUTF8(strCont As String, lngLenght As Long) As String

    Dim i As Long
    Dim arrCont() As Byte
    ' add reference to Microsoft ActiveX Data Objects Library (2.5 or later)
    Static objStream As New ADODB.Stream

    i = lngLenght
    With objStream
        .Type = adTypeText
        .Open
        .Charset = "utf-8"
        .WriteText strCont ' convert string to UTF-8 with BOM
        .Position = 0
        .Type = adTypeBinary
        If .Size > i + 3 Then ' size in bytes greater then limit + 3 bytes UTF-8 BOM
            For i = i To 1 Step -1 ' if last multibyte char is split then reduce output
                .Position = i + 3 ' next byte after the last + BOM
                If AscB(.Read(1)) And 192 <> 128 Then Exit For ' next byte is first byte of next char
            Next
        End If
        .Position = 0
        arrCont = .Read(i + 3) ' read bytes with BOM corresponding the limit
        .Close ' clear stream
        .Open
        .Type = adTypeBinary
        .Write arrCont ' write bytes
        .Position = 0
        .Type = adTypeText
        LeftUTF8 = .ReadText ' read string
        .Close
    End With

End Function

UPDATE

Here is the code to get a string of 240 bytes represented in shift-jis charset:

Private Sub Worksheet_Change(ByVal Target As Range)

    Application.EnableEvents = False
    If Not Intersect(Target, Range("A2:B200")) Is Nothing Then
        For Each cell In Intersect(Target, Columns("A:B"))
            cell.Value = LeftShiftJis(cell.Value, 240)
        Next cell
    End If
    Application.EnableEvents = True

End Sub

Private Function LeftShiftJis(strCont As String, lngLenght As Long) As String

    Dim i As Long
    Dim arrCont() As Byte
    ' add reference to Microsoft ActiveX Data Objects Library (2.5 or later)
    Static objStream As New ADODB.Stream

    i = lngLenght
    With CreateObject("ADODB.Stream")
        .Type = adTypeText
        .Open
        .Charset = "shift-jis"
        .WriteText strCont ' convert string to shift-jis binary representation
        .Position = 0
        .Type = adTypeBinary
        arrCont = .Read(i) ' read limited number of bytes
        .Close ' clear stream
        .Open
        .Type = adTypeBinary
        .Write arrCont  ' write all content, if second byte of last two-byte char is cut off, then zero will be added instead of the missing byte
        .Position = 0
        .Type = adTypeText
        LeftShiftJis = .ReadText ' if last char is two-byte and has second byte cut off, then it is trimmed
        .Close
    End With

End Function
omegastripes
  • 12,351
  • 4
  • 45
  • 96
  • Thanks for the solution but I want Charset "shift-jis" limited to 240 bytes. The text would be in Japanese and English language. The combined size shouldn't exceed 240 bytes in a cell. How do I use above code for mentioned purpose? Please help. – SumeetN Mar 28 '16 at 06:22
  • @SUMEETNALE Check my answer, I've added code for shift-jis charset. – omegastripes Mar 29 '16 at 17:57