-1

I want to make a function where I extract all words with length = 2 from a sentence. For example, if the sentence is "The Cat is brown", I want the result in the cell to be "is". If there are multiple words with length = 2, I want to keep these too. I have tried MID, RIGHT, LEFT, etc. These does not work as the position of the word is not always identical.

I have no clue how to do this in VBA, any suggestions are welcome :)

Thanks

Nick ODell
  • 15,465
  • 3
  • 32
  • 66
emilk
  • 75
  • 5
  • For me easiest way would be to loop through all the characters in the cell. If there are multiple 2 letter words how are you wanting it to return? Eg `is, be, on` – Simon Jan 11 '21 at 08:52
  • Use find() to find spaces and work out the difference between them. Then mid() etc can be used. A good exercise in chopping bits of text. Hint: use find() to find the first space and then the second... – Solar Mike Jan 11 '21 at 08:55
  • To be more specific, they are country abbreviations in sentences which I want to extract out :-) I will match them towards a column with all country abbreviations hence the separator should not matter – emilk Jan 11 '21 at 08:55
  • Check out how to work with [Regular Expressions](https://stackoverflow.com/questions/22542834/how-to-use-regular-expressions-regex-in-microsoft-excel-both-in-cell-and-loops) in VBA and eg. use a pattern like the following example `(?>^| )(\S{2})(?> |$)` see https://regex101.com/r/PuhgSM/1/ – Pᴇʜ Jan 11 '21 at 08:58
  • 1
    Could you share a sample of the data and the expected result. Sharing your code may help us understand. Will you consider punctuation issues i.e. if not considered, `UK` may not be found in `Are you from the UK?`. Also, as Simon already asked, how will you return multiple results? In an array, multiple cells, delimited 'list' or...? – VBasic2008 Jan 11 '21 at 09:18
  • I cannot give the data, but I can give an example. Assume I have the following in a cell "Name of company, address, PA US". Then I want to extract PA US in the new function. Where the country abbreviation is located is arbitrary, hence I cannot use Right, Left, Mid etc.. – emilk Jan 11 '21 at 09:25
  • Would it be safe to say that the data of interest is the only data consisting of two upper-case letters? – VBasic2008 Jan 11 '21 at 09:27
  • 1
    We don't need the data perse, just a proper sample with expected result would be great. What is your expected result with: `This is a name, address, PA US.`. Also, would your data always be comma delimited and would the data of interest always be behind the last comma? And even more specific, would you have only two letter words behind the last comma? – JvdV Jan 11 '21 at 09:27
  • @emilk This could go wrong if the company name is just 2 characters like `O2` for example. – Pᴇʜ Jan 11 '21 at 09:41
  • Yep, but it will be matched towards a column of country abbreviations I already have, where O2 does not exist. It might go wrong in some cases, but I can fix that :) I will post what I have below. – emilk Jan 11 '21 at 10:08
  • You should instead update your current question to meet the standards of a question with an [mcve]. Clear sample data with clear expected results. – JvdV Jan 11 '21 at 10:13

2 Answers2

1

I have made you a UDF which should work for what you want. You use it like so:

=ExtractWords(Cell to check, Optional number of letters)

By default it will check for 2 letter words but you can specify as well as shown above.

Here is the code. Place it into a module

Function ExtractWords(Cell As Range, Optional NumOfLetters As Integer)

Dim r As String, i As Long, CurrentString As String, FullString As String, m As String

If NumOfLetters = 0 Then NumOfLetters = 2

r = Cell.Value

For i = 1 To Len(r)
    m = Mid(r, i, 1)
    If Asc(UCase(m)) >= 65 And Asc(UCase(m)) <= 90 Or m = "-" Or m = "'" Then 'Accepts hyphen or single quote as part of the word
        CurrentString = CurrentString & m
        If i = Len(r) Then GoTo CheckLastWord
    Else
CheckLastWord:
        If Len(CurrentString) = NumOfLetters Then
            If FullString = "" Then
                FullString = CurrentString
            Else
                FullString = FullString & " " & CurrentString 'Change space if want another delimiter
            End If
        End If
        CurrentString = ""
    End If
Next i

If FullString = "" Then
    ExtractWords = "N/A" 'If no words are found to contain the length required
Else
    ExtractWords = FullString
End If

End Function

There are probably other ways to do it that may be easier or more efficient. This is just something I came up with.

Simon
  • 1,384
  • 2
  • 10
  • 19
  • Just note I made this before all comments were made under your post. It shouldn't really change anything and should still work for what you're doing. – Simon Jan 11 '21 at 09:56
  • Hey thanks! I tried changing the delimiter to & " " & as there are no special characters involved and typed =ExtractWords(J8;2), but it gave me a blank answer. I can post what I have so far below. – emilk Jan 11 '21 at 10:11
  • Changing to just a space should work just fine. ie `FullString = FullString & " " & CurrentString`. Can you give the value of J8 so I can see? – Simon Jan 11 '21 at 10:14
  • Value of cell J8 is: "Company name#Head Office#City, Virginia US" – emilk Jan 11 '21 at 10:16
  • @emilk fixed the issue. – Simon Jan 11 '21 at 10:23
  • @emilk sorry which answer are you accepting as i saw you had accepted mine but is now not. Any particular reason if I may ask? – Simon Jan 14 '21 at 05:59
  • Oh sorry, can you only accept one at a time? Never really posted anything here. Since both solutions works in its own way, your code worked in my way and maybe the one above will work for somebody else, I just accepted both to indicate they work. – emilk Jan 15 '21 at 07:08
  • Ahh right ok yeah you can only accept one but you can upvote the others as well. – Simon Jan 15 '21 at 07:37
0

Double Upper Case Occurrences

  • In Excel you can e.g. use it like this:

    =getDUC(A1)
    =getDUC(A1," ")
    =getDUC(A1,",")
    =getDUC(A1,"-")
    

The Code

Option Explicit

' In Excel:
Function getDUC( _
    ByVal s As String, _
    Optional ByVal Delimiter As String = ", ") _
As String
    Dim arr As Variant
    arr = DoubleUCaseToArray(s)
    getDUC = Join(arr, Delimiter)
End Function

' In VBA:
Sub testDoubleUCaseToArray()
    Dim CCodes As Variant: CCodes = Array("US,UKUs", "UkUS,UK", "kUSUKsUK")
    Dim arr As Variant
    Dim n As Long
    For n = LBound(CCodes) To UBound(CCodes)
        arr = DoubleUCaseToArray(CCodes(n))
        Debug.Print Join(arr, ",")
    Next n
End Sub

''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
' Purpose:      From a specified string, returns all unique double upper case
'               occurrences in a 1D (zero-based) array.
' Remarks:      From the string 'USUk' it returns only 'US' (not `SU`).
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
Function DoubleUCaseToArray( _
    ByVal s As String) _
As Variant
    If Len(s) > 1 Then
        With CreateObject("Scripting.Dictionary")
            Dim cFirst As String * 1
            Dim cSecond As String * 1
            Dim n As Long
            For n = 1 To Len(s) - 1
                cFirst = Mid(s, n, 1)
                If cFirst Like "[A-Z]" Then
                    cSecond = Mid(s, n + 1, 1)
                    If cSecond Like "[A-Z]" Then
                        .Item(cFirst & cSecond) = Empty
                    End If
                    n = n + 1
                End If
            Next n
            If .Count > 0 Then
                DoubleUCaseToArray = .Keys
            End If
        End With
    End If
End Function
VBasic2008
  • 44,888
  • 5
  • 17
  • 28
  • Hey, thanks! This do work sort of. However, every letter is capital letter in the sentence (sorry for not being clear). Hence, the code will print every pair of letter in the sentence rather than extracting the LEN(substring)=2 – emilk Jan 11 '21 at 10:25
  • Then you have little choice but to split the string by a `space` and check each element if it contains two characters. Something like Simon did. Before the `Split`, maybe use a `Replace` for punctuational or other characters that might not be 'covered' by the split. `Trim` might also come in handy. – VBasic2008 Jan 11 '21 at 10:37