1

Any idea how to Identify and extract noun and modifier using VBA (excel)

Example:

ball valve 2in for green pump with gasket

Should be: ball valve

Any help will be appreciated

ρяσѕρєя K
  • 132,198
  • 53
  • 198
  • 213
yair
  • 29
  • 1
  • 6

1 Answers1

1

There are some different approaches, depending on the type of sentence you expect. In your example, the two words you want to extract are on the beginning of the sentence, and separated by whitespaces. If you expect this to be always the case, then you could use something simple as

Function getNoun(ByVal sentence As String)
    getNoun = ""
    pos1 = InStr(1, sentence, " ") 'find the first whitespace
    If pos1 <= 0 Then
        getNoun = sentence 'if no whitespace, then assume there is only the noun
        Exit Function
    End If
    pos2 = InStr(pos1 + 1, sentence, " ") 'find the second whitespace
    If pos2 <= 0 Then
        getNoun = sentence 'if no second whitespace, then assume there is only the noun and qualifier
        Exit Function
    End If

    getNoun = Left(sentence, pos2 - 1) 'if there are two or more spaces, get all chars before the second one
End Function

Tests in immediate window:

? getNoun("ball valve 2in for green pump with gasket")
ball valve
? getNoun("ball valve")
ball valve
? getNoun("ball")
ball

If your scenario is more complex and you need to use specific criteria to determine which words are the desired noun and qualifier, you would probably find use for the Regex COM class (see this topic for example).

EDIT: Based on the comments, I understand that positions are variable, and that it is acceptable to use the MS Word thesaurus as a reference. If the code will run in Microsoft Word, the following function will tell you whether or not a word is a noun:

 Function is_noun(ByVal wrd As String)
  Dim s As Object, l As Variant
  is_noun = False
  Set s = SynonymInfo(wrd)
  Let l = s.PartOfSpeechList
  If s.MeaningCount <> 0 Then
      For i = LBound(l) To UBound(l)
          If l(i) = wdNoun Then
              is_noun = True
          End If
      Next
  End If
End Function

If you are not running on MS Word (your tags suggest MS Excel) but MS Word is installed in the target system, then you can adapt the above code to use MS Word COM automation object.

Then you can extract the first noun, and the next word - if any -, from a sentence, with something like this

Function getNoun(ByVal sentence As String)
   getNoun = ""
   Dim wrds() As String
   wrds = Split(sentence)
   For i = LBound(wrds) To UBound(wrds)
        If is_noun(wrds(i)) Then
            getNoun = wrds(i)
            If i < UBound(wrds) Then
                getNoun = getNoun & " " & wrds(i + 1)
            End If
            Exit Function
        End If
    Next
End Function

Notice, however, that with this you are trusting blindly in MS Word's word database and may get weird results if your sentences contain, for example, words that may be a verb or a noun depending on context. Also, the above example will use the default language of your setup of MS Word (it is possible to use a different one - if installed - by including a language parameter in SynonymInfo)

Community
  • 1
  • 1
dnep
  • 512
  • 2
  • 7
  • By the way, I could have checked only `pos2 <= 0` on the code above, but I thought you might want to treat differently the cases when you've got only the name and when you've got name and qualifier, in your own code, so I opted to keep it verbose. – dnep Feb 02 '16 at 11:48
  • Thank you Nice idea Actually I am looking for a general solution to Identify noun and modifier In my example If we change the order of the words the macro will not work correctly "2in ball valve for green pump with gasket" We get "2in ball" – yair Feb 02 '16 at 14:24
  • Yes, of course. As stated in my answer, if the position of the desired words in the sentence is not fixed, then you wil have to resort to more complex solutions. Actually to have a chance of building code to solve your problem, you have to solve it in your mind first - in this case you have first to decide how the program could determine if a word is the desired noun, or its qualifier: Is there a particular ordering rule for words? Are such nouns and/or qualifiers restricted to a known list? If there are more than one eligible noun, which one should the program pick? Then you can start coding. – dnep Feb 02 '16 at 14:38
  • My thought was to try to split my text and loop using Microsoft Word to check if it is noun - If yes then the next word is the qualifier Is it possible? – yair Feb 02 '16 at 15:08
  • Yes, I just suggested a solution along this line. Thanks for clarifying. – dnep Feb 02 '16 at 15:56
  • It would be greatly appreciated If you can add to the code the functionality of using MS Word automation object thank you – yair Feb 03 '16 at 05:35
  • No problem, There are plenty of topics about Word automation (e.g. http://stackoverflow.com/questions/16418292/vba-open-word-from-excel and http://stackoverflow.com/questions/32739715/open-word-template-paste-excel-data-in-and-save). Essentially you have to create a `Word.Application` object and then use the MS Word-specific functions (e.g. `SynonymInfo()`) as methods of this object. I already gave you the most difficult part of the solution, and I guess you can roll your own now. – dnep Feb 03 '16 at 11:57
  • If you get stuck, check [this](http://www.excelforum.com/excel-programming-vba-macros/678216-using-the-thesaurus-in-vba-print.html). Also keep in mind that it if you intend to run `is_noun()` many times it might be a good idea to create the `Word.Application` object once, as a shared object, do all the work you need with it, and then close it (otherwise your computer would keep opening and closing MS Word in the background each time you call the function). – dnep Feb 03 '16 at 11:59