0

I am trying to search for all occurrences of doubles in a long chunk of text. the text represents the description of multiple defects in a system. The doubles I am looking for are depths that are normally in the text multiple times as "n.nnnW X n.nnnL X n.nnnD". The n.nnn is normally 0.017D (example) but I want to account for 5.567D if that ever comes up.

The problem is that there are also occurrences of the terms within 1.5d of and also a .015dia. Case for the letters in these are also varied some are all caps and some are all lowercase. The text sometimes also has a space between the number and the "d" and sometimes spells out the word "deep" or "depth" like this: 0.017 deep.

I need the values to be extracted as doubles eventually so I can do math on them.

I have the following regexp pattern: [.](?:\d*\.)?\d+(\s?)[dD](?!ia|IA)

This pattern seems to find all the things I need and even eliminates the diameters that are spelled out as n.nnndia or n.nnnDIA. The thing the pattern DOES NOT catch is the within 1.5d of text.

After some light research I noted that in VBA the lookbehind code is NOT supported; and even so, I was never able to get the lookbehind pattern to work anyhow (using Regex101).

Here is my Access VBA code to illustrate how I am doing it. The r EXP(0) value is my pattern above.

Set rs2 = CurrentDb.OpenRecordset("SELECT * FROM [NCR_RawImport-TEXT] WHERE " & skinTermSQL, dbOpenSnapshot)
'rEXP(0) = "[.](?:\d*\.)?\d+(\s?)[dD](?!ia|IA)"
tVAL = (CDbl(Me.cmbGDepth.Value) / 1000)
Do While Not rs2.EOF
    found = False
    Set regEXP1 = CreateObject("VBScript.RegExp")
    regEXP1.IgnoreCase = True
    regEXP1.Global = True
    For i = 0 To rEXPIndx - 1
        regEXP1.Pattern = rEXP(i)
        Set Matches = regEXP1.Execute(rs2.NARR_TXT)
        For Each Match In Matches
            aVAL = CDbl(Trim(Replace(Replace(UCase(Match.Value), " D", ""), "D", ""))) 'convert matched value to a double.
            If (aVAL >= tVAL) Then
                found = True
            End If
        Next
        Set Matches = Nothing
    Next i
    Set regEXP1 = Nothing
    If (found) Then
        strSql = "UPDATE [NCR_FinalData] SET [NCR_FinalData].SRCH = [NCR_FinalData].SRCH & 'G' WHERE [NCR_FinalData].SRCH Not Like '*G*' AND [NCR_FinalData].NC_KEY = '" & rs2.NC_KEY & "';"
        Call writeLog("cmdUpdateNCRs: " & strSql)
        DoCmd.RunSQL strSql
    End If
    rs2.MoveNext
Loop
Set rs2 = Nothing
Erik A
  • 31,639
  • 12
  • 42
  • 67
DBMDUDE
  • 1
  • 1
  • 1
    Please try to describe more clearly what you're actually trying to match. Usually lookbehinds and lookaheads can just be avoided by using mulitple capturing groups and referring to one specific one. – Erik A Sep 18 '17 at 20:15
  • 3
    Does [Lookbehind on regex for VBA?](https://stackoverflow.com/questions/9150552/lookbehind-on-regex-for-vba?rq=1) (from the "Related" question list) answer your question? – Mathieu Guindon Sep 18 '17 at 20:21
  • Anything like this: 'n.nnnD', 'n.nnnd', 'n.nnn D', 'n.nnn d' but NOT 'n.nnnDIA' or 'n.nnndia' and NOT 'WITHIN n.nnnD' or 'within n.nnnd'. Also the number of digits after the decimal place vary too. I thought I showed that clearly in the example above. – DBMDUDE Sep 18 '17 at 20:28
  • @DBMDUDE that still isn't clear to me. Are those n's placeholders for numbers? Do you want to match the whole string or a specific part? Is within a specific lookbehind, and the only one you're not wanting to match. Sounds like that question referred to by MatsMug holds the answer, try that first. – Erik A Sep 18 '17 at 21:02
  • Do only the substrings you want to match have 3 decimals, and do they *always* have 3 decimals? Or can the substrings you *don't* want matched have 3+ decimal too? – Ansgar Wiechers Sep 19 '17 at 08:07
  • OK the data is free form text (read: no way to detect the formatting of any given value) where all the values are spattered into the text. YES the n's are place holders for numbers and I want my match set to return only the value SO if it encounters this text: "Qty (3) scratches near hole countersink that are .04w X .03l x .048d and it is within 1.5dia from fastener hole." From this it only returns 1 value of 0.048... note it DID NOT return the 1.5D because it is preceded by the word within. – DBMDUDE Sep 20 '17 at 16:40
  • Another sample: "Quan 5 gouges that are .05w by .02l by .053 deep" would return .053 only. NOW if both these defects showed up in the same text I want two values of .048 and .053 in my match array. – DBMDUDE Sep 20 '17 at 16:40
  • The defect text can have a BUNCH of defects worded in any way shape or form. – DBMDUDE Sep 20 '17 at 16:41

0 Answers0