2

I'm currently doing some work with a very large data source on city addresses where the data looks something like this.

137 is the correct address but it belongs in a building that takes up 135-138A on the street.

source:

137 9/F 135-138A KING STREET 135-138A KING STREET TOR

i've used a function which removes the duplicates shown on extendoffice.

the second column has become this:

137 9/F 135-138A KING STREET TOR

what I want to do now is

  • find address number and add it in front of the street name
  • remove the numbers that are connected to the dash - ):

9/F 137 KING STREET TOR

Would the the best way to accomplish this?

The main problem I'm having with this is there are many inconsistent spaces in address names ex. "van dyke rd".

Is there anyway I can locate in an array the "-" and set variables for the 2 numbers on either side of the dash and replace it with the correct address number located at the front

Function RemoveDupes2(txt As String, Optional delim As String = " ") As String
Dim x

With CreateObject("Scripting.Dictionary")
    .CompareMode = vbTextCompare
    For Each x In Split(txt, delim)
        If Trim(x) <> "" And Not .exists(Trim(x)) Then .Add Trim(x), Nothing
    Next
    If .Count > 0 Then RemoveDupes2 = Join(.keys, delim)
End With
End Function

Thanks

brettdj
  • 54,857
  • 16
  • 114
  • 177

3 Answers3

1

Regular Expressions are a way to (amongst other things) search for a feature in a string.

It looks like the feature you are looking for is: number:maybe some spaces : dash : maybe some spaces : number

In regex notation this would be expressed as: ([0-9]*)[ ]*-[ ]*([0-9]*) Which translates to: Find a sequential group of digits followed by zero or more spaces, then a dash, then zero or more spaces, then some more digits.

The parenthesis indicate the elements that will be returned. So you could assign variables to the be the first number or the second number.

You might need to tweak this if a dash can potentially occur elsewhere in the address.

Further information on actually implementing that is available here: How to use Regular Expressions (Regex) in Microsoft Excel both in-cell and loops

Community
  • 1
  • 1
Jemma
  • 112
  • 6
1

This meets the case you want, it captures the address range as two separate matches (if you want to process further).

The current code simple removes this range altogether.

What logic is there to move the 9/F to front?

See regex here

Function StripString(strIn As String) As String
 Dim objRegex As Object
 Set objRegex = CreateObject("vbscript.regexp")
 With objRegex
       .Pattern = "(\d+[A-C]?)-(\d+[A-C]?)"
        If .test(strIn) Then
            StripString = .Replace(strIn, vbullstring)
        Else
           StripString = "No match"
        End If
    End With
End Function
brettdj
  • 54,857
  • 16
  • 114
  • 177
  • Hey thanks for the help, ive been able to combine it with my other functions to get the 9/F to the front. Just one question what is vbullstring? – bathtubandatoaster Feb 09 '17 at 08:13
  • @bathtubandatoaster it replaces the matched regexp with "". It is is built in constant so is more efficient than using "" – brettdj Feb 09 '17 at 13:49
0

I'd just:

  • swap 1st and 2nd substrings

  • erase the substring with "-" in it

    Function RemoveDupes2(txt As String, Optional delim As String = " ") As String
        Dim x As Variant, arr As Variant, temp As Variant
        Dim iArr As Long
    
        With CreateObject("Scripting.Dictionary")
            .CompareMode = vbTextCompare
            For Each x In Split(txt, delim)
                If Trim(x) <> "" And Not .exists(Trim(x)) Then .Add Trim(x), Nothing
            Next
            If .count > 0 Then
                arr = .keys
                temp = arr(0)
                arr(0) = arr(1)
                arr(1) = temp
                For iArr = LBound(arr) To UBound(arr)
                    If InStr(arr(iArr), "-") <> 0 Then arr(iArr) = ""
                Next
                RemoveDupes2 = Join(arr, delim)
            End If
        End With
    End Function
    
user3598756
  • 28,893
  • 4
  • 18
  • 28