0

I have a file with 200 rows which have values like this: ANTWERPEN 3 ABDIJ Abdijstraat 71-73 2020 9:00 18:00 9:00 18:00 9:00 18:00 9:00 18:00 9:00 19:00 9:00 19:00 which I want to have splitted into separate columns.

  1. I want to have 1 column for the part which is in Capitals entirely. In this specific case, that would be: ANTWERPEN 3 ABDIJ.

  2. And another column for the part that comes after it, until the 4 numeric characters. In this case: Abdijstraat 71-73

I am happy the row values have this distinction to separate the addresses, but I do not know how to do this.

I have had a similar situation for splitting cells at the first numeric character:

text to columns: split at the first number in the value

But now I am looking for a two-fold solution to have in the first column the first part which is entirely in capitals, which represents the city and in the 2nd column I need to have the string which starts with a capital but is then followed by non-capital characters and ends before a 4 characters string of numeric characters.

I would be happy if I could create a vba or excel code/formula which could do this for me, but unfortunately, I can not :-(

So I hope someone can.

edit:

finding some other routines and modifying and testing it, helped me to create this:

Sub doitall()
   Dim cell As Range, j As Integer, i As Integer, x As String
   Dim str As String
   Dim strlen As Integer
   Dim k As Integer
   Dim l As Integer
   Dim y As Integer
'   Dim v As Integer
'
'
'   For j = 1 To Cells(Rows.Count, 1).End(xlUp).Row
'      For Each cell In ActiveSheet.Range(Cells(1, 1), Cells(j, 1))
'         For i = 1 To Len(cell)
'         x = Mid(cell, i, 1)
'         If x = ":" Then Exit For
'      Next i
'      cell.Offset(0, 1) = Left(cell, i - 8)
'      Next cell
'   Next j

'geparkeerd
'            If l >= 65 And l <= 90 Then
'        If v > 1 Then
'        m = v - 1
'        l = Asc(Mid(Cells(j, 2), m, 1))
'        Else
'        l = 0
'        End If


For j = 1 To Cells(Rows.Count, 2).End(xlUp).Row
    For Each cell In ActiveSheet.Range(Cells(1, 2), Cells(j, 2))
        For v = 1 To Len(cell)
            k = Asc(Mid(cell, v, 1))
            If k >= 97 And k <= 122 Then
                If v < 1 Then
                Exit For
                Else: m = v - 1
                End If
                    l = Asc(Mid(cell, m, 1))
                    If l >= 65 And l <= 90 Then
                    y = Len(cell) - (v - 1)
                    cell.Offset(0, 1) = Mid(cell, m, y + 1)
                Else
                End If
            End If
        Next v
    Next cell
Next j

End Sub

the first part finds the ":" in the cell value and uses all characters on the left from ":" minus 8 as the cell value for the cell in the column next to it.

The second part has to use this 'new' value to separate the city name from the street name. Fortunately, the street name always starts with a capital and is followed by a non-capital. And fortunately, the city name is completely in capitals which makes it easier to split the value based on Capital followed by non capital.

I focus on the second part now.

what the second part does is check for each cell and each position in the cell if it is non-capital. If it is, it checks if the position before is capital. If it does, it have to use the all characters from the capital as a new value in the cell in the next column.

This works. But not for this value: BELLE- ILE "Belle-Ile" Shop 22 -Quai des Vennes 1 the result from that value is only Vennes 1.

but why? v loops from 1 to the length of the cell. But starts at 1 so position 1 is at the left of the cell value. From this routine, the result should actually be Belle-Ile" Shop 22 -Quai des Vennes 1.

Anyone have the explanation for this? I will adjust it by hand now, but I am just curious to find out why it returns this values.

Solution: v has to check from len(cell) to 1 step -1. After I changed that, It works almost perfectly. But I still do not understand why. How I read it, is that v starts testing at the last position works towards the first position of the cell value. Like this, in my opinion, the routine would not work I believe. But somehow it does. The key is understanding why v has to be len(cell) to 1 step -1 instead of 1 to len(cell).

I hope someone can explain this to me.

(I will also try the regex solution after I have got to learn something about it).

Community
  • 1
  • 1
DutchArjo
  • 319
  • 3
  • 8
  • 29

2 Answers2

0

I am new to regex, but the following works with the input line given above. No doubt a more elegant solution exists, but this might get you going in the right direction. StackOverflow links I found useful in building the regex patterns:

How to match "anything up until this sequence of characters" in a regular expression?

Regex to match mixed case words

Regex to match only uppercase "words" with some exceptions

How to use Regular Expressions (Regex) in Microsoft Excel both in-cell and loops

Function Part1(Myrange As Range) As String

    Dim regEx As New RegExp
    Dim strPattern As String
    Dim strInput As String

    strPattern = ".+?(?=[A-Z][a-z]+)"

    If strPattern <> "" Then
        strInput = Myrange.Value

        With regEx
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
            .Pattern = strPattern
        End With

        If regEx.test(strInput) Then
            Set matches = regEx.Execute(strInput)
            For Each Match In matches
                Part1 = Part1 & Match.Value
            Next
        Else
            Part1 = "Not matched"
        End If
    End If

End Function

Function Part2(Myrange As Range) As String

    Dim regEx As New RegExp
    Dim strPattern As String
    Dim strInput As String
    Dim strReplace As String

    strPattern = ".+?(?=[A-Z][a-z]+)"

    If strPattern <> "" Then
        strInput = Myrange.Value
        strReplace = ""

        With regEx
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
            .Pattern = strPattern
        End With

        If regEx.test(strInput) Then
            Part2 = regEx.Replace(strInput, strReplace)
            regEx.Pattern = ".+?(?=[0-9]{4})"
            Set matches = regEx.Execute(Part2)
            For Each Match In matches
                Part2 = Match.Value
            Next
        Else
            Part2 = "Not matched"
        End If
    End If

End Function
Community
  • 1
  • 1
Edward Leno
  • 6,257
  • 3
  • 33
  • 49
0

This is what I have and what satisfies my 'need':

Sub doitall()
   Dim cell As Range, j As Integer, i As Integer, x As String
   Dim str As String
   Dim strlen As Integer
   Dim k As Integer
   Dim l As Integer
   Dim y As Integer
   Dim v As Integer

   For j = 1 To Cells(Rows.Count, 1).End(xlUp).Row
      For Each cell In ActiveSheet.Range(Cells(1, 1), Cells(j, 1))
         For i = 1 To Len(cell)
         x = Mid(cell, i, 1)
         If x = ":" Then Exit For
      Next i
      cell.Offset(0, 1) = Left(cell, i - 8)
      Next cell
   Next j

For j = 1 To Cells(Rows.Count, 2).End(xlUp).Row
    For Each cell In ActiveSheet.Range(Cells(1, 2), Cells(j, 2))
        For v = Len(cell) To 1 Step -1
            k = Asc(Mid(cell, v, 1))
            If k >= 97 And k <= 122 Then
                If v < 1 Then
                Exit For
                Else: m = v - 1
                End If
                    l = Asc(Mid(cell, m, 1))
                    If l >= 65 And l <= 90 Then
                    y = Len(cell) - (v - 1)
                    cell.Offset(0, 1) = Mid(cell, m, y + 1)
                    cell.Offset(0, 2) = Left(cell, (m - 1))
                Else
                End If
            End If
        Next v
    Next cell
Next j

End Sub

It works almost perfectly. except for some cells that have some other characters in the string which are not covered by this routine. But I believe that could also be added (check op spaces, double quotes etc.)

DutchArjo
  • 319
  • 3
  • 8
  • 29