-2

I have a column given to me in a spreadsheet which looks like that: enter image description here

What I need is to get all the references out, the ones in square brackets, to provide with the full list to a user: enter image description here

... and then get a full list of all references, as follows:

enter image description here

Does anyone have an idea of how I can do this using any Excel formulas/filtering or maybe VBA?

Darius
  • 489
  • 2
  • 6
  • 22
  • 3
    look into regex in vba. – Scott Craner Oct 25 '16 at 14:49
  • You could also possibly avoid a lot of VBA - try doing Text to Columns with a delimiter of `[`. Then run again with `]` delimiter. That should help parse out info. Or you could use a space delimiter, but then you'd have to sift through lots of one-word cells. Another possibility is to use `mid()` to grab text in between `[` and `]`, then just "text to columns" the result. – BruceWayne Oct 25 '16 at 14:58
  • 1
    Whether you do it manually or through VBA in this instance probably depends on how many rows of this kind of data you have to go through. Is the pain of manually digging going to be less if you're writing disposable code or is this being used for a lot of rows / used multiple times. – Tim Edwards Oct 25 '16 at 15:02

3 Answers3

1

There are many examples of number parsing¹ from text on this site. Pulling numbers from narrative text is one of the easier regular expression 'patterns'² to construct; especially so with a fixed number of digits regardless of delimiter or grouping character(s).

Put the following into a standard module code sheet.

Option Explicit
Option Base 0    '<~~this is the default but I've included it because it has to be 0

Function numberParse(str As String, _
                     Optional ndx As Integer = 0, _
                     Optional delim As String = "; ") As Variant
    Dim n As Long, nums() As Variant
    Static rgx As Object, cmat As Object

    'with rgx as static, it only has to be created once; beneficial when filling a long column with this UDF
    If rgx Is Nothing Then
        Set rgx = CreateObject("VBScript.RegExp")
    Else
        Set cmat = Nothing
    End If
    numberParse = vbNullString

    With rgx
        .Global = True
        .MultiLine = True
        .Pattern = "[0-9]{4}"
        If .Test(str) Then
            Set cmat = .Execute(str)
            If CBool(ndx) Then
                'pull the index of the array of matches
                numberParse = cmat.Item(ndx - 1)
            Else
                'resize the nums array to accept the matches
                ReDim nums(cmat.Count - 1)
                'populate the nums array with the matches
                For n = LBound(nums) To UBound(nums)
                    nums(n) = cmat.Item(n)
                Next n
                'convert the nums array to a delimited string
                numberParse = Join(nums, delim)
            End If
        End If
    End With
End Function

With your blurb in A2, put the following into B2,

=numberParse(A2)

With your blurb in A2, put the following into A4 and fill down,

=numberParse(A$2, ROW(1:1))

Your results should resemble the following,

numberParse


¹ The above was modified from my response in Excel UDF for capturing numbers within characters which wasn't that hard to find.

² See How to use Regular Expressions (Regex) in Microsoft Excel both in-cell and loops for more information.

Community
  • 1
  • 1
1

assuming:

  • worksheet to process named after "pressure"

  • column "A" with cells to get references out of

  • column "B" to write corresponding extracted references in

  • column "C"to write full list of all references in

you could try this

Option Explicit

Sub main()
    Dim cell As Range
    Dim references As String
    Dim referencesArr As Variant

    With Worksheets("pressure") '<-- change "pressure" to your actual worksheet name
        For Each cell In .Range("A1", .Cells(.Rows.Count, 1).End(xlUp))
            references = references & GetReferences(cell) & "; "
        Next cell
        If references <> "" Then
            referencesArr = Split(Left(references, Len(references) - 2), ";")
            .Range("C1").Resize(UBound(referencesArr)).Value = Application.Transpose(referencesArr)
        End If
    End With
End Sub

Function GetReferences(rng As Range) As String
    Dim arr As Variant, iElem As Long
    Dim strng As String

    With rng
        arr = Split(Replace(Replace(.Value, "[", "|["), "]", "]|"), "|")
        For iElem = 1 To UBound(arr) - 1 Step 2
            strng = strng & Mid(CStr(arr(iElem)), 2, Len(CStr(arr(iElem))) - 2) & "; "
        Next iElem
    End With
    If strng <> "" Then
        GetReferences = Left(strng, Len(strng) - 2)
        rng.Offset(, 1) = GetReferences
    End If
End Function
user3598756
  • 28,893
  • 4
  • 18
  • 28
0

For a quick start, you can use =MID(A1,SEARCH("[",A1)+1,SEARCH("]",A1)-SEARCH("[",A1)-1) to extract the text between the brackets. Then you're left with a string, separated by semicolons.

Then, you can run this sub (with tweaking most likely, to narrow down the ranges):

Sub splitSemiColons()
Dim myArray() As String
Dim colToUse As Long
colToUse = 3
myArray = Split(Range("B1"), ";")

Dim i As Long
For i = LBound(myArray) To UBound(myArray)
    Cells(i + 1, colToUse).Value = myArray(i)
Next i

End Sub

Or, you can avoid this macro, and just use Data --> Text to Columns --> Use ; delimiter, then copy and paste transposed.

BruceWayne
  • 22,923
  • 15
  • 65
  • 110