-3

I am trying to extract specific text from 1 column which seems to have concatenated several data points. Here is an example of part of the output that appears in 1 row:

[{"q":{"as":[{"id":"1","tags":[{"tagid":"62","tagstr":"Example1"},{"tagid":"3","tagstr":"Example1"},{"tagid":"65","tagstr":"Example1"},{"tagid":"71","tagstr":"Example1"}],"text":"Example1"}],"hidden":"false","id":"1","questionalias":"1","text":"Example1","ttl":"Example1"}},

The text in bold is what I am trying to extract. In practice each 'Example1' is selected from an option of words. Therefore I know exactly what text I am looking for. What I am struggling with is creating a way for the output to strip out the unwanted text and return the key words (around 8)

Alternatively, if someone has done something similar in VBA, that could also be an option.

Has anyone faced this before?

Martijn Pieters
  • 1,048,767
  • 296
  • 4,058
  • 3,343
AST
  • 11
  • 1
  • 9
  • What is your expected output? – Pham X. Bach Mar 09 '17 at 14:47
  • Something along the following: Example1 Example1 Example1 Example. I can use 'text to columns'to then seperate the words. – AST Mar 09 '17 at 14:49
  • This is JSON -- you would be best served with a JSON parser. – Hogan Mar 09 '17 at 15:05
  • One idea using VBA is 2 steps: 1. Get n - number of keywords occur in your data = (LEN(data) - LEN(REPLACE(data, keyword, "") ) ) / LEN(keyword). And 2. make a loop to concatenate your keyword n times – Pham X. Bach Mar 09 '17 at 15:11
  • Thanks for the response so far. Given that this is JSON and being imported into excel, is there a 'JSON parser' equivalent? – AST Mar 09 '17 at 15:19

1 Answers1

0

You can parse you data with Regex! It's marvelous!

There're plenty (Left, Mid, Right, Instr) functions to parse your data, right?

Some people, when confronted with a problem, think “I know, I'll use regular expressions.”
Now they have two problems.

I think, that you trying to bound to some keywords(tagstr, text and ttl), so take a look at this.

Feel free to modify this expression, take a look on this and that!

In VBA there's no regex from scratch , so add VBA reference to "Microsoft VBScript Regular Expressions 5.5"

references

This is my exapmle with your data:

Sub test()
    Dim Data As String
    Dim Re As RegExp
    Dim ReMatch As MatchCollection
    Dim CurrentMatch As Match


    Data = "[{" & Chr(34) & "q" & Chr(34) & ":{" & Chr(34) & "as" & Chr(34) & ":[{" & Chr(34) & "id" & Chr(34) & ":" & Chr(34) & "1" & Chr(34) & "," & Chr(34) & "tags" & Chr(34) & _
            ":[{" & Chr(34) & "tagid" & Chr(34) & ":" & Chr(34) & "62" & Chr(34) & "," & Chr(34) & "tagstr" & Chr(34) & ":" & Chr(34) & "Example1" & Chr(34) & "},{" & Chr(34) & "tagid" & Chr(34) & ":" & Chr(34) & "3" & Chr(34) & _
            "," & Chr(34) & "tagstr" & Chr(34) & ":" & Chr(34) & "Example1" & Chr(34) & "},{" & Chr(34) & "tagid" & Chr(34) & ":" & Chr(34) & "65" & Chr(34) & "," & Chr(34) & "tagstr" & Chr(34) & ":" & Chr(34) & "Example1" & Chr(34) & _
            "},{" & Chr(34) & "tagid" & Chr(34) & ":" & Chr(34) & "71" & Chr(34) & "," & Chr(34) & "tagstr" & Chr(34) & ":" & Chr(34) & "Example1" & Chr(34) & "}]," & Chr(34) & "text" & Chr(34) & ":" & Chr(34) & "Example1" & Chr(34) & _
            "}]," & Chr(34) & "hidden" & Chr(34) & ":" & Chr(34) & "false" & Chr(34) & "," & Chr(34) & "id" & Chr(34) & ":" & Chr(34) & "1" & Chr(34) & "," & Chr(34) & "questionalias" & Chr(34) & ":" & Chr(34) & "1" & Chr(34) & _
            "," & Chr(34) & "text" & Chr(34) & ":" & Chr(34) & "Example1" & Chr(34) & "," & Chr(34) & "ttl" & Chr(34) & ":" & Chr(34) & "Example1" & Chr(34) & "}},"

    Debug.Print "My data is:" & vbNewLine & Data

    Set Re = New RegExp
    Re.IgnoreCase = True
    Re.Global = True
    Re.MultiLine = True
    Re.Pattern = "(?=" & Chr(34) & "tagstr" & Chr(34) & "|" & Chr(34) & _
            "text" & Chr(34) & "|" & Chr(34) & "ttl" & Chr(34) & ")(?:" & Chr(34) & _
            "\w*" & Chr(34) & ":" & Chr(34) & "(.*?)" & Chr(34) & ")"

    Debug.Print "My pattern is:" & vbNewLine & Re.Pattern

    Set ReMatch = Re.Execute(Data)

    Debug.Print "Matched " & ReMatch.Count & " times!"

    For Each CurrentMatch In ReMatch
        Debug.Print "Capture " & CurrentMatch.SubMatches(0) & " in " & CurrentMatch.Value
    Next

End Sub

Output:

output

Not so complicated, right? You can do this with standart string functions after all..

Community
  • 1
  • 1
CommonSense
  • 4,232
  • 2
  • 14
  • 38
  • Thanks appreciate this. Going through it shortly! – AST Mar 10 '17 at 09:14
  • @Anthony, good luck! My answer is partly a joke, because regular expressions are not so easy to understand and you will spend some time learning them. Feel free to play with your data on sites like `regex101` to see what you get with this and that pattern. And if you stuck - there's entire `regex` tag on SO. Just provide your problem, data, desired output and what are you tryied so far, like pattern and used language (vbscript in your case). – CommonSense Mar 10 '17 at 09:39
  • @Anthony S, anyway i'm curious that solution is works for you or not – CommonSense Mar 10 '17 at 12:45
  • unfortunately it didnt work. I'll be reading through the above in more detail. – AST Mar 10 '17 at 16:16