0

I have a table in excel that has two columns, call them Tags and Note.

Tags can be empty cell or one tag, or many tags separated by a comma. (each tag is a string)

Note is just an arbitrary sentence (a note).

Note that a tag/tags do NOT reflect a specific note.

I would like to write a function that outputs a string that contains an ID (beginning with 0 for first row), the note and the tag/tags associated with that note.

My function must be able to work if I enter a new row/rows.

So far, I think i've managed to find a function that can read the end of my table. Now I need to be able to process Column A (Tags) to a specific format then concatenate my processed columns to a format similar to this:

{"id":0,"note":"contents of colB","tags":["tag1","tag2", etc]}

This is what I have :(

Sub Parse()

ActiveWorkbook.Sheets.Add.Name = "Result"

idindex = 0

For Each Line In Sheets("Template").Range("B2:B" & 
Sheets("Template").Range("B2").End(xlDown).Row)

I'm not familiar with VBA but if I can figure out how to process the columns and append the id, i think I can figure out the formatting.

Brandon
  • 11
  • 4

1 Answers1

1

Might want to see this for good info on how to find the "last" row in a range/column/sheet/etc. Also a good practice to use Option Explicit which forces you to declare all variables (helps prevent typos, and other hard-to-trace errors that arise from implicit typing, etc.).

This is pretty basic string manipulation/string building. You'll need to use some of the built in VBA functions like Join, Split, etc., and you'll need to escape your quotation marks, so something like this:

Sub Parse()
Dim thisCell As Range
Dim id As Long
Dim columnRange As Range
Dim note As String
Dim tags As String
Dim output As String
ActiveWorkbook.Sheets.Add.Name = "Result"

id = 0
With Sheets("Template")
    Set columnRange = .Range("B2:B" & .Range("B2").End(xlDown).Row)
    For Each thisCell In columnRange
        note = thisCell.Value
        tags = thisCell.Offset(0, -1).Value
        output = FormatOutput(id, note, tags)
        ' Remove the next line unless you want to print on the same worksheet in column C:
        thisCell.Offset(0, 1).Value = output
        ' This line prints to the Result sheet:
        Sheets("Result").Cells(id + 1, 1) = output
        id = id + 1
    Next
End With

End Sub

I made this fancy function to format the output based on the paramaters: id, thisNote (string) and theseTags (which we expect is the comma-delimited string of tags). I find it easier to build the parts like this when scripting, rather than trying to keep track of all my quotes and whether they're properly escaped/etc.:

Function FormatOutput(id As Long, thisNote As String, theseTags As String) As String
    Const OPEN_ITEM As String = "{""id"":"
    Const OPEN_NOTE As String = " ""note"":"
    Const OPEN_TAGS As String = " ""tags"": ["
    Const CLOSE_ITEM As String = "]}"
    Const DBLQUOTE As String = """"
    Const COMMA As String = ","

    FormatOutput = OPEN_ITEM & CStr(id) & _
                OPEN_NOTE & DBLQUOTE & thisNote & DBLQUOTE & COMMA & _
                OPEN_TAGS & _
                    IIF(Len(Trim(theseTags)) = 0, "", _
                    DBLQUOTE & Join(Split(theseTags, COMMA), DBLQUOTE & COMMA & DBLQUOTE) & DBLQUOTE) & _
                CLOSE_ITEM
End Function

And that gives me an output like:

{"id":0 "note":"this is a note", "tags": ["tag1","tag3","tag5"]}

It handles notes without tags and vice-versa:

enter image description here


That function (FormatOutput) is the brains behind this operation. It should (hopefully) be pretty straightforward, but this part is a little tricky if you're unfamiliar with the built-ins:

DBLQUOTE & Join(Split(theseTags, COMMA), DBLQUOTE & COMMA & DBLQUOTE) & DBLQUOTE 

This ensures that we wrap each of our tag substrings in quotation marks for the output. The DBLQUOTE at the beginning puts a " before the first item, and likewise at the end puts a " after the last item.

Then we (Split(theseTags, COMMA)) split the delimited string on the commas, and Join them back with a new delimiter "," between each.

We need to do all of this tomfoolery because we're building a string that contains quotation marks, which are otherwise treated as either the beginning or end of a string.

David Zemens
  • 53,033
  • 11
  • 81
  • 130
  • Thanks. This pretty much does what I asked. Your explanation helped me to follow through the code and adjust the formatting as needed. I have one issue though, for a cell with no tags, your code returns [""]. How do I get it to return [] (ie. without the double quotes. Seems as though a string always has double quotes. How can I get around this? – Brandon May 10 '19 at 05:26
  • @Brandon see revised `FormatOutput` function. We'll use the `IIF` function here, check whether the `theseTags` is an empty string and if so, append nothing to the output, otherwise we'll build the bracketed list. – David Zemens May 10 '19 at 12:52
  • Hi David, i noticed when there are more than one tags, there is a white space within the double quotes before the tag. Example; tags:["tag1"," tag2"] How can i remove this whitespace? I tried applying Trim, but I get all sorts of errors, one of which asks me to add a new argument to FormatOuput – Brandon May 14 '19 at 18:26
  • remove the whitespace from your input cells? you apparently don't have a *comma-delimited* list of tags, you have a *comma+space delimited* list. Or, if you always have `", "` between the tags, just update the formatting function: `Const COMMA As String = ", "` that way it will split on the comma+space, instead of just the comma. – David Zemens May 14 '19 at 19:22
  • It's also considered good form around here to upvote or accept answers that solve your problem/question. :) – David Zemens May 14 '19 at 19:23
  • Thanks! I did upvote but i don't have enough rep for that but I have accepted your answer! – Brandon May 14 '19 at 19:49