0

My workplace is changing CMS systems and we have around 5,000 products to import. The problem comes with image URL formatting as the two systems are laid out vastly different. I need a function or VB code to convert one cell:

Main|1|Vaterra/VTR03014C-1.jpg;VTR03014C|2|Vaterra/VTR03014C-2.jpg;VTR03014C|3|Vaterra/VTR03014C-3.jpg;VTR03014C|4|Vaterra/VTR03014C-4.jpg;VTR03014C|5|Vaterra/VTR03014C-5.jpg;VTR03014C|6|Vaterra/VTR03014C-6.jpg;VTR03014C|7|Vaterra/VTR03014C-7.jpg;VTR03014C|8|Vaterra/VTR03014C-8.jpg;VTR03014C|9|Vaterra/VTR03014C-9.jpg;VTR03014C|10|Vaterra/VTR03014C-10.jpg;VTR03014C|11|Vaterra/VTR03014C-11.jpg;VTR03014C|12|Vaterra/VTR03014C-12.jpg;VTR03014C|13|Vaterra/VTR03014C-13.jpg;VTR03014C|14|Vaterra/VTR03014C-14.jpg

into two cells containing:

Vaterra/VTR03014C-1.jpg

and this is where it gets tricky:

Vaterra/VTR03014C-2.jpg;Vaterra/VTR03014C-3.jpg;Vaterra/VTR03014C-4.jpg;Vaterra/VTR03014C-5.jpg;Vaterra/VTR03014C-6.jpg;Vaterra/VTR03014C-7.jpg;Vaterra/VTR03014C-8.jpg;Vaterra/VTR03014C-9.jpg;Vaterra/VTR03014C-10.jpg;|Vaterra/VTR03014C-11.jpg;Vaterra/VTR03014C-12.jpg;Vaterra/VTR03014C-13.jpg;Vaterra/VTR03014C-14.jpg

Notice how the "Main|1|" has been removed also, the tricky part is that not all of these begin with or contain "Main|1|" and not all of the options begin with or contain "Vaterra".

The main steps would be to remove each image's suffixes and then capture the line of text up to ".jpg" and move it to a separate cell.

Community
  • 1
  • 1
ConduciveMammal
  • 455
  • 7
  • 20
  • Would text to columns work here? If not, can you give a few more examples? It seems to be delimited with ;'s... – seadoggie01 Sep 24 '15 at 15:47
  • So do you want to split it into two cells or multiple cells? – ManishChristian Sep 24 '15 at 15:50
  • I don't imagine there are more examples to give. All of them are laid out of the same way, the only difference is some of the prefixes are "Main|1|" and others are made up of the product's SKU number. And that's right, yes, each is comma delimited with semicolons. – ConduciveMammal Sep 24 '15 at 15:50
  • @Nelly27281. Okay, so I first want to remove those prefixes I mentioned "Main|1|" completely. After that, I want the first image in a cell of it's own and the remaining image options in another cell altogether. Does that help clarify? – ConduciveMammal Sep 24 '15 at 15:52
  • 1
    Are these files saved locally? If so it would be easier to run a directory search and pull the filename attribute instead. – SierraOscar Sep 24 '15 at 15:53
  • @MacroMan Yes, for the time being, they're saved locally until I upload them. Do you mean run a search for the one image I want seperated? – ConduciveMammal Sep 24 '15 at 15:55

3 Answers3

0

Splitted and Junk RemovedBefore Regexp

After Regexp Replace All

With some tweeks, you will be able to make it happen without VBA.

First, replace | and / with ; so that you can have a consistent delimiter.

Also, you can remove Main|1| by replacing it with empty space.

Now, choose Data => Text to Columns Choose the option Delimeted you can now use the delimeter semicolon and you will have data in separate cells with the as in each cell.

You can now remove unwanted entries.

Shalvin Abraham
  • 428
  • 2
  • 8
  • The problem with that is the result would produce: "Main;1;Vaterra;VTR03014C-1.jpg. Which would cause three incorrect options and also the "Main|1|" isn't always that, sometimes it's "Vaterra|1|" or something else ending in "|1| so a simply search/replace wouldn't be effective. – ConduciveMammal Sep 24 '15 at 16:09
  • If you are only sure that you will need to capture all in the format (SomeText)/SomeAnother.jpg), you can use Regular Expressions. An example would be `([\w,\s-]+?)\\([\w,\s-]+\.jpg)`. That expression will give Vaterra as first match and filename as second match. Use this [Link](http://stackoverflow.com/questions/8146485/returning-a-regex-match-in-vba-excel) to know how to use Regular Expressions with Excel. – Shalvin Abraham Sep 24 '15 at 16:21
  • Hmm that sounds quite promising. Could this be automated to complete a list of 5,000 products? – ConduciveMammal Sep 24 '15 at 16:22
  • The magic of RegExp is that once it works, you can use the same expression against any text. If you have access to any text editor that supports regular expression text replace (Like Notepad++), you can just paste the records in the text editor and do regular expression replace. Please check updated answer. I just did in Notepad++. – Shalvin Abraham Sep 24 '15 at 16:30
  • that looks great! However, as they're all on their own line, wouldn't that copy them back into individual cells within Excel? – ConduciveMammal Sep 24 '15 at 16:35
  • You can use expression .*?([\w]+?)/([\w\s-]+?\.jpg).*? replace with $1/$2\n and it remove all except the folder name and file name, and will paste as seen in the new screenshot. You can now select All, Copy and then paste to excel. It will come in the same order. If you want only file name, use $2\n as replacement text. – Shalvin Abraham Sep 24 '15 at 16:44
0

As an alternate, here is a formula solution. Assuming the large single block of text is in cell A1, put this formula in cell B1 and copy down until it starts giving you errors:

=TRIM(MID(SUBSTITUTE("|"&$A$1,";",REPT(" ",LEN($A$1))),LEN($A$1)*(ROW(A1)-1)+1+LOOKUP(2,1/(MID(SUBSTITUTE("|"&$A$1,";",REPT(" ",LEN($A$1))),LEN($A$1)*(ROW(A1)-1)+ROW(INDIRECT("1:"&LEN($A$1))),1)="|"),ROW(INDIRECT("1:"&LEN($A$1)))),LEN($A$1)))

The errors mean that there are no more entries to return, so you can delete the cells with errors, and then select all the cells with the formula -> Copy -> Right-click -> Paste Special -> Values to convert them to just be text instead of formulas. (I highly recommend doing that because the Indirect function is volatile and can greatly slow down your workbook if you have many formula cells with it.)

tigeravatar
  • 26,199
  • 5
  • 30
  • 38
0

As you have VBA tag, here is a quickest VBA approach.
Assuming your your data is in column A starting from row 1 on sheet1.
This macro will write the below two lines in column B and C respectively.

Column B

Vaterra/VTR03014C-1.jpg

Column C

Vaterra/VTR03014C-2.jpg;Vaterra/VTR03014C-3.jpg;Vaterra/VTR03014C-4.jpg;Vaterra/VTR03014C-5.jpg;Vaterra/VTR03014C-6.jpg;Vaterra/VTR03014C-7.jpg;Vaterra/VTR03014C-8.jpg;Vaterra/VTR03014C-9.jpg;Vaterra/VTR03014C-10.jpg;|Vaterra/VTR03014C-11.jpg;Vaterra/VTR03014C-12.jpg;Vaterra/VTR03014C-13.jpg;Vaterra/VTR03014C-14.jpg

Here is the macro.

Public RegMatchArray
Sub test()

    Dim sh As Worksheet
    Dim rowCount As Long
    Dim i, j As Integer
    Dim strValue, strValue1, strValue2 As String

    Set sh = Sheets("Sheet1")

    rowCount = sh.Range("A1048576").End(xlUp).Row

    For i = 1 To rowCount
        strValue = sh.Cells(i, 1).Value
        If InStr(1, strValue, "Main|1|") > 0 Then
            strValue = Replace(strValue, "Main|1|", "")
        End If
        iPos = InStr(1, strValue, ";")
        strValue1 = Left(strValue, iPos - 1)
        strValue2 = Mid(strValue, iPos + 1, Len(strValue) - iPos - 1)
        Call splitUpRegexPattern(strValue2, "([\w\s-]+?)\/([\w\s-]+?\.jpg)")
        For j = LBound(RegMatchArray) To UBound(RegMatchArray)
            If j < 1 Then
                strValue2 = RegMatchArray(j)
            Else
                strValue2 = strValue2 & ";" & RegMatchArray(j)
            End If
        Next
        sh.Cells(i, 2).Value = strValue1
        sh.Cells(i, 3).Value = strValue2
    Next

    Set sh = Nothing
End Sub

Public Function splitUpRegexPattern(targetString, strPattern)
    Dim regEx As New RegExp
    Dim strReplace As String
    Dim arrArray()
    i = 0

    'CREATE THE REGULAR EXPRESSION
    regEx.Pattern = strPattern
    regEx.IgnoreCase = True
    regEx.Global = True

    'PERFORM THE SEARCH
    Set Matches = regEx.Execute(targetString)

    'REPORTING THE MATCHES COLLECTION
    If Matches.Count = 0 Then
        RegMatchArray = ""
    Else
        'ITERATE THROUGH THE MATCHES COLLECTION
        For Each Match In Matches
            'ADD TO ARRAY
            ReDim Preserve arrArray(i)
            arrArray(i) = Match.Value
            i = i + 1
        Next
        RegMatchArray = arrArray
        RegExpMultiSearch = 0
    End If

    If IsObject(regEx) Then
        Set regEx = Nothing
    End If
    If IsObject(Matches) Then
        Set Matches = Nothing
    End If
End Function

Note: You have to add "Microsoft VBSript Regular Expressions 5.5" reference by going into Tools -> References.

If you don't want to keep the original column A, change the below lines. This will delete the original data and give you the result in column A and B.

From:

sh.Cells(i, 2).Value = strValue1
sh.Cells(i, 3).Value = strValue2

To:

sh.Cells(i, 1).Value = strValue1
sh.Cells(i, 2).Value = strValue2

enter image description here

ManishChristian
  • 3,759
  • 3
  • 22
  • 50
  • Hi, thanks very much for that. It looks great, however there's an error in the macro. It produces an error message saying: "Run-time error '13': Type mismatch". Thanks. – ConduciveMammal Sep 25 '15 at 08:13
  • Also, since not all of them begin with " __Main__ |1|", could I simply replace `If InStr(1, strValue, "Main|1|"` with wildcards like: `If InStr(1, strValue, "*|1|"`. Thanks – ConduciveMammal Sep 25 '15 at 08:27
  • For the error try to debug the code and tell me which line is throwing the error. I've tested with given example, it's working fine. And about `Main|1|`, I have already covered that part, check the data in cell A2, it's not staring with `Main|1|`, yet cell B2 and C2 is giving the expected result. – ManishChristian Sep 25 '15 at 12:42
  • And make sure you have added `"Microsoft VBSript Regular Expressions 5.5"` reference by going into Tools -> References, – ManishChristian Sep 25 '15 at 12:43
  • Hey @Nelly27281, sorry about the delay. Office was out for the weekend. Okay, so I don't actually know how to debug this stuff, but when the error shows up, it highlights the line `Public Function splitUpRegexPattern(targetString, strPattern)` and I checked and found I have Regular Expressions 5.5 enabled. – ConduciveMammal Sep 28 '15 at 08:28
  • Hmm it seems to produce the error after highlighting in yellow `Public Function splitUpRegexPattern(targetString, strPattern)`. And select highlighting in blue `Dim regEx As New RegExp`. – ConduciveMammal Sep 28 '15 at 12:48
  • Something is wrong with your reference for `"Microsoft VBSript Regular Expressions 5.5"`, try to check that twice. – ManishChristian Sep 28 '15 at 14:42
  • Hmm I think it's a problem with my data. When I copy and paste just the above example, it works fine. But when trying it on the full spreadsheet, it shows the error and debug loops at `ReDim Preserve arrArray(i) arrArray(i) = Match.Value i = i + 1 Next` – ConduciveMammal Sep 28 '15 at 14:56
  • This means the regex pattern is not finding anything from your data. Can you post example of your actual data? – ManishChristian Sep 28 '15 at 15:16
  • Sure, here's a [link](http://1drv.ms/1iWqEb9) to my sheet. It contains around 4,000 files in A1 Sheet 1. Thanks. – ConduciveMammal Sep 28 '15 at 15:21
  • Place your expected result in cells `B1`, `C1`, etc.. That will help. – ManishChristian Sep 28 '15 at 15:26
  • Problem is with your data, as the image(image with path) name is different. You need universal pattern for regex which will return the match from different combination. – ManishChristian Sep 28 '15 at 17:29
  • Hmm yeah, I thought that might have caused the biggest issue. I'll see if I do a mass search and replace – ConduciveMammal Sep 28 '15 at 17:37