13

I have data in an Excel spreadsheet with delimited strings. For simplicity, think of something like "4#8#10", with an arbitrary number of pieces.

Is there any way to split this into an array for use with an array formula? For example, something like SUM(SplitStr(A1,"#")). (This does not work - it just returns "4".)

Is there a good way of approaching this? My ideal solution would end up with an array, as if the user entered {4,8,10} for use in array formulas.

The application of my sheet does not allow for delimiter-based text to columns and summing that. I would prefer to not use VBA if possible as I will not be the primary consumer of the sheet.

Thanks, Terry

Gahan
  • 4,075
  • 4
  • 24
  • 44
terry87
  • 445
  • 1
  • 4
  • 15
  • 1
    Have you tried LEFT, RIGHT, MID, SEARCH combined? I don't quite understand what output you expect, so can't direct to a proper answer – Inox Aug 14 '14 at 19:29
  • Does the string contain any symbols or specific set of symbols (# in this case)? – Ravi Yenugu Aug 14 '14 at 19:31
  • Inox - yes, though the issue was transforming it to an array for summation, which is necessary when you consider an arbitrary number of pieces in the string. The expected sum of that is 22. (See Ron's answer.) Thinkingcap - Yes, the string is delimited. In this case # but often char(10). – terry87 Aug 16 '14 at 17:30

8 Answers8

26

Use XML functionality:

={SUM(FILTERXML("<t><s>" & SUBSTITUTE(A1, "#", "</s><s>") & "</s></t>", "//s"))}

Update April 2022: Use new TEXTSPLIT function.

Vafā Sarmast
  • 361
  • 3
  • 6
  • 1
    Nice. +1 for this. – QHarr May 16 '18 at 06:10
  • this should be the accepted answer since it does not rely on a constant like in the accepted answer (what if the string is more than 99 chars ?). It is complete dynamic then – laloune Apr 01 '20 at 10:21
  • 1
    Note the curly braces cannot be entered manually, the formula must be entered without them ( i.e. =SUM(...) ) Then typing CTRL+SHIFT+ENTER will indicate the expression is an array formula, and the braces appear. – plasmo Jul 27 '20 at 06:28
24

To sum the entries 4,8,10 you could use something like:

=SUMPRODUCT(1*TRIM(MID(SUBSTITUTE(A1,"#",REPT(" ",99)),(ROW(OFFSET($A$1,,,LEN(A1)-LEN(SUBSTITUTE(A1,"#",""))+1))-1)*99+((ROW(OFFSET($A$1,,,LEN(A1)-LEN(SUBSTITUTE(A1,"#",""))+1)))=1),99)))

The array that is returned is a text array, so the 1* at the beginning is one way to convert them to numbers

This part:

TRIM(MID(SUBSTITUTE(A1,"#",REPT(" ",99)),(ROW(OFFSET($A$1,,,LEN(A1)-LEN(SUBSTITUTE(A1,"#",""))+1))-1)*99+((ROW(OFFSET($A$1,,,LEN(A1)-LEN(SUBSTITUTE(A1,"#",""))+1)))=1),99))

returns the array:

{"4";"8";"10"}

And with 1* before it:

{4;8;10}

Edit After six years, more succinct formulas are available to create the array:

With Excel O365 and the SEQUENCE function:

=1*(TRIM(MID(SUBSTITUTE(A1,"#",REPT(" ",99)),IF(SEQUENCE(LEN(A1)-LEN(SUBSTITUTE(A1,"#",""))+1)=1,1,(SEQUENCE(LEN(A1)-LEN(SUBSTITUTE(A1,"#",""))+1)-1)*99),99)))

With Excel 2010+ for Windows (not MAC versions) with the FILTERXML funcdtion:

=FILTERXML("<t><s>" & SUBSTITUTE(A1,"#","</s><s>") & "</s></t>","//s")

Note that the FILTERXML function extracts the values as numbers, whereas the first formula extracts the numbers as text which must be converted to numbers before being used in a numeric formula.

Ron Rosenfeld
  • 53,870
  • 7
  • 28
  • 60
  • That is wicked clever. You are my new Excel god. Question: is the Trim necessary? (Doesn't the 1* serve the same purpose?) Thanks! – terry87 Aug 16 '14 at 17:19
  • This is great, but I cannot work out how it is working :) How can I apply it to another cell different from A1? Can you elaborate a bit more on how the formula is working, can;t work out how the array is generated... – Simone Jun 04 '15 at 13:32
  • 1
    @Simone To apply to a cell other than A1, you would change the reference to A1 within the formula, to some other cell. To understand how the array is generated, I would suggest you examine the second formula in my response using the formula evaluation tool. – Ron Rosenfeld Jun 06 '15 at 00:46
  • This doesn't seem to be working for me -- I enter the formula exactly as shown, but the returned array is {"";"";"10"}. Any ideas why @RonRosenfeld? – Austin Wismer Aug 23 '18 at 03:44
  • @AustinWismer There is something different about either how you entered the formula, or your data, from what the question assumes. Maybe extra spaces someplace? But without knowing **exactly** what you did, I can't say. – Ron Rosenfeld Aug 23 '18 at 10:58
  • I use the TRIM... function without multiplication with 1 to get the array returned in text format to the next function. When following the calculation steps it seams like MID(...) return an array when it is within SUMPRODUCT but it return only the first item in the array when SUMPRODUCT is removed. Do you have an idea how to change that behaviour? – Matthias Tidlund Jan 14 '19 at 15:55
  • @MatthiasTidlund Please post your problem as a new question. Comments are suboptimal in trying to help with this kind of issue. – Ron Rosenfeld Jan 15 '19 at 01:39
  • I have now formulated my comment as a separate but similar question. Here is a link for those who can be interested in that question as well [How to use a string from a single cell as an array in a function without VBA](https://stackoverflow.com/questions/54220420/how-to-use-a-string-from-a-single-cell-as-an-array-in-a-function-without-vba)! – Matthias Tidlund Jan 16 '19 at 15:37
  • I claim absolutely no credit for this, but in latest Excel it can be made even more succint: `=LET(in;L1;n_sub;SEQUENCE(1+LEN(in)-LEN(SUBSTITUTE(in;";";"")));1*TRIM(MID(SUBSTITUTE(in;";";REPT(" ";99));(n_sub-1)*99+1;99)))` – martin Mar 12 '21 at 08:49
3

Regarding Ron Rosenfeld's answer and Austin Wismer's question as to why it wasn't working for him, I spent the good part of an hour breaking it down and trying to figure out what makes it work(I think).

The OFFSET part of the function is taking the cell with the text that you want to split(my examples):

"Haz|Tank|Doub"

and duplicating it into an array column, with the height specified by taking the total number of characters in the cell

LEN("Haz|Tank|Doub") = 13 characters

and subtracting it taking the total number of characters in that cell, when the delimiter is replaced by empty space ""

LEN(SUBSTITUTE("Haz|Tank|Doub","|","")) =
LEN("HazTankDoub") = 11 characters

Which gives the number of delimiters in the cell being referenced:

LEN("Haz|Tank|Doub") - LEN(SUBSTITUTE("Haz|Tank|Doub","|","")) = 2

Adding 1 to this:

LEN("Haz|Tank|Doub")-LEN(SUBSTITUTE("Haz|Tank|Doub","|",""))**+1))** 

13 - 11 + 1 = 3

This gives the number of separate text strings that we want split up. The result is the OFFSET function duplicates the cell by 3(in my example) into separate rows

"Haz|Tank|Doub" "Haz|Tank|Doub" "Haz|Tank|Doub"

or

OFFSET("Haz|Tank|Doub",,,LEN("Haz|Tank|Doub")-LEN(SUBSTITUTE("Haz|Tank|Doub","|",""))+1))

OFFSET("Haz|Tank|Doub",,,3) = {"Haz|Tank|Doub";"Haz|Tank|Doub";"Haz|Tank|Doub"}

Which is then fed into the ROW function, which gives the row number for each cell in a range, and this is where it gets tricky.

So in my case, the Row # of the first cell in a list of cells with text split by delimiters starts at R2, which OFFSET then duplicates 3 rows down, so when I feed this into the ROW function, I get:

ROW({"Haz|Tank|Doub";"Haz|Tank|Doub";"Haz|Tank|Doub"}) = {2;3;4}

or

ROW({$R$2:$R$4}) = {2;3;4}

Seems fine, but then this is subtracted by 1

(ROW(OFFSET($R$2,,,LEN($R$2)-LEN(SUBSTITUTE($R$2,"|",""))+1))-1)

{2;3;4} - 1 = {1;2;3}

Ok, then that is multiplied by 99

(ROW(OFFSET($R$2,,,LEN($R$2)-LEN(SUBSTITUTE($R$2,"|",""))+1))-1) * 99

{1;2;3} * 99 = {99;297;396}

After this, the formula adds it to this

+((ROW(OFFSET($R$2,,,LEN($R$2)-LEN(SUBSTITUTE($R$2,"|",""))+1)))=1)

Which is almost the same as the last part, except, it does not subtract 1 AND it checks if any of these results will = 1. This is done because the assumption is that the row number of the cell you're trying to split up into an array is 1. So for now, sticking with my numbers, this is what happens:

((ROW(OFFSET($R$2,,,LEN($R$2)-LEN(SUBSTITUTE($R$2,"|",""))+1)))=1)   
((ROW(OFFSET($R$2,,,13 - 11 + 1)))=1)
((ROW(OFFSET($R$2,,,3)))=1)
((ROW({$R$2:$R$4}))=1)
(({2;3;4})=1)
(({FALSE;FALSE;FALSE})

{99;297;396} + ({FALSE;FALSE;FALSE})
{99;297;396} + ({0;0;0})
{99;297;396}

This is fed into the MID function as the start number to extract the text from, which is the number of characters into the text string to extract the text from. The text the MID function looks through is the cell we want split, however it first substitutes the delimiter with 99 " " empty spaces to separate them out.

MID(SUBSTITUTE($R$2,"|",REPT(" ",99)) = "Haz                                Tank                           Doub"

not exactly 99 spaces between each above, just wanted to show a representation

So our start numbers for each of the 3 rows in the array are {99;297;396}

So the MID function is going to start at 99 characters into the above text string, and extract 99 characters after that for the first row. Then it'll start at 297 characters in, and extract 99 characters after this.

MID(SUBSTITUTE($R$2,"|",REPT(" ",99)),{99;297;396},99)
MID("Haz                            Tank                           Doub",{99;297;396},99)

MID("Haz                            Tank                           Doub",{99},99)
MID("Haz                            **|Tank                           |**Doub",{99},99)
Tank

MID("Haz                            Tank                           Doub",{297},99)
MID("Haz                            Tank                           **|Doub               
     |**",{297},99)
Doub

MID("Haz                            Tank                           Doub",{396},99)
MID("Haz                            |Tank                          Doub",{396},99)
""

On the last one, it starts 396 characters in and goes 99 character past that, which contains nothing, and returns nothing.

Now if I my cell started on row $R$1, then it would do this:

ROW({$R$1:$R$3}) = {1;2;3}

(ROW(OFFSET($R$1,,,LEN($R$1)-LEN(SUBSTITUTE($R$1,"|",""))+1))-1)

{1;2;3} - 1 = {0;1;2}

(ROW(OFFSET($R$1,,,LEN($R$1)-LEN(SUBSTITUTE($R$1,"|",""))+1))-1) * 99

{0;1;2} * 99 = {0;99;198}

{0;99;198} + ((ROW(OFFSET($R$1,,,LEN($R$1)-LEN(SUBSTITUTE($R$1,"|",""))+1)))=1)   
{0;99;198} + ((ROW(OFFSET($R$1,,,13 - 11 + 1)))=1)
{0;99;198} + ((ROW(OFFSET($R$1,,,3)))=1)
{0;99;198} + ((ROW({$R$1:$R$3}))=1)
{0;99;198} + (({1;2;3})=1)
{0;99;198} + (({TRUE;FALSE;FALSE})
{0;99;198} + {1;0;0}
{1;99;198}

And this, fed into MID, would correctly start at 1 character in, and extract everything 99 characters to the right of it:

MID("Haz                            Tank                           Doub",{1;99;198},99)
{Haz;Tank;Doub}

So all of that said, I substituted 2 things in the code so that it'll do it regardless of where you start in the sheet:

TRIM(MID(SUBSTITUTE([@Endorsements],"|",REPT(" ",99)),
(ROW(OFFSET([@Endorsements],,,LEN([@Endorsements])-LEN(SUBSTITUTE([@Endorsements],"|",""))+1))-(ROW([@Endorsements])))*99+
((ROW(OFFSET([@Endorsements],,,LEN([@Endorsements])-LEN(SUBSTITUTE([@Endorsements],"|",""))+1)))=(ROW([@Endorsements]))),99))

Please note that [@Endorsements] is $R$2 for me, it's just the cell is in a named Table, and @ corresponds to the same row in the table, but another named column, so I think $R2 could be substituted and still work

TRIM(MID(SUBSTITUTE([@Endorsements],"|",REPT(" ",99)),
(ROW(OFFSET($R2,,,LEN(R$2)-LEN(SUBSTITUTE($R2,"|",""))+1))-(ROW($R2)))*99+
((ROW(OFFSET($R2,,,LEN($R2)-LEN(SUBSTITUTE($R2,"|",""))+1)))=(ROW($R2))),99))

Basically instead of subtracting 1, it subtracts the row number of whatever the row of the cell is that you want to split up. Not much of a change at all, so full credit still goes to Ron, but I figured I'd share it since I just figured it out.

2

With your data in A1, in B1 enter:

=TRIM(MID(SUBSTITUTE($A1,"#",REPT(" ",999)),COLUMNS($A:A)*999-998,999))

and copy across.

Gary's Student
  • 95,722
  • 10
  • 59
  • 99
  • This is an interesting formula to simulate text to columns, but since the structure of the spreadsheet does not allow for text-to-columns, unfortunately this doesn't work for the case in the post. Thanks! – terry87 Aug 16 '14 at 17:32
  • Yeah! it extract **4** form **4#8#10** – Learner Jan 11 '16 at 07:32
1

I was using Ron's answer from above, but it was giving me a blank space with larger lists and I didn't like the limited size for each datum, so I made a similar formula based on his that doesn't have these issues.

=MID($A$1,IFERROR(FIND(" ",SUBSTITUTE($A$1,"#"," ",SEQUENCE(LEN($A$1)-LEN(SUBSTITUTE($A$1,"#","")),,0)))+1,1),FIND(" ",SUBSTITUTE($A$1,"#"," ",SEQUENCE(LEN($A$1)-LEN(SUBSTITUTE($A$1,"#","")),,1)))-IFERROR(FIND(" ",SUBSTITUTE($A$1,"#"," ",SEQUENCE(LEN($A$1)-LEN(SUBSTITUTE($A$1,"#","")),,0))),0)-1)
Dharman
  • 30,962
  • 25
  • 85
  • 135
Carble
  • 39
  • 3
  • I was trying to test @Carbie you solution, for the following text: `a;c;d;b` replacing `#`by `;`but it doesn't return the last letter `b`. – David Leal Oct 15 '22 at 20:26
0
=INDEX(FILTERXML("<t><s>"&SUBSTITUTE(A1,"#","</s><s>")&"</s></t>","//s"),3,1)

so you can refer for each instance without having an array formula.in this one you refer to the 3rd

AndrewK
  • 79
  • 3
0

Here an approach based on @Carble answer that doesn't work for the input: a#b#c#d because it returns {a;b;c} missing the last item. Here a solution that fixes this problem and tries to simplify the solution via LET function (O365 version 2021). At the same time was released SEQUENCE, which is used on previous approach and on this one. This approach is useful in case your Excel version doesn't support TEXTPLIT or FILTERXML (solutions provided here in other answers).

According to the documentation of FILTERXML it has the following limitations: The function is not available in Excel for the web and Excel for Mac. This function may appear in the function gallery in Excel for Mac, but it relies on features of the Windows operating system, so it will not return results on Mac.

Here is the solution in cell A2, where the delimiter (del) and the input text (txt) are defined as names of the LET function for easier maintenance of the formula:

=LET(txt,A1,del, ";",length, LEN(txt),
 itemsNum, length-LEN(SUBSTITUTE(txt,del,""))+1,
 seq, SEQUENCE(itemsNum,,0), findSpaces, FIND(" ",SUBSTITUTE(txt,del," ",seq)),
 startTxt, IFERROR(findSpaces+1,1),startDel1, IFERROR(findSpaces,0),
 startDel2, IFERROR(FIND(" ",SUBSTITUTE(txt,del," ",seq+1)), length+1),
MID(txt,startTxt,startDel2-startDel1-1)
)

Here is the output for several input values. As you can see it works for various scenarios even for an empty string where the result is an empty string too:

sample excel file

Explanation

The main idea is to use MID(text, start_num, num_chars) so all intermediate calculations named via LET function are intended to identify start_num and num_chars as column arrays, so it returns the corresponding split array.

We designate additional names in LET to reuse some calculations. It also helps for explaining the intermediate results in this section.

itemsNum, length-LEN(SUBSTITUTE(txt,del,""))+1

is the way to obtain how many items we need to split (number of del's plus 1), where length is defined as: LEN(txt).

For explanation purpose of the calculations we are going to use as reference the input for txt: a;bb;ccc;dddd. Here is the corresponding output:

length itemsNum
13 4

seq name is the corresponding 4x1 array sequence: {0;1;2;3}.

Important: In order for this solution to work, the previous sequence needs to be a vertical array, not an horizontal array (1x4). It ensures other functions used here (SUBSTITUTE, FIND, MID) return a vertical array too. The question is looking for an array solution, so it is important to keep this in mind.

The result of SUBSTITUTE(text, old_text, new_text, [instance_num]) used in both FIND calls:

SUBSTITUTE(txt,del," ",seq)) SUBSTITUTE(txt,del," ",seq+1)
#VALUE! a bb;ccc;dddd
a bb;ccc;dddd a;bb ccc;dddd
a;bb ccc;dddd a;bb;ccc dddd
a;bb;ccc dddd a;bb;ccc;dddd (no spaces)

It adds a space ( ) on each sequence of instance_num (fourth input argument) which needs to be a positive number. Therefore for 0 SUBSTITUTE returns #VALUE!. We deal with this case later.

The corresponding output of FIND calls for previous input arguments are:

findSpaces, FIND(" ",SUBSTITUTE(txt,del," ",seq))
startDel2, IFERROR(FIND(" ",SUBSTITUTE(txt,del," ",seq+1)), length+1)
findSpaces startDel2
#VALUE! 2
2 5
5 19
9 14

For startDel2, IFERROR replaces #VALUE! in the last row (no spaces in the substring: a;bb;ccc;dddd) with length+1 (=14). We can do that, because if the input argument num_chars from MID reaches or passes the end of the string, it takes as the number of characters up to the end of the string.

The following variables startTxt, startDel1 are calculated based on findSpaces name:

startTxt, IFERROR(findSpaces+1,1)
startDel1, IFERROR(findSpaces,0)

will produce the following output:

startTxt startDel1
1 0
3 2
6 5
10 9

where startTxt represents the starting position of each substring after the delimiter. For the first row FIND returns #VALUE! but it was replaced via IFERROR with 1 that represents the start of the string (txt), so it can be used as the second input argument (start_num) of MID.

startDel1, replaces the #VALUE! via IFERROR with 0 for the first row.

startDel2-startDel1-1 represents the number of characters of each substring:

startDel2-startDel1-1
1
2
3
4

so now we have all the elements needed to invoke the MID function:

MID(txt,startTxt,startDel2-startDel1-1)
David Leal
  • 6,373
  • 4
  • 29
  • 56
-1

Just in case someone is looking to do the same but with Google Sheets, here's how I do it

Let's say A1 contains 4#8#10

=ARRAYFORMULA(SPLIT(A1, "#"))

That will create an array, and display the array entry in columns.

If you need the array to be shown in rows, you can use

=ARRAYFORMULA(TRANSPOSE(SPLIT(A1, "#")))

I wish there was a SPLIT formula in MS Excel..

m-p-3
  • 351
  • 4
  • 6