-1

I have personal ID's in reports I have to find in one cell. Too bad the string in the cell which hides this ID can be anything, the ID can be at the beginning, the end, anywhere, but it is there. The only thing I know is the pattern "space,letter,letter,number,number,number,number,number,number,space". Jike DB544345

I was looking for the correct word for this "mask", but couldn't find an answer. Thank you for your help.

Dgloria
  • 11
  • 2
  • 6
  • ok, I'll check, thanks. Maybe we can solve this by finding all spaces, looking if the string after it is this length, and then check if it is the correct format? – Dgloria Feb 06 '15 at 05:37
  • @pnuts I agree with you that VBA is better for a non hacked solution (hence the comment upvote) but on your second comment... If you could split out those chunks between the spaces you could evaluate an 8 character chunk with an array formula that essentially converts each character based on UNICODE ranges. Capital letters to 2 and numbers to 1 and anything else to 0. Raise each number in that array to the exponent of its index in the string. Sum them up. And check that your answer equals 516. And yes I already wrote that formula. – Mark Balhoff Feb 06 '15 at 17:13
  • 1
    @pnuts The challenge of making this work in formulas intrigues me for some some reason. The hard part is combining the two steps: generating the chunks and performing the test. Generating an array of all possible chunks in one formula is not that hard. The test component is certainly possible in one formula. Combining the two gets quirky. – Mark Balhoff Feb 06 '15 at 17:16
  • If you don't mind VBA then follow the instructions here http://stackoverflow.com/questions/22542834/how-to-use-regular-expressions-regex-in-microsoft-excel-both-in-cell-and-loops and (if your data were in A2) enter =regex(A2,"[A-Z]{2}[0-9]{6}") – Tom Sharpe Feb 06 '15 at 17:23
  • 1
    @pnuts True that is simpler. For this specific simple pattern you can do a numeric check on most of it. Side note: I think often it's easy to say use code on a question like this but I find myself not pushing that case hard in Excel because even as a programmer I know the long-term issues that presents. Using formulas leads to lots of hacks around deficiencies and quirks in built-in functions though. – Mark Balhoff Feb 06 '15 at 17:37
  • @pnuts Haha yea I understood what you meant. This except broken out into different cells... `=AND(CODE(LEFT(G22,1))>=65,CODE(LEFT(G22,1))<=90,CODE(MID(G22,2,1))>=65,CODE(MID(G22,2,1))<=90,ISNUMBER(RIGHT(G22,6)*1))` – Mark Balhoff Feb 06 '15 at 17:48
  • @pnuts Of course. A little haphazard with the LEFT functions there. If the concatenate function could just put an array back into a string, we could do this whole evaluation on the original messy string... Hmmm – Mark Balhoff Feb 06 '15 at 17:56
  • 1
    @pnuts Looks like my subconscious didn't want to let go of this question... I think your method will find false matches if the numeric section is led with a dash or contains a period. – Mark Balhoff Feb 06 '15 at 21:13
  • Hi, I solved it in VBA, that works, with RegEx and the solution similar to Tom Sharpe's. – Dgloria Feb 07 '15 at 17:20

3 Answers3

1

As the comments are numerous I have created a minimal example that might represent what the OP is dealing with:

A1: 123456789 DB544345 asdfg asdfghjk
A2: creating dummy data is a DB544345 pain
A3: DB5443456 and soething else

parsed a copy of that in ColumnB with Text To Columns (with space as the delimiter) then applied:

=IFERROR(IF(AND(LEN(B1)=8,CODE(LEFT(B1))>64,CODE(LEFT(B1))<91,CODE(MID(B1,2,1))>64,CODE(MID(B1,2,1))<91,ISNUMBER(RIGHT(B1,6)*1),RIGHT(B1,6)*1>99999),B1,""),"")

to K1, copied this across to P1 and then K1:P1 down.

pnuts
  • 58,317
  • 11
  • 87
  • 139
  • This is a good answer. It's messy, requires a lot of space (if many space characters), and isn't live updating (for lack of better words) but **if the OP's main agenda is to extract an answer as fast as possible, this is the brute force way to do it**. Depending on the structure of the unseen OP structure and his desire or lack thereof to stand by it, that might be the best idea. For a less brute force solution, I have posted an alternative that seems to accomplish all of this in a single cell formula. – Mark Balhoff Feb 06 '15 at 22:47
  • 1
    Oh just one thing though. Shouldn't your `MID` functions be `MID(B1,2,1)` and not `MID(B1,1,1)`? – Mark Balhoff Feb 06 '15 at 22:53
0

A concise "built-in function only" solution to a problem such as this requires a bit of tinkering as many attempts will dead-end or need workarounds due to deficiencies and quirks in the built-in Excel formulas. I much prefer single cell formulas because they minimally affect the general spreadsheet structure. However, due to the limitations listed above, complex single cell solutions often come at the cost of being rather long and cumbersome (this answer is somehow still only two lines on my formula bar in Excel). I came back to your question and cobbled together a formula that can (as far as I have tested) extract the first occurrence of this pattern with a single cell formula. This is an array formula (Ctrl+Shift+Enter instead of Enter) that assumes your data is in A2. This rough formula returns the first 8 characters if no match is found and throws #REF if the string is shorter than 10 characters.

=MID(A2,MIN(IF(MID(A2,ROW(INDIRECT("A1:A"&(LEN(A2)-9))),1)=" ",IF(MID(A2,ROW(INDIRECT("A1:A"&(LEN(A2)-9)))+9,1)=" ",IF(CODE(MID(A2,ROW(INDIRECT("A1:A"&(LEN(A2)-9)))+1,1))>64,IF(CODE(MID(A2,ROW(INDIRECT("A1:A"&(LEN(A2)-9)))+1,1))<91,IF(CODE(MID(A2,ROW(INDIRECT("A1:A"&(LEN(A2)-9)))+2,1))>64,IF(CODE(MID(A2,ROW(INDIRECT("A1:A"&(LEN(A2)-9)))+2,1))<91,IF(IFERROR(MID(A2,ROW(INDIRECT("A1:A"&(LEN(A2)-9)))+3,6)*1>99999,FALSE),ROW(INDIRECT("A1:A"&(LEN(A2)-9)))))))))))+1,8)

Let me try to break this down at least on a high level. We are splitting the main text into every possible ten character chunk so that we can test each one using the suggestion of @pnuts to verify the Unicode values of the first two characters and run an ISNUMBER check on the rest of the string. This first block recurs throughout my formula. It generates a list of numbers from 1 to n-9 where n is the length of our main text string.

ROW(INDIRECT("A1:A"&(LEN(A2)-9)))

Let's assume our string is 40 characters long and replace the above formula with {1...31}. Using this number sequence generation we can check if characters 1 to 31 are spaces:

IF(MID(A2,{1...31},1)=" "

Then we can check if characters 10 to 40 are spaces:

IF(MID(A2,{1...31}+9,1)=" "

Then we can check if characters 2 to 32 are capital letters:

IF(CODE(MID(A2,ROW(INDIRECT("A1:A"&(LEN(A2)-9)))+1,1))>64,
IF(CODE(MID(A2,ROW(INDIRECT("A1:A"&(LEN(A2)-9)))+1,1))<91

Then we can check if characters 3 to 33 are capital letters:

IF(CODE(MID(A2,ROW(INDIRECT("A1:A"&(LEN(A2)-9)))+2,1))>64,
IF(CODE(MID(A2,ROW(INDIRECT("A1:A"&(LEN(A2)-9)))+2,1))<91

Then we can check if the strings of characters 4 to 9, 5 to 10, ..., 33 to 38, 34 to 39 are six-digit numbers:

IF(IFERROR(MID(A2,ROW(INDIRECT("A1:A"&(LEN(A2)-9)))+3,6)*1>99999,FALSE)

If all conditions are TRUE, that 10 digit chunk test will return the index of its first character in the string via another instance of the original array {1...31}. Otherwise it returns nothing. We take the Min of all return indexes and then use the Mid function to grab the 8 digit string determined by the aforementioned minimum index:

=MID(A2,MIN(matching index list)+1,8)
Mark Balhoff
  • 2,311
  • 4
  • 22
  • 30
  • @pnuts I am on a 17" laptop so yes the screen is wide. I took him literally when he said "space,letter,letter,number,number,number,number,number,number,space" that there was always a trailing space. I could modify it to accommodate the ID being the last 8 characters of string by a couple methods. Actually the easiest would be to go 1 to n-8 (instead of n-9) and then put an IFERROR(my current check for trailing space, TRUE). Untested but I think that would work. – Mark Balhoff Feb 06 '15 at 23:13
  • @pnuts Your proposed string has no valid match in it. You only have 5 numbers there. Dynamic! Yes! That's the word that evaded me! – Mark Balhoff Feb 06 '15 at 23:22
  • @pnuts I can address strings with no valid answer multiple ways. The most trivial way is to grab `TRIM(MID(A2,MIN(matching index list),9))` instead of `MID(A2,MIN(matching index list)+1, 8)`. The latter causes the 0 returned in case of no valid finds to turn into a start index of 1 for the `MID` function. – Mark Balhoff Feb 06 '15 at 23:31
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/70435/discussion-between-mark-balhoff-and-pnuts). – Mark Balhoff Feb 06 '15 at 23:37
  • 1
    @pnuts I certainly understand that. I don't mind. Not sure why I kept messing with it. – Mark Balhoff Feb 06 '15 at 23:46
  • I think if I have to do it in excel now, I would put a new worksheet, take this cell content, take everything out which is delimited by a space, and look for the pattern in each cell. Then I'd throw the result back to the original table. but VBA was perfect for it. (So far). – Dgloria Feb 07 '15 at 17:30
  • @pnuts in what sense do you find this "such a vague question"? – Dgloria Feb 08 '15 at 09:35
  • corrected thanks. Anyway I was looking for the excel solution as well, I work with these ID's regularly and usually they are hidden somewhere in csv, txt or any other files. Can't help, this is my job. – Dgloria Feb 10 '15 at 14:53
0

I think this will work, if we assume that the SPACE at the beginning and end are merely to differentiate the ID from the rest of the string; hence would not be present if the ID were at the beginning or end of the string. This formula is case insensitive. If case sensitivity is required, we could do character code comparisons.

=LOOKUP(2,1/((LEFT(myArr,2)>="AA")*(LEFT(myArr,2)<="ZZ")*(LEN(myArr)=8)*ISNUMBER(-RIGHT(myArr,6))),myArr)

Where myArr refers to:

=TRIM(MID(SUBSTITUTE(TRIM(Sheet2!A1)," ",REPT(" ",99)),(ROW(INDIRECT("1:10"))-1)*99+1,99))

If myArr is initially defined with the cursor in B1, referring to A1 as shown, it will adjust to refer to the cell one column to the left of the cell in which the Name appears.

The 10 in 1:10 is the maximum number of words in the string -- can be adjusted if required.

Ron Rosenfeld
  • 53,870
  • 7
  • 28
  • 60