3

Problem: I am working with 2 list. One called HYPHEN and one called CAS Number in columns A and B respectively.

Column C uses a formula that combines column A and B and sorts them such that if a hyphen is present in column A, this is inserted before the adjacent CAS number which is then inserted below and the sequence continues so that all hyphens and CAS numbers are included. I've attached an image to better explain this and the Formula to replicate this is given below.

CAS number being incorrectly converted to another number when Sorted (highlighted in red)

A CAS number is a unique Identify for a material/chemical and usually is written as 000-00-0, however occasionally you get materials with CAS numbers of 0000-00-0 (or other variations).

For the most part column C is correct because all but one CAS numbers are in the usual format. However As highlighted in red 6132-04-3 is being converted to 1545801.

What I have tried: I have realised that 6132-04-3 is being converted to 03/04/6132 so I'm pretty sure that this is being recognised as a date which is causing the problem. I have tried to format the cells to all be a text format, I have added a comma before the CAS number but nothing returns the desired value of 6132-04-3 and instead always returns 1545801.

To replicate the issue: Column A and B can have any data entered. To replicate the output of column C the formula is given below:

Formula for Column C: =FILTERXML(""&SUBSTITUTE(TEXTJOIN(",",TRUE,A2:B26),",","")&"","//b")

(Formula provided by @Gary's Student on Stack Overflow)

Any thoughts on how to prevent the red CAS number being converted when it is sorted in Column C would be really appreciated.

JvdV
  • 70,606
  • 8
  • 39
  • 70
Nick
  • 789
  • 5
  • 22

3 Answers3

3

This is a crude way to fix it by adding then removing an arbitrary character:

=MID(FILTERXML("<a><b>"&SUBSTITUTE(TEXTJOIN(",",TRUE,IF(A2:B26="","","x"&A2:B26)),",","</b><b>")&"</b></a>","//b"),2,99)

enter image description here

If you have the issue of some of your strings containing a comma, just use a different separator:

=MID(FILTERXML("<a><b>"&SUBSTITUTE(TEXTJOIN("|",TRUE,IF(A2:B26="","","x"&A2:B26)),"|","</b><b>")&"</b></a>","//b"),2,99)
Tom Sharpe
  • 30,727
  • 5
  • 24
  • 37
  • Hi Thanks for this. Out of interest I'm also having an issue when I use a chemical name such as 3,4-dimethylbenzaldehyde. The probelm is that the data is split because of the comma into 3 and 4-dimethylbenzaldehyde into respective cells. I'm really not familiar with with writing complex excel formula like this but it would be perfect if you knew how to achieve this too. No worries if not. – Nick Feb 06 '21 at 16:09
  • TBH I'm fairly new to Filterxml and it could be that there is a more elegant way of solving your original problem. Anyway, I think the answer to your comment is fairly easy, just use a different separator character e.g. a filter symbol instead of a comma. Will add to my answer. – Tom Sharpe Feb 06 '21 at 16:18
1

Looks like you could use:

enter image description here

Formula in D2:

=SUBSTITUTE(FILTERXML("<t><s>'"&TEXTJOIN("</s><s>'",,A2:B10)&"</s></t>","//s"),"'","")

Or:

=MID(FILTERXML("<t><s>'"&TEXTJOIN("</s><s>'",,A2:B10)&"</s></t>","//s"),2,99)
JvdV
  • 70,606
  • 8
  • 39
  • 70
  • @iNick1, your welcome. Fun little question too. If you want to know more about `FILTERXML()`, you could have a look over [here](https://stackoverflow.com/q/61837696/9758194). – JvdV Feb 06 '21 at 16:49
  • If you have a moment. Could you briefly explain what was going on before? I do want to get better at excel but at this stage its quite a learning curve. – Nick Feb 06 '21 at 17:06
  • 1
    @iNick1, `FILTERXML()` returns an array of values from an valid xml-string that we previously stringed together using `TEXTJOIN()`. What Excel does (IMO annoyingly) is trying to process each of the elements like if you were using VBA and each variable would be a variant type variable by default. In other words, Excel will try to understand the data and assign a data type itself. To prevent this mechanism we stringed in an `'` to prevent Excel from recognizing these elements as being numeric (therefor we also prevent the autoprocessing and formatting to unwanted figures). – JvdV Feb 06 '21 at 17:12
0

I can suggest you this:

Go to Format Cells ---> Number ---> Custom ---> Type

In this "Type" field write this #000-00-0

Press "OK"

Dharman
  • 30,962
  • 25
  • 85
  • 135
Ashgabat
  • 71
  • 1
  • 3