2

Hello can someone please tell me how to extract text bettwen two dots.

Example: goldintre.cr.usgs.gov.

I want all text from first dot to the last(without dots at the begining and at the end)

This must be done without removing last dot from source text.

Thank you

JvdV
  • 70,606
  • 8
  • 39
  • 70

3 Answers3

2

Caveat: All below possible answers assume at least two dots in your values! If that is not the case, you need to include and IFERROR() for Excel where LibreCalc will show empty.


LibreCalc

enter image description here

Formula in B1:

=REGEX(A1;"(?:^[^.]*\.|\.[^.]*$)";"";"g")

A little explaination of the pattern:

  • (?: - Open non-capturing group.
    • ^[^.]*\. - Start string ancor, followed by zero or more characters other than a literal dot, followed by a dot.
    • | - Or.
    • \.[^.]*$ - A literal dot, followed but zero or more characters other than a literal dot followed by end string ancor.
    • ) - Close non-capturing group.

As per this online demo, you can tell that the replacement will now "delete" the marked text.

For your information, the semi-colon is the standard seperator in LibreCalc (and is ofter confused by a comma which is standard seperator for Excel).


Excel 2019/O365

enter image description here

Formula in B1:

=TEXTJOIN(".",,FILTERXML("<t><s>"&SUBSTITUTE(A1,".","</s><s>")&"</s></t>","//s[position()>1][position()<last()]"))

Though it's a longer alternative to the nice use of LET() when you would have Excel O365, this works in Excel 2019 also and makes use of TEXTJOIN() and some XPATH expressions that means:

  • //s - Return all "s" nodes.
  • [position()>1] - Position of the node is larger than 1st index AND:
  • [position()<last()] - Position of the node is smaller than the last index.

For more information on that construct, I think you can look at this. Previous post.

Note that you can also use this in LibreCalc but you'd need to use a modified installment since the default function FILTERXML() has a bug.


Excel in general:

For older installments of Excel you could use:

=MID(A1,FIND(".",A1)+1,FIND("|",SUBSTITUTE(A1,".","|",LEN(A1)-LEN(SUBSTITUTE(A1,".",""))))-FIND(".",A1)-1)

Basically we first determine how many dots are in the string, we then SUBSTITUTE() the very last one in a pipe-symbol (or any unique character for that matter) before we can FIND() it's position. If we determine the position of the first dot it's a simple matter of calculation when we provide the start and length parameters in MID().


EDIT

After a better explaination from OP's end, it seems his string values will always end on a dot. In such a case you could simply REPLACE() the first part of a string:

=REPLACE(LEFT(A1,LEN(A1)-1),1,FIND(".",A1),"")

It would also make working in LibreCalc a little easier too:

=REGEX(A1;"^[^.]*\.(.*).$";"$1";"g")
JvdV
  • 70,606
  • 8
  • 39
  • 70
1

If you have Excel 365 then with data in cell A1 use:

=LET(x,FIND(".",A1)+1,y,LEN(A1),MID(A1,x,y-x))

enter image description here

(this assumes that there is always a dot at the end)

Gary's Student
  • 95,722
  • 10
  • 59
  • 99
  • I am growing very fond of the `LET()` function in O365. – JvdV Dec 08 '20 at 13:43
  • Beware your formula does not really work well though, it does currently *not* remove the last dot (and potential characters after the last dot, e.g.: `goldintre.cr.usgs.gov.test` yields `cr.usgs.gov.tes`). The idea is good though =) – JvdV Dec 08 '20 at 14:01
  • @JvdV You are correct! The formula assumes a ***.*** at the end. – Gary's Student Dec 08 '20 at 14:03
1

You have not provided sufficient details for your data as well as desired output. Let's consider all possibilities then.

Case-1 You always have a . in end, and you want text between first and this last dot. Use this formula -

=MID(A1, SEARCH(".", A1)+1, LEN(A1)-SEARCH(".", A1)-1)

This will give you cr.usgs.gov as output.

Case-2 You need not consider last dot and remove text before that one (assuming again that last dot in last place)

=MID(A1,SEARCH(".",A1)+1, SEARCH("$",SUBSTITUTE(A1,".", "$", LEN(A1)-LEN(SUBSTITUTE(A1, ".", ""))-1))-SEARCH(".",A1)-1)

This will give you cr.usgs as output (also assuming that $ is nowhere present in any of your text strings.

Other cases When you do not have dots in last places everywhere. For this you'll have to tweak your formula a little bit.

=MID(A1,SEARCH(".",A1)+1, SEARCH("$",SUBSTITUTE(A1,".", "$", LEN(A1)-LEN(SUBSTITUTE(A1, ".", ""))))-SEARCH(".",A1)-1)

This will give you output cd.ef if your input (A1) is ab.cd.ef.ghi. In the case-2 formula I just removed extra -1 which was used to exclude last dot from the text string.

Note This formula will work in all versions of excel.

AnilGoyal
  • 25,297
  • 4
  • 27
  • 45
  • Thank you formula that you provided worked. Im sory for not providing enough information. Case 1 was the answer I was looking for. :D – SparkleMuffin Dec 08 '20 at 14:20
  • Right, so this was much more simpler than initially thought =). Good answer AnilGoyal, upvoted. For your interest I also included a little bit of a shorter solution that would avoid a double search @KarloHorvat. – JvdV Dec 09 '20 at 08:59
  • Yes, if there's a dot in the end, the solution gets easier. – AnilGoyal Dec 09 '20 at 09:17
  • @KarloHorvat, I think all the answers should get an upvote by you because all are correct. – AnilGoyal Dec 09 '20 at 09:18