Some background
Unfortunately, XML is not a regular language, and hence you simply cannot reliably process it using regular expressions, no matter how complex a regexp you manage to come up with.
I would start with this brilliant humorous take on the issue and then read, say, this.
To demonstrate, here is a simple change to your example that would break your processing:
<person id="13">
<name>
<first>John</first>
<last>Doe</last>
</name>
<age>42</age>
<Married>false</Married>
<City><![CDATA[Hanga <<Roa>>]]></City>
<State>Easter Island</State>
<!-- Need more details. -->
</person>
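To make the point about regular expressions concrete, here is a small Go sketch. Since your actual processing isn't shown here, it assumes a hypothetical naive pattern `<[^>]*>` for matching tags and compares what that pattern finds in the `City` element above with what `encoding/xml` reports:

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"regexp"
	"strings"
)

// naiveTagRe is a hypothetical "match a tag" pattern of the kind one might
// reach for first; it is not taken from the original question.
var naiveTagRe = regexp.MustCompile(`<[^>]*>`)

// regexpTags returns everything the naive pattern believes is a tag.
func regexpTags(s string) []string {
	return naiveTagRe.FindAllString(s, -1)
}

// parserText returns the concatenated character data of s as seen by a
// real XML parser (encoding/xml).
func parserText(s string) (string, error) {
	dec := xml.NewDecoder(strings.NewReader(s))
	var sb strings.Builder
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return sb.String(), nil
		}
		if err != nil {
			return "", err
		}
		if cd, ok := tok.(xml.CharData); ok {
			sb.Write(cd)
		}
	}
}

func main() {
	const city = `<City><![CDATA[Hanga <<Roa>>]]></City>`
	fmt.Printf("%q\n", regexpTags(city))

	text, err := parserText(city)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", text)
}
```

The pattern reports `<![CDATA[Hanga <<Roa>` as a "tag", while the parser sees a single `City` element containing the text `Hanga <<Roa>>`.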
Actually, consider this:
<last>Von
Neumann</last>
Why do you think you are free to drop the line feed from the contents of that element?
Sure, you'll say one cannot sensibly have a newline in their family name.
OK, but what about this?
<poem author="Chaucer">
<strophe number="1"> The lyf so short,
the craft so long to lerne.</strophe>
</poem>
You cannot sensibly drop the whitespace between the two parts of that sentence, because having it there was the author's intent.
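A quick way to convince yourself that a parser keeps such whitespace is to decode both elements with `encoding/xml`. The sketch below is only an illustration; the `elementText` helper is a made-up name, and it unmarshals a single element into a struct field tagged `,chardata`:

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// elementText is a hypothetical helper: it decodes a document consisting of
// a single element and returns that element's character data exactly as
// encoding/xml reports it.
func elementText(doc string) (string, error) {
	// No XMLName field, so the root element's name is not checked;
	// attributes such as number="1" are simply ignored.
	var v struct {
		Text string `xml:",chardata"`
	}
	if err := xml.NewDecoder(strings.NewReader(doc)).Decode(&v); err != nil {
		return "", err
	}
	return v.Text, nil
}

func main() {
	for _, doc := range []string{
		"<last>Von\nNeumann</last>",
		"<strophe number=\"1\"> The lyf so short,\nthe craft so long to lerne.</strophe>",
	} {
		text, err := elementText(doc)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%q\n", text)
	}
}
```

Both the line feed in `Von\nNeumann` and the leading space before `The lyf` survive decoding; dropping them is a decision made afterwards, by whoever processes the data.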
Well, OK, the full story is defined in the section called "White Space Handling" of the XML spec.
A layman's attempt to describe whitespace handling in XML is as follows:
The XML spec itself does not assign any special meaning to whitespace: the decision on what whitespace means in a particular place of an XML document is up to the processor of that document.
By extension, the spec does not mandate whether whitespace between any "tags" (those <foo>, </bar> and <quux/> things appearing at points where XML markup is allowed) is significant or not: only you decide that.
To better understand the reason for this, consider the following document:
<p>␣Some text which contains an␣<em>emphasized block</em>
which is followed by a linebreak and more text.</p>
This is perfectly valid XML; I have replaced the space characters right after the <p> tag and right before the <em> tag with the Unicode "open box" character (␣) for display purposes.
Note that the whole text ␣Some text which contains an␣ appears between two tags and contains leading and trailing whitespace which is obviously significant: if it were not, the emphasized text (the one marked up with <em>…</em>) would be glued to the preceding text.
The same logic applies to the line break and more text after the </em> tag.
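To see what a blanket "trim all character data" policy does to such mixed content, here is a sketch that concatenates the character data of the <p> element above (with the open-box characters turned back into spaces), once as is and once with every chunk trimmed via `strings.TrimSpace`. The `flattenText` name and its `trim` switch are made up for the illustration:

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// paragraph is the <p> example above, with ␣ replaced by real spaces.
const paragraph = "<p> Some text which contains an <em>emphasized block</em>\n" +
	"which is followed by a linebreak and more text.</p>"

// flattenText concatenates all character data in doc. With trim set, each
// chunk is stripped of leading and trailing whitespace first, mimicking a
// naive "whitespace next to tags is insignificant" policy.
func flattenText(doc string, trim bool) (string, error) {
	dec := xml.NewDecoder(strings.NewReader(doc))
	var sb strings.Builder
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return sb.String(), nil
		}
		if err != nil {
			return "", err
		}
		if cd, ok := tok.(xml.CharData); ok {
			s := string(cd)
			if trim {
				s = strings.TrimSpace(s)
			}
			sb.WriteString(s)
		}
	}
}

func main() {
	for _, trim := range []bool{false, true} {
		text, err := flattenText(paragraph, trim)
		if err != nil {
			panic(err)
		}
		fmt.Printf("trim=%v: %q\n", trim, text)
	}
}
```

With trimming, "an" and "emphasized" end up glued together as `anemphasized`, and so do `block` and `which`.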
The XML spec hints that it may be convenient to define "insignificant" whitespace as any whitespace between a pair of adjacent tags which do not together delimit a single element.
XML also has two features which complicate processing further:
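Note that a parser does not make this call for you: `encoding/xml`, for instance, reports indentation between tags as ordinary `xml.CharData` tokens. This small sketch dumps the token stream for a tiny document; the `tokenKinds` helper is illustrative only (note also that the decoder expands the self-closing `<b/>` into a start and an end element):

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// tokenKinds is a hypothetical helper listing the tokens encoding/xml
// reports for doc; character data is rendered with %q so that whitespace
// is visible.
func tokenKinds(doc string) ([]string, error) {
	dec := xml.NewDecoder(strings.NewReader(doc))
	var out []string
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return nil, err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			out = append(out, "start "+t.Name.Local)
		case xml.EndElement:
			out = append(out, "end "+t.Name.Local)
		case xml.CharData:
			out = append(out, fmt.Sprintf("chardata %q", string(t)))
		}
	}
}

func main() {
	kinds, err := tokenKinds("<a>\n  <b/>\n</a>")
	if err != nil {
		panic(err)
	}
	for _, k := range kinds {
		fmt.Println(k)
	}
}
```

Both whitespace-only runs reported here sit between adjacent tags that do not delimit a single element, so under the convention the spec hints at they could be treated as insignificant; it is still up to your code to drop them.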
- Entity and character references (those &amp; and &lt; thingies) allow indirect insertion of characters, and a numeric character reference can encode any character allowed in XML: for instance, &#10; would insert a line feed character.
- XML supports special "CDATA sections", which your parser ostensibly knows nothing about.
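Both features are handled by a real parser, but they affect how character data reaches your code. In the following sketch (again with an illustrative helper, `charDataChunks`), `encoding/xml` resolves `&amp;` and `&#10;` inside ordinary text, yet reports each CDATA section as a separate `xml.CharData` token:

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// charDataChunks is a hypothetical helper returning the text of every
// xml.CharData token encoding/xml reports for doc, in document order.
func charDataChunks(doc string) ([]string, error) {
	dec := xml.NewDecoder(strings.NewReader(doc))
	var chunks []string
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return chunks, nil
		}
		if err != nil {
			return nil, err
		}
		if cd, ok := tok.(xml.CharData); ok {
			// string(cd) copies the bytes out of the decoder's buffer.
			chunks = append(chunks, string(cd))
		}
	}
}

func main() {
	chunks, err := charDataChunks("<x>a&amp;b&#10;c<![CDATA[ <d> ]]><![CDATA[e]]></x>")
	if err != nil {
		panic(err)
	}
	for _, c := range chunks {
		fmt.Printf("%q\n", c)
	}
}
```

So the whitespace you need to judge may arrive split across several adjacent tokens, and a line feed may come from a `&#10;` reference rather than from a literal line break.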
An approach to the solution
Before we try to come up with a solution, we'll define which whitespace we intend to treat as insignificant and drop.
It looks like, for your kind of document, the definition should be: any character data between any two tags should be deleted unless:
- it contains at least a single non-whitespace character, or
- it completely defines the contents of a single XML element.
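The definition above boils down to a small predicate. Here is a sketch of it in Go; `keepCharData` and its parameters are hypothetical names, where `blocks` is a run of adjacent character data found between two tags and `enclosesElement` says whether that run is the entire content of one element:

```go
package main

import (
	"fmt"
	"strings"
)

// xmlWS lists the four characters XML treats as whitespace.
const xmlWS = " \t\r\n"

// keepCharData (hypothetical) reports whether a run of adjacent character
// data blocks found between two tags must be kept: either the run is the
// whole content of a single element, or some block has a non-whitespace
// character.
func keepCharData(blocks []string, enclosesElement bool) bool {
	if enclosesElement {
		return true
	}
	for _, b := range blocks {
		if strings.Trim(b, xmlWS) != "" {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(keepCharData([]string{"\n  "}, false))          // indentation between tags: false
	fmt.Println(keepCharData([]string{"\n  "}, true))           // whole content of an element: true
	fmt.Println(keepCharData([]string{"\n", " what? "}, false)) // has non-whitespace: true
}
```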
With these considerations in mind, we can write code which parses an input XML stream into tokens and writes them into the output XML stream, while applying the following logic to processing the tokens:
- If it sees any XML token other than character data, it encodes that token into the output stream. Additionally, if the token was a start element (an opening tag), it remembers this fact by setting a flag; otherwise the flag is cleared.
- If it sees any character data, it checks whether this character data immediately follows a start element, and if so, saves the block away. A block is also saved when there are already saved blocks pending: this is needed because in XML it's possible to have several adjacent but still distinct character data blocks in a document. Any other character data is dropped if it consists solely of whitespace, and written out as is otherwise.
- If it sees any token other than character data while one or more character data blocks are saved, it first decides whether to write them into the output stream:
  - If the token is an end element (a closing tag), all the saved blocks are written "as is", because together they completely define the contents of a single element.
  - Otherwise, if at least one of the saved blocks contains at least one non-whitespace character, all blocks are written as is.
  - Otherwise, all the blocks are skipped.
Here is the working code which implements the described approach:
package main

import (
	"encoding/xml"
	"errors"
	"fmt"
	"io"
	"os"
	"strings"
)
const xmlData = `<?xml version="1.0" encoding="utf-8"?>
<person id="13">
weird text
<name>
<first>John</first>
<last><![CDATA[Johnson & ]]><![CDATA[ <<Johnson>> ]]><![CDATA[ & Doe ]]></last>
</name>
 
	<age>
42
</age>
<Married>false</Married>
<City><![CDATA[Hanga <Roa>]]></City>
<State>Easter Island</State>
<!-- Need more details. --> what?
<foo> more <bar/> text </foo>
</person>
`

func main() {
	stripped, err := removeWS(xmlData)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Print(stripped)
}

// removeWS re-encodes the XML document s token by token, dropping the
// character data that SkipWSEncoder deems insignificant.
func removeWS(s string) (string, error) {
	dec := xml.NewDecoder(strings.NewReader(s))
	var sb strings.Builder
	enc := NewSkipWSEncoder(&sb)
	for {
		tok, err := dec.Token()
		if err != nil {
			if err == io.EOF {
				break
			}
			return "", fmt.Errorf("failed to decode token: %w", err)
		}
		err = enc.EncodeToken(tok)
		if err != nil {
			return "", fmt.Errorf("failed to encode token: %w", err)
		}
	}
	err := enc.Flush()
	if err != nil {
		return "", fmt.Errorf("failed to flush encoder: %w", err)
	}
	return sb.String(), nil
}
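
// SkipWSEncoder wraps xml.Encoder and holds back character data that
// follows a start element until it can decide whether that data is
// significant.
type SkipWSEncoder struct {
	*xml.Encoder
	sawStartElement bool           // the last non-chardata token was a start element
	charData        []xml.CharData // saved character data blocks awaiting a decision
}

func NewSkipWSEncoder(w io.Writer) *SkipWSEncoder {
	return &SkipWSEncoder{
		Encoder: xml.NewEncoder(w),
	}
}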
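
// EncodeToken implements the token-processing logic described above.
func (swe *SkipWSEncoder) EncodeToken(tok xml.Token) error {
	if cd, isCData := tok.(xml.CharData); isCData {
		// Character data right after a start element (or continuing an
		// already saved run) is held back until the next token arrives.
		if len(swe.charData) > 0 || swe.sawStartElement {
			// Copy, because the decoder may reuse the token's buffer.
			swe.charData = append(swe.charData, cd.Copy())
			return nil
		}
		// Any other whitespace-only character data is dropped.
		if isWS(cd) {
			return nil
		}
		return swe.Encoder.EncodeToken(tok)
	}
	// A non-chardata token decides the fate of a saved run: the run is
	// always kept if this token is the end element closing it.
	if len(swe.charData) > 0 {
		_, isEndElement := tok.(xml.EndElement)
		err := swe.flushSavedCharData(isEndElement)
		if err != nil {
			return err
		}
	}
	_, swe.sawStartElement = tok.(xml.StartElement)
	return swe.Encoder.EncodeToken(tok)
}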
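
// Flush flushes the underlying encoder; for a well-formed document no
// character data is pending at this point.
func (swe *SkipWSEncoder) Flush() error {
	if len(swe.charData) > 0 {
		return errors.New("attempt to flush encoder while having pending cdata")
	}
	return swe.Encoder.Flush()
}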
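
// flushSavedCharData writes the saved run out when mustKeep is set or when
// the run contains non-whitespace, and then resets it.
func (swe *SkipWSEncoder) flushSavedCharData(mustKeep bool) error {
	if mustKeep || !allIsWS(swe.charData) {
		err := encodeCDataList(swe.Encoder, swe.charData)
		if err != nil {
			return err
		}
	}
	swe.charData = swe.charData[:0]
	return nil
}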

// encodeCDataList writes each character data block in cdataList to enc.
func encodeCDataList(enc *xml.Encoder, cdataList []xml.CharData) error {
	for _, cd := range cdataList {
		err := enc.EncodeToken(cd)
		if err != nil {
			return err
		}
	}
	return nil
}
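
// isWS reports whether b consists solely of XML whitespace
// (space, tab, carriage return, line feed).
func isWS(b []byte) bool {
	for _, c := range b {
		switch c {
		case 0x20, 0x09, 0x0d, 0x0a:
			continue
		}
		return false
	}
	return true
}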

// allIsWS reports whether every block in cdataList is whitespace-only.
func allIsWS(cdataList []xml.CharData) bool {
	for _, cd := range cdataList {
		if !isWS(cd) {
			return false
		}
	}
	return true
}
Playground.
I'm not sure it covers all possible weird cases, but it should be a good start.