-1

This is for a project for a very specific use.

I am trying to find how to find any empty text from the xml and replace it with a message.

regex = re.compile(r'>\s*</')


replaced = re.sub(regex, ">[!] JSON value does not exist, Check your Json!</", temp)

For example filename is blank

        <file>               
            <fileType>Mezza</fileType>
            <fileName></fileName>
            <segments>0000</segments>
        </file>

and the output would be:

         <file>               
            <fileType>Mezza</fileType>
            <fileName>[!] value does not exist!</fileName>
            <segments>0000</segments>
        </file>

However I am getting for other parts where there is white space and newline I dont want to have this message, both tag names are different there is a new line and they are closing tags how do I implement this in regex?:

</fileName>[!] value does not exist!</file>
Emma
  • 27,428
  • 11
  • 44
  • 69
Jim
  • 101
  • 3
  • 13

2 Answers2

0

Maybe, another option here we might be having would be to just find filename, and replace what we desire inside the tag, with likely an expression such as:

(<fileName>)(.+)?(<\/fileName>)

if I understand the problem correctly.

Demo 1

If we have fully empty tags, this expression might likely work:

(>)()(<\/)

Demo 2

If we will be having empty tags and tags with horizontal spaces, then we would be expanding it to:

(>)(|[^\S\r\n]+)(<\/)

Demo 3

Test

# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r"(<fileName>)(.+)?(<\/fileName>)"

test_str = ("        <file>               \n"
    "            <fileType>Mezza</fileType>\n"
    "            <fileName></fileName>\n"
    "            <segments>0000</segments>\n"
    "        </file>")

subst = "\\1[!] value does not exist!\\3"

# You can manually specify the number of replacements by changing the 4th argument
result = re.sub(regex, subst, test_str, 0, re.MULTILINE)

if result:
    print (result)

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.
Emma
  • 27,428
  • 11
  • 44
  • 69
  • 1
    Thanks Emma for the reply, but filename is one example. There are other tags, that is the issue. – Jim May 31 '19 at 17:47
0

Use [ \t]* instead of \s*. This will match spaces and tabs, but not newlines. So the code should be:

regex = re.compile(r'>[ \t]*</')

DEMO

Barmar
  • 741,623
  • 53
  • 500
  • 612