I am getting an error importing an XML file into a custom program. Other files import correctly, but one file produces an error from a float field. I am using the Notepad++ search function in Regular Expression mode to try to find the issue in the XML file.

When I use <milepost>([a-zA-Z0-9.]+)</milepost> I get around 30,000 results, which is the correct number of records, but the field is supposed to be DOUBLE. When I use <milepost>([0-9.]+)</milepost> I get only 29,994 records. This tells me that the import is most likely failing because there are letters in my number fields.

I have tried a number of variations like:

<milepost>([\S\D\d]+)</milepost>
<milepost>(.*?)</milepost>
<milepost>([\Sa-zA-Z]+)</milepost>
<milepost>([0-9.\w]+)</milepost>

etc. Each of these returns the expected 30,000 records.

When I try to search for letters using:

<milepost>([a-zA-Z.]*)</milepost>
<milepost>([a-zA-Z]+)</milepost>
<milepost>(^[a-zA-Z]+$)</milepost>
<milepost>([a-zA-Z.a-zA-Z]+)</milepost>

I get 0 results (most likely because these patterns exclude numbers).

I did manage to find one of the records I am looking for using this method:

<milepost>173.811818181818a</milepost>

But I do not feel like scrolling through 30,000+ lines to look for 5 more records with a letter in them.

Is there a regular expression that will return to me ONLY the values that have a letter/letters in them while allowing numbers? (Fields with only numbers and a period should be excluded)

N3R4ZZuRR0
  • Your question is a bit hard to follow. Can you show a sample of 10 lines and point out things which should match and things which should not? – MonkeyZeus Sep 09 '19 at 17:17
  • [You can't parse X|HTML with regex.](https://stackoverflow.com/a/1732454/1422451) – Parfait Sep 09 '19 at 17:20
  • @Parfait That's not particularly helpful. You can fairly successfully search a well-formed XML document using Regex if you are willing to accept that you may not catch all edge cases. – MonkeyZeus Sep 09 '19 at 18:50
  • Thank you for your input. I do understand that you cannot parse XML with RegEx; it was the only thing I could think of to find my errors. I literally have 200 miles of Lat/Long coords in XML along a specific track with mileposts. Monkey is correct in that something may be missed. While the answer below was exactly what I was looking for, my file still errors, so it's back to the drawing board. – Obsidian Silence Sep 10 '19 at 19:00

2 Answers

What you want is a negative look-ahead. Something like

<milepost>(?![0-9.]+</milepost>)

should be very close.

In plain English: <milepost> not followed exclusively by digits and dots and then a closing </milepost>.
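To see the look-ahead in action outside Notepad++, here is a minimal Python sketch. The sample data is made up for illustration (only the 173.811818181818a value comes from the question), and the capture group ([^<]*) is an addition to the pattern above so the offending value can be printed; it assumes the element holds plain text with no nested tags.

```python
import re

# Hypothetical sample; only 173.811818181818a appears in the question.
sample = """\
<milepost>12.5</milepost>
<milepost>173.811818181818a</milepost>
<milepost>98</milepost>
<milepost>4O.2</milepost>
"""

# Negative look-ahead: reject values made up only of digits and dots.
# The trailing ([^<]*)</milepost> is an assumption added here to capture the value.
pattern = re.compile(r"<milepost>(?![0-9.]+</milepost>)([^<]*)</milepost>")

bad_values = pattern.findall(sample)
print(bad_values)  # ['173.811818181818a', '4O.2']
```

Note that the look-ahead also flags an empty <milepost></milepost>, which may be useful if blank values break the import as well.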

Tomalak

The 6 problem records presumably contain a mixture of letters and numbers, but your searches for records containing letters will only match records consisting exclusively of letters.

Try

<milepost>.*[a-zA-Z].*</milepost>

which matches any record containing an ASCII letter in its value, as well as allowing other characters such as digits.
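One caveat worth checking: .* is greedy, so if several elements share a line the match can run past the first closing tag. The Python sketch below compares the pattern above with a bounded variant that uses [^<]* instead of .* (the variant is a suggestion of this sketch, not part of the original pattern). The sample data is hypothetical apart from the 173.811818181818a value from the question.

```python
import re

# Hypothetical sample: the first line packs several elements together.
sample = (
    "<milepost>12.5</milepost><note>abc</note><milepost>7</milepost>\n"
    "<milepost>173.811818181818a</milepost>\n"
)

# Pattern as given above: .* may cross tag boundaries within a line.
greedy = re.compile(r"<milepost>.*[a-zA-Z].*</milepost>")

# Bounded variant (an assumption of this sketch): [^<]* stops at the next tag.
bounded = re.compile(r"<milepost>[^<]*[a-zA-Z][^<]*</milepost>")

greedy_hits = greedy.findall(sample)
bounded_hits = bounded.findall(sample)

print(len(greedy_hits))  # 2: the whole first line is a false positive
print(bounded_hits)      # ['<milepost>173.811818181818a</milepost>']
```

If every milepost element sits on its own line, both patterns should give the same result.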

Michael Kay