1

We have some form data which flows from our CMS into our CRM. The data captured in the CRM contains HTML elements like this.

<dl><dt>First Name:</dt><dd>Joe</dd>
<dt>Last Name:</dt><dd>Blogs</dd>  
<dt>Personal Email:</dt>
<dd>joe.bloggs@exampleEmail.com</dd>  
<dt>Most recent job title:</dt>
<dd>Manager</dd>  
<dt> Level of membership you are applying for:</dt>
<dd>Full Member</dd>

I've played around with substrings but can only get as far as the first name. Shorter name either with the next string or longer names need more character length.

substring(fieldname, 29, 15) as FirstName,

Results

  1. Joe (Too long)
  2. Anne-Mar (too short)

Is there another way or way of capturing between specific text?

Andy Lester
  • 91,102
  • 13
  • 100
  • 152
Punter
  • 31
  • 4
  • Does this answer your question? [Remove HTML tags from record](https://stackoverflow.com/questions/2627940/remove-html-tags-from-record) – P.Salmon Feb 03 '22 at 15:27
  • Thanks - Don't want to remove the tags per se just need each question and answer in a separate column. – Punter Feb 03 '22 at 15:34
  • 1
    It's doable but will get ridiculously complicated by the time you get past even the 2nd field. – Mike Feb 03 '22 at 16:05
  • I found another way but it only half works substring_index(substring_index(content, 'First Name:
    ', -1),'
    Last Name:
    ', 1) Does present correctly the starting place of the firstname but the entire entry follows after. Have the same results if I choose a later question.
    – Punter Feb 03 '22 at 16:09
  • 1
    MySQL is not the place to be doing HTML parsing. Read the data into whatever language you're using, and then parse the HTML with an HTML parser. You do NOT want to be messing around with substrings. Extracting text from HTML is a solved problem. Don't reinvent the wheel. – Andy Lester Feb 03 '22 at 16:36

1 Answers1

1

As stated in the comments, you are better off doing this at pretty much any level other than the database, but I've been playing around in SQLFiddle and came up with this as a starting point. The only thing I did to your original example was add the /dl tag to the end to make it somewhat well formed xml. I was pleasantly surprised at how well MySQL handled it.

set @xml = '<dl><dt>First Name:</dt><dd>Joe</dd>
<dt>Last Name:</dt><dd>Blogs</dd>  
<dt>Personal Email:</dt>
<dd>joe.bloggs@exampleEmail.com</dd>  
<dt>Most recent job title:</dt>
<dd>Manager</dd>  
<dt> Level of membership you are applying for:</dt>
<dd>Full Member</dd></dl>';

select ExtractValue(@xml, '//dt[1]') as question, ExtractValue(@xml, '//dd[1]') as answer
  union
select ExtractValue(@xml, '//dt[2]') as question, ExtractValue(@xml, '//dd[2]') as answer
  union
select ExtractValue(@xml, '//dt[3]') as question, ExtractValue(@xml, '//dd[3]') as answer
  union
select ExtractValue(@xml, '//dt[4]') as question, ExtractValue(@xml, '//dd[4]') as answer;

Results

question answer
First Name: Joe
Last Name: Blogs
Personal Email: joe.bloggs@exampleEmail.com
Most recent job title: Manager

(This is definitely a starting point. I'm not a MySQL expert and I'm very new to posting, so please feel free to improve this.)

maru
  • 156
  • 6