Smart guys I need some help.

I have two values: 1) an HTML source and 2) a JSON value. I need to use a regex to check whether the JSON value is present in the HTML source. But even though the exact same value exists my test fails.

Below is the actual data.

After using JSON::XS, I obtained the value I want, which is a blob of job description text:

Responsible for providing single point of contact to borrowers whose loans are delinquent or at high risk for delinquency. Assist borrowers to find solutions to maintain home ownership, including HAMP, proprietary modifications, RPPs, etc. If retention options are not possible, discuss other solutions including short sales, deeds in lieu, etc. Responsible for analyzing the customer's financial situation and making recommendations on loan modifications/workout options to resolve delinquency. Identify, maintain, track and log requested documentation for loan modification review. Notify customers of loan modification decision, current status, options, timelines, coordination touch points, and customer obligations through the process. 

The primary goal of the Home Preservation Department is to mitigate loss for Wells Fargo Home Mortgage while allowing the mortgagor(s) to retain their home. Maintain applicable compliance conformity as it relates to data integrity and risk management. Ability to handle confidential material in a professional, highly ethical manner. 

Duties include (not limited to): 
- Answer inbound inquiries from borrowers regarding status of the loss mitigation, loan 
 modification, short sale and foreclosure process. 
- Interview borrower to understand borrowers specific situation; identify and request the 
 appropriate documents required for loan modification review 
- Access information on multiple systems to inform borrowers of loan terms, important 
 dates and deadlines. 
- Communicate and coordinate with multiple Servicing departments (both internal and 
 external to WF) informing the borrower of current status, options, next steps; remain as 
 the single point of contact to borrower throughout the loss mitigation, loan modification, 
 and foreclosure process 
- Ensure necessary information and complete packages are received in a timely manner, 
 maintained, tracked, and logged in multiple systems. 
- Place outbound calls to notify borrowers when additional loan document information is 
 required; follow up on loan docs 
- Notify and inform borrowers of decisions as they are made. 
- Notify and inform borrowers of changes to established dates/timelines throughout 
 process. 
- Work with borrower to set up work out payment plans. 
- Work with borrower to set up and follow up on good faith payments. 
- Work with borrower to get loan modifications signed and closed. 
- Work with borrower to obtain broker information for short sale process. 
- Provide information to borrowers on the escalation/complaint procedures/process within 
 WF. 
Ensure borrowers understand their obligations throughout each step of process, including timelines and coordination touch points with other Servicing partners. Effectively Work in a team environment in an effort to achieve team/volume goals and provide superior customer service.3+ years experience in mortgage loan origination, telesales, collections, default, customer service or loss mitigation- Excellent Pipeline Management and Organization 
- Experience building relationships/rapport with customers 
- Ability to analyze credit documents and solve problems. 
- Ability to prioritize and maintain a large volume of work 
- Ability to be flexible and adapt to a fast paced and changing business environment. 
- Strong attention to detail with excellent written and verbal communication skills. 
- Working Knowledge of MS Word and Excel 
- Demonstrate ability to lead train, and provide feedback to staff and corresponding 
 manager. 
- Ability to prioritize and coordinate multiple tasks including distribution of work within a 
 specified department 
- Demonstrated Pipeline Management Skills 
- Excellent Analytical and Problem Solving Skills 
- Strong Customer Service Skills and ability to take escalated calls 


*
*
*Shift Hours: 10am-7pm 
*
*
*- Associates Degree or BA/BS 
- Experience managing a pipeline of work 
- Experience discussing loan documentation over the phone (requesting and collecting 
 docs) 
- Experience interviewing customers, knowing when to probe for details 
- Experience working with disgruntled customers 
- Data entry into multiple systems of record 

A snippet of the HTML source, which contains the same text along with the tags, looks like this:

</div>

</div>
<p>Responsible for providing single point of contact to borrowers whose loans are delinquent or at high risk for delinquency. Assist borrowers to find solutions to maintain home ownership, including HAMP, proprietary modifications, RPPs, etc. If retention options are not possible, discuss other solutions including short sales, deeds in lieu, etc. Responsible for analyzing the customer's financial situation and making recommendations on loan modifications/workout options to resolve delinquency. Identify, maintain, track and log requested documentation for loan modification review. Notify customers of loan modification decision, current status, options, timelines, coordination touch points, and customer obligations through the process. </p>

<p>The primary goal of the Home Preservation Department is to mitigate loss for Wells Fargo Home Mortgage while allowing the mortgagor(s) to retain their home. Maintain applicable compliance conformity as it relates to data integrity and risk management. Ability to handle confidential material in a professional, highly ethical manner. </p>

<p>Duties include (not limited to): 
<br />- Answer inbound inquiries from borrowers regarding status of the loss mitigation, loan 
<br /> modification, short sale and foreclosure process. 
<br />- Interview borrower to understand borrowers specific situation; identify and request the 
<br /> appropriate documents required for loan modification review 
<br />- Access information on multiple systems to inform borrowers of loan terms, important 
<br /> dates and deadlines. 
<br />- Communicate and coordinate with multiple Servicing departments (both internal and 
<br /> external to WF) informing the borrower of current status, options, next steps; remain as 
<br /> the single point of contact to borrower throughout the loss mitigation, loan modification, 
<br /> and foreclosure process 
<br />- Ensure necessary information and complete packages are received in a timely manner, 
<br /> maintained, tracked, and logged in multiple systems. 
<br />- Place outbound calls to notify borrowers when additional loan document information is 
<br /> required; follow up on loan docs 
<br />- Notify and inform borrowers of decisions as they are made. 
<br />- Notify and inform borrowers of changes to established dates/timelines throughout 
<br /> process. 
<br />- Work with borrower to set up work out payment plans. 
<br />- Work with borrower to set up and follow up on good faith payments. 
<br />- Work with borrower to get loan modifications signed and closed. 
<br />- Work with borrower to obtain broker information for short sale process. 
<br />- Provide information to borrowers on the escalation/complaint procedures/process within 
<br /> WF. 
<br />Ensure borrowers understand their obligations throughout each step of process, including timelines and coordination touch points with other Servicing partners. Effectively Work in a team environment in an effort to achieve team/volume goals and provide superior customer service.3+ years experience in mortgage loan origination, telesales, collections, default, customer service or loss mitigation- Excellent Pipeline Management and Organization 
<br />- Experience building relationships/rapport with customers 
<br />- Ability to analyze credit documents and solve problems. 
<br />- Ability to prioritize and maintain a large volume of work 
<br />- Ability to be flexible and adapt to a fast paced and changing business environment. 
<br />- Strong attention to detail with excellent written and verbal communication skills. 
<br />- Working Knowledge of MS Word and Excel 
<br />- Demonstrate ability to lead train, and provide feedback to staff and corresponding 
<br /> manager. 
<br />- Ability to prioritize and coordinate multiple tasks including distribution of work within a 
<br /> specified department 
<br />- Demonstrated Pipeline Management Skills 
<br />- Excellent Analytical and Problem Solving Skills 
<br />- Strong Customer Service Skills and ability to take escalated calls </p>

<p>*
<br />*
<br />*Shift Hours: 10am-7pm 
<br />*
<br />*
<br />*- Associates Degree or BA/BS 
<br />- Experience managing a pipeline of work 
<br />- Experience discussing loan documentation over the phone (requesting and collecting 
<br /> docs) 
<br />- Experience interviewing customers, knowing when to probe for details 
<br />- Experience working with disgruntled customers 
<br />- Data entry into multiple systems of record 
<br />- Experience warm transferring customers 
<br />- Strong computer/data entry skills 
<br />- Knowledge of the underwriting, Short Sale, Foreclosure, and Processing processes. 
<br />- HAMP and Non-HAMP product knowledge 
<br />- Loss Mitigation or Mortgage Experience</p>
<h4>
Compensation
</h4>
<p>
Unspecified
</p>
<div class="clear"></div>
<div class="detail_actions">
<a href="#" class="actionlink maction btn_apply xlarge" data_maction="apply" title="Apply">Apply</a>
</div>

<div class="clear"></div>
</div>
<div class="search_bar_wrap3">

- Experience warm transferring customers 
- Strong computer/data entry skills 
- Knowledge of the underwriting, Short Sale, Foreclosure, and Processing processes. 
- HAMP and Non-HAMP product knowledge 
- Loss Mitigation or Mortgage Experience

But when I try it with my code, it does not work:

$actual_production_html =~ s/<.+?>//g;                  # remove HTML tags from the HTML source
ok($actual_production_html =~ m/\Q$full_description/);  # match the JSON text literally against the stripped HTML

Any help will be deeply appreciated

Amey
  • What's the actual question here? What are you trying to do? – cdeszaq Sep 16 '11 at 19:33
  • The question is I have two values one is an HTML source and the second is a JSON value, I need to do a regex and check if the json value is present in the HTML source. – Amey Sep 16 '11 at 19:36
  • but even though the exact same text exists, my test says NOT OK... – Amey Sep 16 '11 at 19:38
  • Are there whitespace differences? Maybe one of your expressions uses `"\r\n"` for line endings and the other one uses `"\n"`? Maybe there are embedded nulls (`"\0"`) in the data? – mob Sep 16 '11 at 19:52
  • They probably are. I thought that using "\Q" in my Perl regex, "ok($actual_production_html =~ m/\Q$full_description/);", would cater to that – Amey Sep 16 '11 at 19:56
  • Regexps can't effectively parse HTML. You could go mad with all the special cases. It has happened here before. http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – Paul Sep 16 '11 at 22:06
  • Yes, I was going through other posts on Stack Overflow, and they all recommended using an HTML parser module, but I am still not sure how I would leverage that module, as my JSON value is large, not just a line or two of text to match with a regex. Any ideas? – Amey Sep 16 '11 at 22:10

2 Answers

The way you're going about it is too sensitive to slight formatting differences between the two.

I would suggest that you "normalize" both text streams (the input and the HTML with tags removed) by collapsing every run of whitespace (including newlines, tabs, etc.) down to a single space. Then compare the two normalized text strings. If you don't care about letter case, then also fold everything to upper or lower case.
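
The normalization described above could look roughly like the following. This is only a sketch: `normalize_ws` and `strip_tags` are hypothetical helper names, the two sample strings are shortened stand-ins for the real HTML and JSON values, and the tag removal is a crude regex rather than a proper HTML parser.

```perl
use strict;
use warnings;

# Hypothetical helper: collapse every run of whitespace to one space
# and trim the ends.
sub normalize_ws {
    my ($text) = @_;
    $text =~ s/\s+/ /g;   # newlines, tabs, repeated spaces -> single space
    $text =~ s/^ //;      # drop a leading space
    $text =~ s/ $//;      # drop a trailing space
    return $text;
}

# Hypothetical helper: crude tag removal. Each tag becomes a space so that
# text on either side of a tag does not get glued together.
sub strip_tags {
    my ($html) = @_;
    $html =~ s/<[^>]*>/ /g;
    return $html;
}

# Shortened stand-ins for the data in the question (assumed, not the full text).
my $html      = "<p>calls </p>\n\n<p>*\n<br />*Shift Hours: 10am-7pm </p>";
my $json_text = "calls \n\n\n*\n*Shift Hours: 10am-7pm ";

my $haystack = normalize_ws(strip_tags($html));
my $needle   = normalize_ws($json_text);

# index() does a plain substring search, so no regex quoting is needed.
print index($haystack, $needle) >= 0 ? "found\n" : "not found\n";
```

Replacing tags with a space is a design choice that suits block-level tags like `<p>` and `<br />`; it would insert unwanted spaces if inline tags split a word. To ignore letter case as well, `lc` could be applied to both strings before comparing.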

Jim Garrison
"even though the exact same value exists my test fails"

is not true. You do NOT have the "exact same value".

Your JSON has 2 blank lines after

Strong Customer Service Skills and ability to take escalated calls

After removing tags, your HTML has one blank line there.

So failing to match is what is supposed to happen!
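
A small sketch of why the match fails, using shortened stand-in strings rather than the full data (the variable names are made up for illustration). `\Q` only escapes characters so they are matched literally; it does not make whitespace flexible, so one extra newline is enough to break the match. Printing the newlines visibly makes that kind of difference easier to spot.

```perl
use strict;
use warnings;

# Assumed stand-ins for the text around the mismatch.
my $from_html = "escalated calls \n\n*";     # one blank line after tag removal
my $from_json = "escalated calls \n\n\n*";   # two blank lines in the JSON text

# \Q...\E quotes every character, so whitespace must still match exactly.
my $literal_match = $from_html =~ /\Q$from_json\E/ ? 1 : 0;

# Make the newlines visible to see where the two values differ.
(my $visible = $from_json) =~ s/\n/\\n/g;

print "literal match: $literal_match\n";   # prints "literal match: 0"
print "JSON text: $visible\n";
```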

tadmc