I am making a cfhttp call and getting the data back.

The response is a complete page like the one below:

<html><title>MyPage</title><head><link rel="stylesheet" href="style.css"></head>
<body>
<table></table>
<table></table>
<table></table>
<table></table>
<table></table>
<table></table>
</body>
</html>

Now the issue: I want the code which is inside the body tag, and also remove the last table tag completely.

I am not sure where to start. [P.S. JSOUP is not an option]

I tried the code below, but it did not yield any results:

<cfset objPattern = CreateObject("java","java.util.regex.Pattern").Compile(JavaCast("string","(?i)<table[^>]*>([\w\W](?!<table))+?</table>"))>
<cfset objMatcher = objPattern.Matcher(JavaCast( "string", cfhttp.FileContent ))>
<cfoutput>#objMatcher#</cfoutput>
Neb9
voyeger
  • *JSOUP is not an option* Why not? – Leigh Dec 03 '14 at 22:19
  • Because, first, I do not know JSoup. Second, my client does not want to use an external service. – voyeger Dec 03 '14 at 22:22
  • If it's well formed (meaning there are closing tags etc) you can treat it as if it were an XML document using xmlParse(). – Mark A Kruger Dec 03 '14 at 22:25
  • It's not completely valid HTML, so xmlParse() breaks partway through. Coming back to Leigh's question: I am not sure how I will convince the client, but JSoup can help with the above. – voyeger Dec 03 '14 at 22:34
  • You may want to look at: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – James A Mohler Dec 04 '14 at 15:45

1 Answer

As far as convincing the client, explain that while regular expressions are great for some jobs, they are really not the best tool for parsing html. JSoup is not an external service. It is a pre-built library designed specifically for this task (unlike regular expressions).

JSoup is very simple to use, and similar to working with the DOM in JavaScript. Just add the JSoup jar to your class path (restart the CF server if needed) and it is ready to use.

I want the code which is inside the body tag, and also remove the last table tag completely.

Load the html content into a Document object and grab the <body> element:

jsoup = createObject("java", "org.jsoup.Jsoup");
doc = jsoup.parse( yourHTMLContentString );
body = doc.body();

Use a selector to grab and remove the last <table> element:

elem = doc.select("table:last-of-type");
elem.remove();

That is it. Now you can print, or do whatever you want, with the <body> element's html:

writeOutput( HTMLEditFormat(body.html()) );
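
One caveat about the selector: `:last-of-type` is evaluated relative to each element's siblings, so if the page contains nested tables, `table:last-of-type` may match more than one element. The following is only a sketch of one possible alternative that removes just the final table in document order, assuming the JSoup jar is already on the class path. The sample markup and the `t1`..`t3` ids are made up for illustration and stand in for `cfhttp.fileContent`.

```cfml
<cfscript>
// Hypothetical sample markup; in practice this would be cfhttp.fileContent
html = '<html><head><title>MyPage</title></head><body>'
     & '<table id="t1"></table><table id="t2"></table><table id="t3"></table>'
     & '</body></html>';

jsoup = createObject("java", "org.jsoup.Jsoup");
doc   = jsoup.parse( html );
body  = doc.body();

// Collect every <table> under <body>, in document order
tables = body.select("table");

// Remove only the final match; skip the call when no tables were found
if ( !tables.isEmpty() ) {
    tables.last().remove();
}

// Print the remaining <body> markup, escaped for display
writeOutput( HTMLEditFormat( body.html() ) );
</cfscript>
```

If the last table in document order happens to be nested inside another table, this removes the nested one, so the selector may still need adjusting for the real page.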

See their documentation for more information. In particular, the JSoup Cookbook has some very good examples.

Leigh
  • I downloaded jsoup-1.8.2.jsr from the site and placed it in cfusion/wwwroot/web-inf/lib/jsoup-1.8.2.jsr, but the instantiation does not work. I used the code you provided. – voyeger Dec 05 '14 at 16:49
  • "Does not work" is very vague. Are you getting an error? My guess would be you did not restart the CF server as noted above. Though if you are using CF10+, you can [load jars dynamically in your Application.cfc](http://help.adobe.com/en_US/ColdFusion/10.0/Developing/WSe61e35da8d318518-106e125d1353e804331-7ffe.html) instead. – Leigh Dec 05 '14 at 16:55
  • Here I added it directly to the ColdFusion folder, without calling this.javaSettings: `C:\ColdFusion10\cfusion\wwwroot\WEB-INF\lib`, and I am calling it like this: ` jsoup = CreateObject("java", "org.jsoup.Jsoup"); HTMLDocument = jsoup.parse(CFHTTP.fileContent); TheTable = HTMLDocument.select("body"); writeOutput(TheTable.toString()); `. The file is stored as **jsoup.jar** – voyeger Dec 05 '14 at 17:10
  • I have not restarted the server yet: `Error Class not found: org.jsoup.Jsoup Object Instantiation Exception` – voyeger Dec 05 '14 at 17:11
  • Like I said, you *have to* restart the CF server, or the jars won't be detected. Though again, as of CF10+ you can simply use `this.javaSettings`. That option does not require a restart. – Leigh Dec 05 '14 at 17:24
  • But with the setting you mentioned, I am unclear how to load the path, because I am not providing an array; it is just a single jar. Can you guide me with an example? Given where the jar is placed, I am confused about how to load it. – voyeger Dec 05 '14 at 17:26
  • As I had already added the file to the lib folder, as mentioned above, I added the following path in Application.cfc: `` – voyeger Dec 05 '14 at 17:28
  • No. Those two options are mutually exclusive. You either a) Add the jar to ..\web-inf\lib\ **AND** restart the CF server OR b) You add the absolute path to the jar to `this.javaSettings`. Do not do both. – Leigh Dec 05 '14 at 17:29
  • Still the same error: ` Class not found: org.jsoup.Jsoup Object Instantiation Exception` – voyeger Dec 05 '14 at 17:30
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/66281/discussion-between-leigh-and-voyeger). – Leigh Dec 05 '14 at 17:32
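
For the `this.javaSettings` option discussed in the comments above (CF10+), a minimal Application.cfc sketch might look like the following. The application name and the jar path are hypothetical placeholders, not values from this thread. `loadPaths` takes an array even when there is only a single jar, and the jar should not also be sitting in `WEB-INF\lib`.

```cfml
component {
    // Hypothetical application name, for illustration only
    this.name = "jsoupDemo";

    this.javaSettings = {
        // Assumption: absolute path to wherever the jar was saved.
        // A one-element array is fine for a single jar.
        loadPaths = [ "C:\path\to\jars\jsoup.jar" ],
        loadColdFusionClassPath = true,
        reloadOnChange = false
    };
}
```

With this in place, `createObject("java", "org.jsoup.Jsoup")` should resolve without a server restart, as noted in the comments above.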