I want to use Ruby (maybe its File class) to extract and change certain content in an HTML file. For example:

<html>
<script>
...
</script>
<!-- example -->
<div class="1">
<div class="2">
....
<p>...</p>
</div>
...
</html>

So can I use Ruby to loop over all the HTML files in a folder and rewrite each one so that it contains only the part matched by <!-- example -->.*?<\/div>?

NOTE: To clarify, I want to keep not just the text but the markup too, i.e. everything from <!-- example --> to </div>.

Thank you and I'm looking forward to your reply!

JJJ
Penny
  • See this [similar question](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags). You will probably need an XML/HTML __parser__. (Do ___not___ use regexp unless you know what you're doing; since you're asking this question, you're probably not there yet. Every time an engineer uses regexp to parse HTML, kittens die.) – franklin Mar 01 '16 at 22:49
  • Hi, I think the questions are a bit different. I just want to change each HTML file so that it contains only the content from <!-- example --> to </div>. Can't I use Ruby to do it? Copying and pasting by hand would be too much manual work if I have tons of HTML files that need the same change. – Penny Mar 01 '16 at 22:54
  • Of course you can! Any automated stuff should be done by machine not by hand. However, you're approaching the problem in the wrong way. You should be using an HTML parser to read an HTML source file. Vanilla regexp is not a sophisticated enough tool to read HTML. – franklin Mar 01 '16 at 23:00
  • I believe that the X/HTML parser you're looking for (in Ruby) may be Nokogiri. – franklin Mar 01 '16 at 23:02
  • Thank you! I just downloaded Nokogiri and am reading its documentation, but I'm still wondering whether I can use it to change each HTML file so that it contains only part of the code. I can definitely learn it from scratch, but if you have any ideas on how to select that content and delete the rest in each file, please let me know. Thanks! – Penny Mar 01 '16 at 23:21
  • I'm not fluent in Ruby, but you can probably adapt the solution found [here](http://stackoverflow.com/a/610684/778694). Just wrap that in a recursive function that goes through your directory looking for HTML files. – franklin Mar 02 '16 at 01:49
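
A minimal sketch of the batch rewrite discussed above, using only Ruby's standard library (Dir.glob, File.read, File.write) and the pattern from the question. The method names extract_example and trim_html_files are hypothetical, chosen only for illustration. Note that this is the regexp approach the comments warn about, not a parser: because .*? is non-greedy, the match ends at the first </div> after the marker, which for nested divs is the inner one. It also overwrites files in place, so it is worth trying on a copy of the folder first.

```ruby
# Returns the fragment from the marker comment to the first </div> after it,
# or nil when the marker is absent. The default pattern is the one from the
# question; the /m flag lets "." match newlines as well.
def extract_example(html, pattern = /<!-- example -->.*?<\/div>/m)
  html[pattern]
end

# Rewrites every .html file under +root+ (subfolders included) so that it
# contains only the matched fragment. Files without a match are left
# untouched. Returns the paths of the files that were rewritten.
def trim_html_files(root)
  Dir.glob(File.join(root, "**", "*.html")).select do |path|
    fragment = extract_example(File.read(path))
    next false unless fragment   # no marker found: skip this file

    File.write(path, fragment + "\n")
    true
  end
end
```

Calling trim_html_files("path/to/folder") (a placeholder path) returns the list of files it rewrote. For nested markup, a parser such as Nokogiri, as suggested in the comments, is likely the more robust choice.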

0 Answers