Currently, I am grabbing titles using the following method:

title = html_response[/<title[^>]*>(.*?)<\/title>/,1]

This does a great job at catching "This is a title" from <title>This is a title</title>. However, there are some web pages that open the title tag on one line, print the title on the next line, and then close the title tag.

The Ruby line I presented above doesn't catch titles such as those, so I'm just trying to find a fix for that.

halfer
LewlSauce

2 Answers

4

A famous Stack Overflow post explains why it's a bad idea to use regular expressions to parse HTML. A better approach is to use a gem like Nokogiri to parse the document and pull out the title tag.

Community
Mori
1

Obligatory "don't use regex with HTML" sentence.

title = html_response[/<title[^>]*>(.*?)<\/title>/m,1]

The m modifier enables multiline mode, in which . also matches newline characters, so the capture can span several lines.
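A quick way to see the difference, using a made-up sample string in place of a real response. The capture keeps the line breaks around the title, so this sketch also calls `strip`, which is an addition to the line above.

```ruby
# Hypothetical sample input: the title text sits on its own line.
html_response = "<html><head><title>\n  This is a title\n</title></head></html>"

# Without /m, "." does not match "\n", so nothing matches and [] returns nil.
single_line = html_response[/<title[^>]*>(.*?)<\/title>/, 1]

# With /m, "." also matches "\n", so the capture can span lines.
multi_line = html_response[/<title[^>]*>(.*?)<\/title>/m, 1]

# The capture includes the surrounding whitespace; &. guards against nil.
title = multi_line&.strip

p single_line  # nil
p title        # "This is a title"
```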

cfeduke