I need to extract the charset encoding from a web page. I found that this regex can do it across different tag syntaxes:

(?<=([<META|<meta])(.*)charset=)([^"'>]*)

In general, how can I make this regular expression work with `gsub` in R?

SalimK
    Did you try to use it with `gsub()`? What exactly is the problem? What is your definition of "work" in this case? You should make a clear [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output. – MrFlick Jan 21 '16 at 02:29
    `stringr::str_match(rvest::html_attr(rvest::html_nodes(xml2::read_html("http://symbolcodes.tlt.psu.edu/web/tips/declare.html"), xpath="//meta[@http-equiv]"), "content"), "charset=(.*)")[,2]` :: Rule of Scraping #34 - When you can use proper parsing, use it. – hrbrmstr Jan 21 '16 at 02:53
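
For reference, a minimal base R sketch of the regex route, in case a parser is not an option. One likely reason the pattern in the question fails is that its lookbehind contains `.*`, and PCRE (used by `perl = TRUE`) requires lookbehinds of bounded length, while R's default engine has no lookbehind at all. The sketch swaps the lookbehind for a capture group. The `html` strings are made-up sample input, and `pattern` and `charsets` are illustrative names, not something from the question.

```r
# Hypothetical sample input: two common ways of declaring a charset.
html <- c(
  '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">',
  '<META charset="iso-8859-1">'
)

# Capture the value in a group instead of asserting a prefix with a lookbehind.
# [^>]* keeps the match inside the meta tag; ["']? skips an optional opening quote.
pattern <- "<meta[^>]*charset=[\"']?([^\"'>]*)"

# regexec() records the full match and each capture group;
# regmatches() turns those positions into strings.
m <- regmatches(html, regexec(pattern, html, ignore.case = TRUE))

# Element 2 of each result is the first capture group (NA when there is no match).
charsets <- vapply(
  m,
  function(x) if (length(x) >= 2) x[2] else NA_character_,
  character(1)
)
print(charsets)
# [1] "UTF-8"      "iso-8859-1"

# The same idea with sub(): match the whole string, keep only the group.
# Note that sub() returns its input unchanged when the pattern does not match.
sub(paste0(".*", pattern, ".*"), "\\1", html, ignore.case = TRUE, perl = TRUE)
```

`ignore.case = TRUE` covers both `<meta` and `<META`, which is what the character class `[<META|<meta]` in the original pattern appears to be aiming for; inside square brackets that text is read as a set of single characters, not as two alternatives.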

0 Answers