I am trying to detect the URLs in a text and wrap each one in double quotes, like this:

original text: Hey, it is a url here www.example.com
required text: Hey, it is a url here "www.example.com"

Here "original text" is my input and "required text" is the output I want. I searched the web but could not find a solution. I have already tried `URI.extract`, but it does not seem to detect URLs without http or https. Below are examples of some of the URLs I want to handle. Please let me know if you know a solution.

ANQUETIL-DUPERRON Abraham-Hyacinthe, KIEFFER Jean-Luc, www.hominides.net/html/actualites/outils-preuve-presence-hominides-asie-0422.php,Les Belles lettres, 2001.

https://www.ancient-code.com/indian-archeologists-stumbleacross-ruins-great-forgotten-civilization-mizoram/

www.jstor.org/stable/24084454

www.biorespire.com/2016/03/22/une-nouvelle-villeantique-d%C3%A9couverte-en-inde/

insu.cnrs.fr/terre-solide/terre-et-vie/de-nouvellesdatations-repoussent-l-age-de-l-apparition-d-outils-surle-so

www.cerege.fr/spip.php?page=pageperso&id_user=94

Wiktor Stribiżew
Mani
  • So is your logic to match any string that has a `domain.root_domain`? – lacostenycoder Apr 24 '19 at 16:31
  • @lacostenycoder Kindly check the updated text. Thanks. – Mani Apr 24 '19 at 16:35
  • Matching any arbitrary URL is not easy, you will certainly get false positives if you want to also match strings like `googl.com`. Please check [this thread](https://stackoverflow.com/questions/6038061/regular-expression-to-find-urls-within-a-string) for ideas on what pattern suits your scenario best. Once you know what pattern is good to use the rest is easy. – Wiktor Stribiżew Apr 24 '19 at 16:43

1 Answer

Find the words that look like URLs:

str = "ANQUETIL-DUPERRON Abraham-Hyacinthe, KIEFFER Jean-Luc, www.hominides.net/html/actualites/outils-preuve-presence-hominides-asie-0422.php,Les Belles lettres, 2001.\n\nhttps://www.ancient-code.com/indian-archeologists-stumbleacross-ruins-great-forgotten-civilization-mizoram/\n\nwww.jstor.org/stable/24084454\n\nwww.biorespire.com/2016/03/22/une-nouvelle-villeantique-d%C3%A9couverte-en-inde/\n\ninsu.cnrs.fr/terre-solide/terre-et-vie/de-nouvellesdatations-repoussent-l-age-de-l-apparition-d-outils-surle-so\n\nwww.cerege.fr/spip.php?page=pageperso&id_user=94"

str.split.select { |w| w[/\b\.\w+/] }

This gives you an array of whitespace-delimited words that contain at least one `.` followed by word characters, which MIGHT work for your use case.

puts str.split.select { |w| w[/\b\.\w+/] }
www.hominides.net/html/actualites/outils-preuve-presence-hominides-asie-0422.php,Les
https://www.ancient-code.com/indian-archeologists-stumbleacross-ruins-great-forgotten-civilization-mizoram/
www.jstor.org/stable/24084454
www.biorespire.com/2016/03/22/une-nouvelle-villeantique-d%C3%A9couverte-en-inde/
insu.cnrs.fr/terre-solide/terre-et-vie/de-nouvellesdatations-repoussent-l-age-de-l-apparition-d-outils-surle-so
www.cerege.fr/spip.php?page=pageperso&id_user=94
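
The "MIGHT" matters here: the filter keeps any word with a dot followed by word characters, so decimals and abbreviations pass as well. A small sketch of that behaviour follows, using a made-up sample sentence (an assumption, not text from the question) and the same test written as `/\b\.\w+/`:

```ruby
# Hypothetical sample text, not taken from the question.
sample = "Version 3.14 shipped, see www.example.com or e.g. the docs"

# Same filter as above: keep whitespace-delimited words that contain
# a "." immediately followed by one or more word characters.
candidates = sample.split.select { |w| w[/\b\.\w+/] }

p candidates
# prints: ["3.14", "www.example.com", "e.g."]
```

If false positives like these can occur in your input, a stricter pattern is probably needed.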

Updated

Complete solution to modify your string:

str_with_quote = str.clone # work on a copy so that `gsub!` leaves `str` untouched

str.split.select { |w| w[/\b\.\w+/] }
   .each { |url| str_with_quote.gsub!(url, '"' + url + '"') }

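Now the cloned string has every URL wrapped in double quotes:

puts str_with_quote

This gives you the following output. Note that the first URL is quoted together with the trailing `,Les`, because the input has no space after that comma: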

ANQUETIL-DUPERRON Abraham-Hyacinthe, KIEFFER Jean-Luc, "www.hominides.net/html/actualites/outils-preuve-presence-hominides-asie-0422.php,Les" Belles lettres, 2001.

"https://www.ancient-code.com/indian-archeologists-stumbleacross-ruins-great-forgotten-civilization-mizoram/"

"www.jstor.org/stable/24084454"

"www.biorespire.com/2016/03/22/une-nouvelle-villeantique-d%C3%A9couverte-en-inde/"

"insu.cnrs.fr/terre-solide/terre-et-vie/de-nouvellesdatations-repoussent-l-age-de-l-apparition-d-outils-surle-so"

"www.cerege.fr/spip.php?page=pageperso&id_user=94"
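
In the output above, the first URL is quoted together with `,Les`, since splitting on whitespace cannot separate them. One possible alternative is to let a single `gsub` with a regex find and quote the URLs in one pass. The sketch below is only an illustration: the helper name `quote_urls` and the pattern itself are assumptions, not part of the original answer. The pattern assumes that a URL has a dotted host ending in an alphabetic TLD and that its path never contains a comma or a double quote. It will also match the domain part of an e-mail address.

```ruby
# Hypothetical helper: wraps every URL-like substring in double quotes.
def quote_urls(text)
  # Assumed pattern (not from the answer above).
  url_like = %r{
    (?:https?://)?               # optional http:// or https://
    (?:[a-z0-9-]+\.)+[a-z]{2,}   # host such as www.example.com
    (?:/[^\s,"]*)?               # optional path, stops at whitespace, comma or quote
  }xi

  # The block receives each match and returns it wrapped in quotes.
  text.gsub(url_like) { |url| %("#{url}") }
end

puts quote_urls("Hey, it is a url here www.example.com")
# prints: Hey, it is a url here "www.example.com"
```

Calling `quote_urls(str)` on the sample string should leave `,Les` outside the quotes, at the cost of cutting off any URL whose path legitimately contains a comma.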
lacostenycoder