
I need help understanding how to implement this part of a PHP script in Ruby. Before you think, "OMG! THIS IS A LOT OF CODE," I just want to let you know that the tiny section below is the code relevant to the question; I included more further down because I have a tendency to leave important facts out of my questions (I'm a noob). The script is a SERP checker, which I'm rewriting with the aim of teaching myself to program:

        ob_start();
        include_once($fetch_url);
        $page = ob_get_contents();
        ob_end_clean();  

        $page = str_replace('<b>','',$page);
        $page = str_replace('</b>','',$page);
        //preg_match('/008000\">(.+)<\/font><nobr>/i', $page, $match);
        preg_match_all('/<font color=#008000>(.*)<\/font>/', $page, $match);
        $r = 0;
        $position = '0';

My Ruby code is as follows:

    def clean_up_keywords(str)
      str.gsub("\n", ",").delete("\r").split(',')
    end

    def clean_up_list(arr)
      arr.reject(&:empty?).each(&:lstrip!)
    end

    def make_strings_url_friendly(arr)
      arr.each do |e|
        e.gsub!(" ", "+")
      end
    end

    def make_urls(arr)
      arr.map {|e| "http://www.google.com/search?num=100&q=" + e}
    end

    post '/ranked' do
      dirty_list = clean_up_keywords(params[:keyword])
      clean_list = clean_up_list(dirty_list)
      url_ready_list = make_strings_url_friendly(clean_list)
      url_list = make_urls(url_ready_list)
    end
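For comparison, here is a minimal sketch of the same keyword-to-URL steps using only Ruby's standard library. The method names `parse_keywords` and `build_search_url` are made up for illustration, and the sketch deliberately differs from the code above in two ways: it strips whitespace on both sides of each keyword, and it lets `URI.encode_www_form` escape the query instead of replacing spaces by hand.

```ruby
require "uri"

# Hypothetical helper: split the textarea input on newlines and commas,
# trim whitespace around each keyword, and drop empty entries.
def parse_keywords(str)
  str.split(/[\r\n,]+/).map(&:strip).reject(&:empty?)
end

# Hypothetical helper: build the search URL for one keyword.
# encode_www_form turns spaces into "+" and escapes characters such as "&".
def build_search_url(keyword, num = 100)
  "http://www.google.com/search?" + URI.encode_www_form(num: num, q: keyword)
end

urls = parse_keywords("ruby on rails\r\nsinatra").map { |k| build_search_url(k) }
```

The escaping matters once a keyword contains something other than letters and spaces: with plain `gsub`, a keyword like `c++ & more` would produce a query string that means something different from what was typed.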

The entire PHP script can be found here: http://pastie.org/1899806

The entire Ruby script can be found here: https://github.com/MelanieS/RankyPanky/blob/master/lib/rankypanky.rb

My deal is that I was told I don't really have to implement the output buffer part because it's Ruby, which is great for me because I can't make heads or tails of what it is, even after several people explained it to me. (Someday.)

However, the $page variable is created in the output buffer section. It is then used in the next section, which appears to be removing the bold tags. Does my Ruby script already take care of this?

Then, the SERP checker appears to be looking for results with that font color, and then what? Putting them in an array called $match?

I was thinking, instead of having my code search for a font color, to have it search for the <cite> tag in the SERPs, as that appears to be the only place where Google uses the cite thing... because the font tag type of search seems kind of deprecated to me.

I'm hoping that any one of you can tell me whether or not I am understanding this PHP code correctly and can give me a hint or two as to how to implement it in Ruby. My main issue is really knowing which elements of the PHP to NOT use since that whole output buffer thing has me baffled. Anything that points me in the right direction is much appreciated.

Also, in the original PHP code, it makes the Google urls like this (pseudocode):

"http://www.google.com/search?num=50&q=" +keyword+ "&btnG=Search"

But in my Ruby, I just made it like this:

"http://www.google.com/search?num=50&q=" +keyword

Does not adding the "&btnG=Search" to the end of the url make a difference? When I manually enter either url into my browser, it takes me to the same place, but I am unsure whether, programmatically, it makes a difference.

Melanie Palen
  • there's only output buffering in the PHP code because the writer is too stupid for file_get_contents() – May 14 '11 at 10:43
  • Thank you, it's really hard to rewrite this, there were a lot of other big, big problems in the original code that my friend found and weeded out for me. I'll look up file_get_contents() and see what it's all about. – Melanie Palen May 14 '11 at 10:47
  • *(related)* [Best Methods to parse HTML](http://stackoverflow.com/questions/3577641/best-methods-to-parse-html/3577662#3577662) - the code you show is **not** how you should do it in PHP. Use Nokogiri or Hpricot in Ruby. – Gordon May 14 '11 at 10:47
  • @Gordon - Thank you, I am taking a look at nokogiri right now. :) – Melanie Palen May 14 '11 at 10:49

1 Answer


It pulls the page into a variable, strips all the bold tags, then puts all the green text from the page (without the coloring) into the $match array.
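As a rough illustration, those three steps could be sketched in Ruby with the standard library alone. The method names `fetch_page`, `green_text`, and `cite_text` are hypothetical, and whether either pattern matches the markup Google actually serves is an assumption. Note one intentional difference from the PHP: the capture here is non-greedy (`.*?`), so two green spans on the same line come back as two entries rather than one.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: download the raw HTML into a string.
# This plays the role of the ob_start / include_once / ob_get_contents block.
def fetch_page(url)
  Net::HTTP.get(URI(url))
end

# Remove <b> and </b>, then collect the text inside each green <font> tag.
# Returns a flat array of strings, comparable to $match[1] in the PHP.
def green_text(page)
  page.gsub(%r{</?b>}, "").scan(%r{<font color=#008000>(.*?)</font>}).flatten
end

# Variant for the <cite> idea from the question: same approach, different tag.
def cite_text(page)
  page.gsub(%r{</?b>}, "").scan(%r{<cite[^>]*>(.*?)</cite>}).flatten
end
```

Usage would look like `green_text(fetch_page(url))`. Regular expressions are brittle against real-world HTML, so a parser such as Nokogiri, as suggested in the comments, is likely the sturdier choice once the basic flow is working.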

Ignacio Vazquez-Abrams