I am new to Ruby on Rails and need some advice. I am trying to build a web scraper and have a JSON file at Kimono Labs, a service that turns website data into APIs:

https://www.kimonolabs.com/api/3obhv4p0?apikey=WWHHbKEkOmAPXsObOccPBXTb5NgRyCNO

I want to save this data to the database of my Ruby on Rails application.

Specifically, it is the "results" key that I want to save to the database. It contains data from Google Finance: the company, the URL of the quote page, the P/E ratio, and the latest price. It has the following format:

"results": {
    "collection1": [
      {
        "property1": {
          "href": "https://www.google.com/finance?catid=TRBC:57&sort=a&ei=Tx2WVonTG9uhe7Hpv_AN",
          "text": "Company"
        },
        "property2": "P/E (ttm)",
        "property3": "Quote",
        "index": 1,
        "url": "https://www.google.com/finance?catid=TRBC%3A57&sort=PE_RATIO&ei=6tyMVrqxIdaP0ASF0pbACQ"
      },
      {
        "property1": {
          "href": "https://www.google.com/finance?q=NASDAQ:NANO&ei=Tx2WVonTG9uhe7Hpv_AN",
          "text": "Nanometrics Incorporated"
        },
        "property2": "10,100.72",
        "property3": "14.04",
        "index": 2,
        "url": "https://www.google.com/finance?catid=TRBC%3A57&sort=PE_RATIO&ei=6tyMVrqxIdaP0ASF0pbACQ"
      },

This is the schema I have (from db/schema.rb, generated by my migration):

ActiveRecord::Schema.define(version: 20160108073353) do

  create_table "stocks", force: :cascade do |t|
    t.string   "company"
    t.datetime "created_at", null: false
    t.datetime "updated_at", null: false
    t.string   "url"
    t.float    "pe"
    t.float    "quote"
  end
end

How do I load this JSON into my application? I think the task breaks down into the following steps:

  • Open/load the file with RestClient or OpenURI. How do I do this?
  • Parse the data into a Ruby hash. How do I do this?
  • Loop through the hash and save the data to the database. How do I do this?

Thanks for the help!

chemook78

1 Answer

Yes, those are the right steps.

First, create the Stock model:

$ rails g model stock --skip

And then:

require 'net/http'
require 'json'

# get JSON (returns the response body as a string)
result = Net::HTTP.get(URI.parse('https://www.kimonolabs.com/api/3obhv4p0?apikey=WWHHbKEkOmAPXsObOccPBXTb5NgRyCNO'))

# parse JSON into a Ruby hash
json = JSON.parse(result)

# save data to DB
json['results']['collection1'][1..-1].each do |data| # [1..-1] skips the first element, which is the table header row
  Stock.create(
    company: data['property1']['text'],
    url: data['url'],
    pe: data['property2'].delete(','), # .delete removes the thousands separator
    quote: data['property3'].delete(',')
  )
end
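
To try the parsing step without a network call or a database, here is a minimal sketch that runs against a trimmed copy of the payload from the question. The names `SAMPLE_PAYLOAD`, `parse_stocks`, and `to_number` are hypothetical helpers introduced for illustration; they are not part of Rails or of the Kimono API.

```ruby
require 'json'

# Trimmed copy of the payload from the question: the header row plus one stock.
SAMPLE_PAYLOAD = <<~PAYLOAD
  {
    "results": {
      "collection1": [
        {
          "property1": {
            "href": "https://www.google.com/finance?catid=TRBC:57&sort=a&ei=Tx2WVonTG9uhe7Hpv_AN",
            "text": "Company"
          },
          "property2": "P/E (ttm)",
          "property3": "Quote",
          "index": 1,
          "url": "https://www.google.com/finance?catid=TRBC%3A57&sort=PE_RATIO&ei=6tyMVrqxIdaP0ASF0pbACQ"
        },
        {
          "property1": {
            "href": "https://www.google.com/finance?q=NASDAQ:NANO&ei=Tx2WVonTG9uhe7Hpv_AN",
            "text": "Nanometrics Incorporated"
          },
          "property2": "10,100.72",
          "property3": "14.04",
          "index": 2,
          "url": "https://www.google.com/finance?catid=TRBC%3A57&sort=PE_RATIO&ei=6tyMVrqxIdaP0ASF0pbACQ"
        }
      ]
    }
  }
PAYLOAD

# Hypothetical helper: strips the thousands separator and converts to a Float.
# Float() raises ArgumentError on non-numeric text instead of returning 0.0.
def to_number(text)
  Float(text.delete(','))
end

# Hypothetical helper: returns one attribute hash per stock, skipping the header row.
def parse_stocks(json_text)
  rows = JSON.parse(json_text).fetch('results').fetch('collection1')
  rows.drop(1).map do |row|
    {
      company: row['property1']['text'],
      url:     row['url'],
      pe:      to_number(row['property2']),
      quote:   to_number(row['property3'])
    }
  end
end

parse_stocks(SAMPLE_PAYLOAD).each { |attrs| puts attrs.inspect }
```

Each hash has the same keys as the columns in the stocks table, so inside the Rails app it could be passed straight to `Stock.create`.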
Inpego
  • Thanks Inpego, that works! I actually want to update this table from the JSON data on a regular basis. The table is the result of a stock screener, so the list of stocks will change (company name, quote, P/E, but not the URL belonging to a company name). I am thinking of running Stock.delete_all every time and then the code above to load a fresh table. Is that the way to go? Or should I check the "company" column for duplicates with the new data, update P/E and quote for existing companies, and add new rows for companies that don't exist yet? Should I do this with an if statement? Thanks for the advice! – chemook78 Jan 14 '16 at 03:50
  • The first way is simpler and will most likely perform better, so I recommend it. For performance, I also recommend using `Stock.connection.execute("TRUNCATE TABLE #{Stock.table_name}")` instead of `Stock.delete_all`, provided no foreign keys reference this table. – Inpego Jan 14 '16 at 09:16
  • see http://stackoverflow.com/questions/139630/whats-the-difference-between-truncate-and-delete-in-sql – Inpego Jan 14 '16 at 10:38
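
For comparison, the second approach from the comments (update P/E and quote for known companies, add rows for new ones) can be sketched in plain Ruby. This is only an illustration under stated assumptions: a Hash keyed by company name stands in for the stocks table, `sync_stocks` is a hypothetical helper, and "Example Corp" and all numeric values are made-up sample data. In a Rails app the same idea is usually written with `Stock.find_or_initialize_by(company: ...)` followed by `update`.

```ruby
# Hypothetical helper: merges freshly scraped rows into an in-memory "table".
# table    - Hash of company name => attribute hash (stand-in for the stocks table)
# incoming - Array of attribute hashes with :company, :url, :pe and :quote keys
def sync_stocks(table, incoming)
  incoming.each do |attrs|
    existing = table[attrs[:company]]
    if existing
      # Known company: refresh only the values that change between runs.
      existing[:pe] = attrs[:pe]
      existing[:quote] = attrs[:quote]
    else
      # New company: add a full row.
      table[attrs[:company]] = attrs.dup
    end
  end
  table
end

# Made-up sample data for illustration only.
table = {
  'Nanometrics Incorporated' => { company: 'Nanometrics Incorporated', url: 'old-url', pe: 1.0, quote: 2.0 }
}
fresh = [
  { company: 'Nanometrics Incorporated', url: 'new-url', pe: 3.0, quote: 4.0 },
  { company: 'Example Corp', url: 'example-url', pe: 5.0, quote: 6.0 }
]

sync_stocks(table, fresh)
puts table.keys.inspect
```

Note that this sketch never removes companies that have dropped out of the screener, so it would need an extra cleanup step. That is one reason clearing the table and reloading it is the simpler option.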