
I have a program that uses the spreadsheet gem to create a CSV file, but I have not found a way to configure the output I need.

This is what I would like the gem to do: the model number and additional_image fields should be "in sync", that is, each additional image should be written to its own row of the spreadsheet rather than wrapped into a single cell.

Here is a snippet of the desired output (the current output puts all of the images in one cell). These fields are defined by XPath expressions and screen scraped using another gem. The program cannot know in advance how many objects it will encounter in the additional image field, but due to business logic the number of objects in the additional image field should mirror the number of model number objects that are written to the spreadsheet.

model
168868837a
168868837a
168868837a
168868837a
168868837a
168868837a

additional_image
1688688371.jpg
1688688372.jpg
1688688373.jpg
1688688374.jpg
1688688375.jpg
1688688376.jpg

This is the current code:

require "capybara/dsl"
require "spreadsheet"
require "fileutils"
require "open-uri"

LOCAL_DIR = 'data-hold/images'

 FileUtils.makedirs(LOCAL_DIR) unless File.exist?(LOCAL_DIR)
 Capybara.run_server = false
 Capybara.default_driver = :selenium
 Capybara.default_selector = :xpath
 Spreadsheet.client_encoding = 'UTF-8'

 class Tomtop
   include Capybara::DSL

   def initialize
     @excel = Spreadsheet::Workbook.new
     @work_list = @excel.create_worksheet
     @row = 0
   end

   def go
     visit_main_link
   end

   def retryable(options = {}, &block)
      opts = { :tries => 1, :on => Exception }.merge(options)

      retry_exception, retries = opts[:on], opts[:tries]

      begin
        return yield
      rescue retry_exception
        retry if (retries -= 1) > 0
      end

      yield
    end

   def visit_main_link
     retryable(:tries => 1, :on => OpenURI::HTTPError) do
     visit "http://www.example.com/clothing-accessories?dir=asc&limit=72&order=position"
     results = all("//h5/a[contains(@onclick, 'analyticsLog')]")
     item = []

     results.each do |a|
       item << a[:href]
     end
     item.each do |link|
          visit link
          save_item
      end
     # Note: the spreadsheet gem writes Excel (XLS) data, whatever the file extension.
     @excel.write "inventory.csv"
    end

   end

    def save_item
      data = all("//*[@id='content-wrapper']/div[2]/div/div")
      data.each do |info|
        @work_list[@row, 0] = info.find("//*[@id='productright']/div/div[1]/h1").text
        price = info.first("//div[contains(@class, 'price font left')]")
        @work_list[@row, 1] = (price.text.to_f * 1.33).round(2) if price
        @work_list[@row, 2] = info.find("//*[@id='productright']/div/div[11]").text
        @work_list[@row, 3] = info.find("//*[@id='tabcontent1']/div/div").text.strip
        color = info.all("//dd[1]//select[contains(@name, 'options')]//*[@price='0']")
        @work_list[@row, 4] = color.collect(&:text).join(', ')
        size = info.all("//dd[2]//select[contains(@name, 'options')]//*[@price='0']")
        @work_list[@row, 5] = size.collect(&:text).join(', ')
        model = File.basename(info.find("//*[@id='content-wrapper']/div[2]/div/div/div[1]/div[1]/a")['href'])
        # gsub (not gsub!) so the cell is never nil when the name has no non-digits
        @work_list[@row, 6] = model.gsub(/\D/, "")
        @work_list[@row, 7] = File.basename(info.find("//*[@id='content-wrapper']/div[2]/div/div/div[1]/div[1]/a")['href'])
        additional_image = info.all("//*[@rel='lightbox[rotation]']")
        @work_list[@row, 8] = additional_image.map { |link| File.basename(link['href']) }.join(', ')  
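        # Download each additional image into LOCAL_DIR under its own file name.
        images = additional_image.map { |link| link['href'] }
        images.each do |image|
          # Binary mode keeps the image bytes intact; on Ruby 3.0+ use URI.open(image).
          File.open(File.join(LOCAL_DIR, File.basename(image)), 'wb') do |f|
            f.write(open(image).read)
          end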

       end
       @row = @row + 1
     end

   end

 end


 tomtop = Tomtop.new
 tomtop.go

I would like this to do two things that I'm not sure how to do:

  1. Each additional image should print to a new line (currently it prints all in one cell).
  2. I would like the model field to be duplicated exactly as many times as there are additional_images in the same new line manner.
jcuwaz

1 Answer


Use the CSV gem. I took the long way of writing this so you can see how it works.

require 'csv'

DOC = "file.csv"
profile = []
profile[0] = "model"

CSV.open(DOC, "a") do |me|
  me << profile
end


img_url = ['pic_1.jpg','pic_2.jpg','pic_3.jpg','pic_4.jpg','pic_5.jpg','pic_6.jpg']

a = 0
b = img_url.length
while a < b
  profile = []
  profile[0] = img_url[a]

  CSV.open(DOC, "a") do |me|
    me << profile
  end

  a += 1
end

The CSV file should look like this:

model
pic_1.jpg
pic_2.jpg
pic_3.jpg
pic_4.jpg
pic_5.jpg
pic_6.jpg
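
To get the model number repeated next to each image, as the question asks, the same idea can be extended to two columns. This is only a minimal sketch: `rows_for` is a hypothetical helper, and the model and image values are sample data rather than output from the real scrape.

```ruby
require "csv"

# Hypothetical helper: pairs one model number with each of its image names,
# producing one [model, image] row per image.
def rows_for(model, images)
  images.map { |image| [model, image] }
end

model  = "168868837" # sample value, not taken from the real scrape
images = %w[1688688371.jpg 1688688372.jpg 1688688373.jpg]

csv_text = CSV.generate do |csv|
  csv << %w[model additional_image]                 # header row
  rows_for(model, images).each { |row| csv << row } # one line per image
end

puts csv_text
```

If you stay with the spreadsheet gem instead, the same pairing could probably be written inside save_item by setting @work_list[@row, 6] and @work_list[@row, 8] for each image and incrementing @row once per image rather than once per product.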

For your last question (joining two fields into one):

whatever = temp[1] + " " + temp[2]
profile[x] = whatever

OR

profile[x] = temp[1] + " " + temp[2]

To avoid the nil error when one of the fields is missing:

if temp[2].nil?
  profile[x] = temp[1]
else
  profile[x] = temp[1] + " " + temp[2]
end
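
A shorter way to write the same nil check, as a rough sketch: `join_fields` is a hypothetical helper name, and `temp` here is sample data standing in for one parsed CSV row.

```ruby
# Hypothetical helper: joins two fields with a space, skipping any that are nil.
def join_fields(first, second)
  [first, second].compact.join(" ")
end

temp = ["sku", "Short description", nil] # sample row; the last field is missing
puts join_fields(temp[1], temp[2])
```

Because `compact` drops nil entries before the join, this also covers the case where the first field is the missing one, which the if/else above does not.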
Duck1337
  • Are you referring to using that in concert with the spreadsheet gem, or changing the entire CSV creation over to the csv gem? I believe I understand how the while loop would work, but how would I implement that in the current code? Thanks for your time. – jcuwaz Dec 10 '13 at 18:53
  • Can you print out your data array? – Duck1337 Dec 11 '13 at 18:50
  • Duck, can we revisit this? I'm still working on this program and I'm not sure you understood what I was looking to do. You are clearly one of the experts when it comes to the CSV gem, and I think it would save me a lot of time building other models if I were able to adjust my existing CSV output to work with the site. Here is the link to the CSV file that my current program outputs: https://drive.google.com/file/d/0B4VR1BUz6onVRnVfSURBTDZkMDA/edit?usp=sharing ; I need to manipulate it so that it matches the upload requirements of the e-commerce software. – jcuwaz Dec 20 '13 at 00:21
  • And here is an example of the proper format for uploading a csv file per the e-commerce software: https://drive.google.com/file/d/0B4VR1BUz6onVRmpNcjJCdTlMYkk/edit?usp=sharing ; hopefully you can help me bridge the gap using the CSV gem – jcuwaz Dec 20 '13 at 01:08
  • https://drive.google.com/file/d/0Bx7JdN9mMVdoX1NRbVdNRTBsQTA/edit?usp=sharing I NEVER do this. I posted code that's not done. It runs, but the printout isn't complete. This is a learning community. Please go through the code and post your findings on here. You have to give back to the community. – Duck1337 Dec 20 '13 at 18:15
  • So very appreciated; the listing of profiles at the top and your comment thread really helped me understand, and the .split on the additional images was helpful too. A few questions: when you say changing the file format to .CSV, I did this and used the Windows-compatible version to save, but how would I know for sure that the formatting issues are resolved? I'm currently getting: in `=~': invalid byte sequence in UTF-8 (ArgumentError) – jcuwaz Dec 20 '13 at 19:25
  • https://drive.google.com/file/d/0Bx7JdN9mMVdoWjl5VFRWYXA3aFE/edit?usp=sharing There it is. I'm not 100% sure how to do it in Windows; I use a Debian machine when I work. Try using Notepad++. I used "OpenOffice Calc" and just "saved-as" to CSV again. Regardless, I uploaded the file. – Duck1337 Dec 20 '13 at 20:30
  • Figured this one out, needed to use an encoding option like so: http://stackoverflow.com/questions/18307686/file-readlines-invalid-byte-sequence-in-utf-8-argumenterror ; thanks again, you are a gentleman and a scholar. – jcuwaz Dec 20 '13 at 21:47
  • One last question and I'll wish you a merry whatever you celebrate: I have two rows with similar data (short description and description); they are temp[2] and temp[3] in the csv doc. I would like to join them into one temp entry is that easily done in this format or should I try to do that when the program is originally gathering them? – jcuwaz Dec 20 '13 at 22:15
  • Is that what you were looking for? profile[x] = temp[2] + " " + temp[3] – Duck1337 Dec 20 '13 at 22:38
  • that is what I was thinking but I get a: undefined method `+' for nil:NilClass (NoMethodError) – jcuwaz Dec 20 '13 at 23:49
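
The "invalid byte sequence in UTF-8" error mentioned in the comments typically shows up when a file saved in another encoding is read as UTF-8. Here is a minimal sketch of the encoding fix, assuming the file was saved as Windows-1252 (an assumption; the thread does not say which encoding was involved). The byte string is sample data standing in for one line of the file.

```ruby
# Sample bytes standing in for a line read from a Windows-1252 file:
# 0xE9 is "é" in that encoding but is not valid UTF-8 on its own.
raw = "caf\xE9,1.jpg\n".b

# Tell Ruby what the bytes really are, then transcode them to UTF-8.
line = raw.force_encoding("Windows-1252").encode("UTF-8")

puts line.valid_encoding?
puts line
```

When reading from disk, passing an option such as encoding: "Windows-1252:UTF-8" to File.read or CSV.foreach should do the same transcoding as the file is read.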