0

I've got this working, but it's very slow (to the point of timing out) for even a dozen or so files.

It grabs a dir listing from dropbox, and compares that to the contents of a table. I'd like to optimize this so that it would run as fast and efficiently as possible. I know querying every time is not optimal, but i think the major delay is during the Photo.create method since that is where it copies the file from the dropbox folder to Amazon S3 (via carrierwave gem). I am looking into putting a time on the operation to see where the delay is coming from. for a folder with 10 files, it takes well over a minute to load the page. The strange thing is that it takes this long even if it skips those files because they already exist, which makes no sense to me.

Here's my controller code:

def sync
    photo_size = 1024
    @event = Event.find(params[:id])

    @client = Dropbox::API::Client.new(:token  => 'derp', :secret => 'herp')
    @dropbox_files = @client.ls "images/#{@event.keyword}/#{photo_size}/"

    @existing_photos = @event.photos.all
    @data = []

    # TODO: need to make it not add files multiple times


    @dropbox_files.each do |f|


      photo_exists = Photo.where(:dropbox_path => f.direct_url.url).count
      if photo_exists == 0
        @photo = Photo.create(:remote_filename_url => f.direct_url.url, 
                              :dropbox_path => f.direct_url.url,
                              :event_id => @event.id)
        @data << "Added: #{f.direct_url.url.split('/').last}"
      else
        @data << "Skipped: #{f.direct_url.url.split('/').last}"
      end
    end
  end

Ideally, i'd like to separate each Photo.create call into an async request, but that might be a whole 'notha thing. For now, i'd be happy if it was something that could handle adding 5 photos out of a list of 100 without timing out.

what is the best way to do this? I'm a PHP programmer that's new to RoR3. Please help. thanks!

one note: for now, this outputs to a screen, but eventually it will be a background action.

afxjzs
  • 982
  • 3
  • 10
  • 19

2 Answers2

1

I have a few things you could try. I'm not familiar with the Dropbox API, but you should be able to figure this out:

Store the date of the last sync, and only retrieve files that are new or changed.

Extract your sync method into a new class- the controller probably isn't the best choice for this responsibility. Here's an example of how you could do that:

class EventSync
  attr_reader :event

  def initialize(event_or_id)
    @event = Event.find(event_or_id)
  end

  def sync
    dropbox_files.each do |f|
      process_file(f)
    end
  end

  private
    def photo_size
      1024
    end

    def process_file(file)
      event.photos.where(dropbox_path: file.direct_url.url).first_or_create do |file|
        file.remote_filename_url = file.direct_url.url
      end 
    end

    def client
      @client ||= Dropbox::API::Client.new(:token  => 'derp', :secret => 'herp')
    end

    def dropbox_files
      @dropbox_files ||= client.ls "images/#{event.keyword}/#{photo_size}/"
    end

end

This would be used like this: EventSync.new(params[:event_id]).sync.

By splitting this into many smaller methods, benchmarking will be easier (you can test each method individually), meaning you'll better be able to identify where the slowdown is.

Zach Kemp
  • 11,736
  • 1
  • 32
  • 46
  • whoa...ok. That's way different than what I was thinking. I haven't had a chance to try and implement this, so i'll just up vote for now. I have a much more PHP-based mindset, so it's nice to see the 'ruby way' to do things. Am I correct in thinking this class would just be located at /lib/eventsync.rb and would be included with `require 'eventsync'` ? – afxjzs Nov 26 '12 at 20:26
  • 1
    You could put it there, but a good rule of practice is to only put code in `lib` that you would re-use in other apps. This seems fairly specific to this use case, however, so I would probably just pop these into the `app` folder. Read Yehuda Katz's answer to [this quesiton](http://stackoverflow.com/questions/1068558/oo-design-in-rails-where-to-put-stuff) for more info. – Zach Kemp Nov 26 '12 at 21:36
  • pretty impressive. that worked almost straight away. one issue: using `file` for both the `event` and `photo` objects caused some confusion. thanks so much for your help...this is great. – afxjzs Nov 27 '12 at 20:28
0

This is the way I have it working now, before I try Zach's method.

In the Controller:

  def syncall
    #TODO: Refactor sync and syncall
    photo_size = 1024
    @event = Event.find(params[:id])

    new_image_dir = "images/#{@event.keyword}/#{photo_size}/"
    @client = Dropbox::API::Client.new(:token  => 'uuzpqar2m5839eo', :secret => 'nr9tmx0vc8qh892')
    @dropbox_files = @client.ls new_image_dir
    start = Time.now  

    existing_photos = @event.photos.all
    @data = []
    photo_list = []

    existing_photos.each do |ep|
      filename = URI.unescape(ep.dropbox_path.split('/').last) #dropbox_path is url encoded...
      photo_list << filename
    end
    @data << photo_list

    skipped_files = 0

    @dropbox_files.each do |f|
      sql_start = Time.now
      db_filename = f.path.split('/').last

      if photo_list.include? db_filename 
        skipped_files += 1
      else
        pc_start = Time.now
        if db_filename.split('.').last == 'jpg'
          db_path = f.direct_url.url
          @photo = Photo.create(:remote_filename_url => db_path, 
                                :dropbox_path => db_path,
                                :event_id => @event.id)
          @data << "#{db_filename} added in #{Time.now - pc_start} seconds"
        else
          @data << "#{db_filename} was skipped in #{Time.now - pc_start} seconds"
        end
      end
    end    
    @data << "Total Time: #{Time.now - start} (#{skipped_files} skipped.)"
  end

This way, if there are no files to add, there is only one query run. Another issue is that the direct_url.url call is pretty heavy, as it connects to dropbox each time it's called.

It went from about 2s to .01s per photo skipped and 5-7s to 2-4s per photo uploaded. I still like Zach's method better, so i am going to try that now.

afxjzs
  • 982
  • 3
  • 10
  • 19