7

With ActiveStorage you can now define mirrors for storing your files.

local:
  service: Disk
  root: <%= Rails.root.join("storage") %>

amazon:
  service: S3
  access_key_id: <%= Rails.application.credentials.dig(:aws, :access_key_id) %>
  secret_access_key: <%= Rails.application.credentials.dig(:aws, :secret_access_key) %>
  region: us-east-1
  bucket: mybucket

mirror:
  service: Mirror
  primary: local
  mirrors:
    - amazon
    - another_mirror

If you add a mirror at a later point in time, you have to take care of copying all existing files yourself, e.g. from "local" to "amazon" or "another_mirror".

  1. Is there a convenient method to keep the files in sync?
  2. Or is there a method to run a validation that checks whether all files are available on each service?
Chris

4 Answers

17

I have a couple of solutions that might work for you, one for Rails <= 6.0 and one for Rails >= 6.1:

Firstly, you need to iterate through your ActiveStorage blobs:

ActiveStorage::Blob.all.each do |blob|
  # work with blob
end

then...

  1. Rails <= 6.0

    You will need the blob's key, checksum, and the local file on disk.

    local_file = ActiveStorage::Blob.service.primary.path_for blob.key
    
    # I'm picking the first mirror as an example,
    # but you can select a specific mirror if you want
    mirror = blob.service.mirrors.first
    
    mirror.upload blob.key, File.open(local_file), checksum: blob.checksum
    

    You may also want to avoid uploading a file if it already exists on the mirror. You can do that by doing this:

    mirror = blob.service.mirrors.first
    
    # If the file doesn't exist on the mirror, upload it
    unless mirror.exist? blob.key
      # Upload file to mirror
    end
    

    Putting it together, a rake task might look like:

    # lib/tasks/active_storage.rake
    
    namespace :active_storage do
    
      desc 'Ensures all files are mirrored'
      task mirror_all: [:environment] do
    
        # Iterate through each blob
        ActiveStorage::Blob.all.each do |blob|
    
          # We assume the primary storage is local
          local_file = ActiveStorage::Blob.service.primary.path_for blob.key
    
          # Iterate through each mirror
          blob.service.mirrors.each do |mirror|
    
            # If the file already exists on the mirror, skip it
            next if mirror.exist? blob.key
    
            # The block form closes the file handle once the upload is done
            File.open(local_file) do |file|
              mirror.upload blob.key, file, checksum: blob.checksum
            end
          end
        end
      end
    end
    

    You may run into a situation like @Rystraum mentioned where you might need to mirror from somewhere other than the local disk. In this case, the rake task could look like this:

    # lib/tasks/active_storage.rake
    
    namespace :active_storage do
    
      desc 'Ensures all files are mirrored'
      task mirror_all: [:environment] do
    
        # All services in our rails configuration
        all_services = [ActiveStorage::Blob.service.primary, *ActiveStorage::Blob.service.mirrors]
    
        # Iterate through each blob
        ActiveStorage::Blob.all.each do |blob|
    
          # Select services where file exists
          services = all_services.select { |service| service.exist? blob.key }
    
          # Skip blob if file doesn't exist anywhere
          next unless services.present?
    
          # Select services where file doesn't exist
          mirrors = all_services - services
    
          # Skip blob if every service already has the file
          next if mirrors.empty?
    
          # Prefer a DiskService as the source (if one has the file)
          disk_service = services.find { |service| service.is_a? ActiveStorage::Service::DiskService }
    
          if disk_service
            # Upload local file to mirrors
            File.open(disk_service.path_for(blob.key)) do |local_file|
              mirrors.each do |mirror|
                # Rewind so every mirror reads the file from the start
                local_file.rewind
                mirror.upload blob.key, local_file, checksum: blob.checksum
              end
            end
          else
            # If no local file exists then download a remote file and upload it to the mirrors (thanks @Rystraum)
            services.first.open blob.key, checksum: blob.checksum do |temp_file|
              mirrors.each do |mirror|
                temp_file.rewind
                mirror.upload blob.key, temp_file, checksum: blob.checksum
              end
            end
          end
    
        end
      end
    end
    

    While the first rake task answers the OP's question, the latter is much more versatile:

    • It can be used with any combination of services
    • A DiskService is not required
    • Uploading from a DiskService is prioritized
    • Avoids extra exist? calls as we only call it once per service per blob
  2. Rails >= 6.1

    It's super easy, just call this on each blob...

    blob.mirror_later
    

    Wrapping it up as a rake task looks like:

    # lib/tasks/active_storage.rake
    
    namespace :active_storage do
    
      desc 'Ensures all files are mirrored'
      task mirror_all: [:environment] do
        ActiveStorage::Blob.all.each do |blob|
          blob.mirror_later
        end
      end
    end
    
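For the second part of the question (checking that every file is available on each service), a report can be built on the same exist? call the tasks above rely on. The following is only a minimal sketch in plain Ruby: FakeService and missing_keys_by_service are hypothetical names used for illustration, and the in-memory stand-in exists only so the example runs without Rails. In a Rails app you would presumably pass ActiveStorage::Blob.pluck(:key) and the real primary and mirror service objects instead.

```ruby
# Hypothetical stand-in for an Active Storage service (assumption for this sketch).
# Only `exist?(key)` is needed, which the real services also respond to.
FakeService = Struct.new(:keys) do
  def exist?(key)
    keys.include?(key)
  end
end

# Returns a hash mapping each service name to the blob keys it is missing.
# Services that have every key are left out of the report.
def missing_keys_by_service(blob_keys, services)
  services.each_with_object({}) do |(name, service), report|
    missing = blob_keys.reject { |key| service.exist?(key) }
    report[name] = missing unless missing.empty?
  end
end

services = {
  "local"  => FakeService.new(%w[key1 key2 key3]),
  "amazon" => FakeService.new(%w[key1])
}

report = missing_keys_by_service(%w[key1 key2 key3], services)
report.each { |name, keys| puts "#{name} is missing: #{keys.join(', ')}" }
```

An empty report means every service has every key. Note that this makes one exist? call per blob per service, which can be slow and, on remote services, may incur request costs.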
Tayden
  • 1
    Thanks, worked like a charm! Just don't forget to put `config.active_storage.service = :mirror` in `development.rb` or whatever env you want – nmondollot Feb 03 '21 at 09:01
  • Thank you for the solution, I just want to elaborate on it as it was not clear to me if the 6.1 solution (the point 2) actually copy the file to mirror: yes it does. It does it by (eventually) calling this class https://github.com/rails/rails/blob/f4229a2bf0e4425ced44db030141ec7de18621f2/activestorage/app/jobs/active_storage/mirror_job.rb#L13 which will eventually call https://github.com/rails/rails/blob/83217025a171593547d1268651b446d3533e2019/activestorage/lib/active_storage/service/mirror_service.rb#L53 – equivalent8 Jul 20 '21 at 20:23
  • Unfortunately this (6.1) does not work for me. Nothing happens, not even an error. :/ – Andre Zimpel Feb 17 '22 at 17:56
  • 1
    I also get no response because my service doesn't respond to :mirror - so rails skips the mirroring. https://apidock.com/rails/v6.1.3.1/ActiveStorage/Blob/mirror_later I can fix this by enqueuing the job directly ActiveStorage::MirrorJob.perform_later(blob.key,checksum:blob.checksum) – Confused Vorlon May 09 '22 at 10:44
  • 1
    `mirror_later` only works for Blobs which have `service_name` set to `mirror` (or whatever you called your mirror service in `storage.yml`). So, if all of your Blobs actually are stored on the primary storage of your mirror, you could update those Blobs to `service_name` = `mirror` and then call `mirror_later` on all of them. – olieidel Jul 09 '22 at 13:47
3

(03-11-2021) On Rails > 6.1.4.1, using active_storage > 6.1.4.1, with the following setup:

Gemfile:

gem 'azure-storage-blob', github: 'Azure/azure-storage-ruby'

config/environments/production.rb

  # Store uploaded files through the mirror service (see config/storage.yml for options).
  config.active_storage.service = :mirror # or :microsoft or :amazon

config/storage.yml:

amazon:
  service: S3
  access_key_id: XXX
  secret_access_key: XXX
  region: XXX
  bucket: XXX

microsoft:
  service: AzureStorage
  storage_account_name: YYY
  storage_access_key: YYY
  container: YYY

mirror:
  service: Mirror
  primary: amazon
  mirrors: [ microsoft ]

This does NOT work:

ActiveStorage::Blob.all.each do |blob|
  blob.mirror_later
end && puts("Mirroring done!")

What DID work is:

ActiveStorage::Blob.all.each do |blob|
  ActiveStorage::Blob.service.try(:mirror, blob.key, checksum: blob.checksum)
end
puts("Mirroring done!")

I'm not sure why that is. Maybe future versions of Rails support it, maybe it needs additional background job setup, or maybe it would have happened eventually (it never did for me).

TL;DR

If you need to do mirroring for your entire storage immediately, add this rake task and execute it on your given environment with bundle exec rails active_storage:mirror_all:

lib/tasks/active_storage.rake

namespace :active_storage do
  desc 'Ensures all files are mirrored'
  task mirror_all: [:environment] do
    ActiveStorage::Blob.all.each do |blob|
      ActiveStorage::Blob.service.try(:mirror, blob.key, checksum: blob.checksum)
    end
    puts("Mirroring done!")
  end
end

Optional:
Once you have mirrored all the blobs, you probably want to change their service names so that they actually get served from the right storage:

namespace :active_storage do
  desc 'Change each blob service name to microsoft'
  task switch_to_microsoft: [:environment] do
    ActiveStorage::Blob.all.each do |blob|
      blob.service_name = 'microsoft'
      blob.save
    end
    puts("All blobs will now be served from microsoft!")
  end
end

Finally, change config.active_storage.service in production.rb, or make the primary of the mirror service the one you want future uploads to go to.

Khalil Gharbaoui
2

I've built on top of https://stackoverflow.com/a/57579839/365218 so that the rake task does not assume the file is on the local disk.

I started with S3, and due to cost concerns, I've decided to move the files to disk and use S3 and Azure as mirrors instead.

So my situation is that my primary (disk) sometimes doesn't have a file, and my complete repository is actually on my first mirror.

So the task does two things:

  1. Moves files from S3 to disk
  2. Keeps a newly added mirror up to date

namespace :active_storage do
  desc "Ensures all files are mirrored"
  task mirror_all: [:environment] do
    ActiveStorage::Blob.all.each do |blob|
      source_mirror = if blob.service.primary.exist? blob.key
                        blob.service.primary
                      else
                        blob.service.mirrors.find { |m| m.exist? blob.key }
                      end

      source_mirror.open(blob.key, checksum: blob.checksum) do |file|
        unless blob.service.primary.exist? blob.key
          blob.service.primary.upload(blob.key, file, checksum: blob.checksum)
        end

        blob.service.mirrors.each do |mirror|
          next if mirror == source_mirror || mirror.exist?(blob.key)

          # Rewind so each upload reads the file from the start
          file.rewind
          mirror.upload(blob.key, file, checksum: blob.checksum)
        end
      end
    rescue StandardError => e
      # Report the blob that failed (e.g. when no service has the file) and carry on
      puts "#{blob.key}: #{e.message}"
    end
  end
end
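One detail worth checking whenever the same file object is uploaded to several services in turn: an IO that has been read to the end yields nothing on the next read unless it is rewound, so a later upload may fail its checksum verification. Below is a minimal plain-Ruby illustration. fake_upload and upload_to_all are hypothetical stand-ins that only read the IO and return an MD5 digest. They are not Active Storage APIs, and the sketch makes no assumption about whether a particular service adapter rewinds internally.

```ruby
require "stringio"
require "digest"

# Hypothetical stand-in for a service upload: reads the IO to the end
# and returns the Base64 MD5 digest of whatever it received.
def fake_upload(io)
  Digest::MD5.base64digest(io.read)
end

# "Uploads" the same IO to `count` fake services, optionally rewinding
# before each one, and returns the digest each upload computed.
def upload_to_all(io, count:, rewind:)
  Array.new(count) do
    io.rewind if rewind
    fake_upload(io)
  end
end

content  = "file contents"
expected = Digest::MD5.base64digest(content)

without_rewind = upload_to_all(StringIO.new(content), count: 2, rewind: false)
with_rewind    = upload_to_all(StringIO.new(content), count: 2, rewind: true)

# Does each upload see the full content?
puts without_rewind.map { |digest| digest == expected }.inspect # prints [true, false]
puts with_rewind.map { |digest| digest == expected }.inspect    # prints [true, true]
```

Without the rewind, the second upload receives an empty stream, so its digest no longer matches the blob's checksum.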
Rystraum
  • Nice, thanks Rystraum! I recently ran into a similar situation where I transferred an app to a new server and needed to sync files between the DiskService and Mirrors. I updated my answer based on some of your code. Much appreciated! – Tayden Mar 24 '20 at 22:32
1

Everything is stored according to ActiveStorage's keys, so as long as your bucket names and file names aren't changed in the transfer, you can just copy everything over to the new service. See this post for how to copy stuff over.

ryanhkerr