Is there a way for Chef to become aware of an archive file's contents during a run?

Question

I have a chef recipe which clones a specific branch of a git repository that contains two .tgz files and an .sql file. The file names in the repo follow a convention, but are timestamped, which means there's no way to be sure of their exact names with each run. After cloning the repository, I'd like chef to extract both of the .tgz files.

I've gotten everything to work up until the part where chef needs to extract the .tgz files. The client run always errors out with the tgz filenames as nil. I believe the problem is that because of the way chef works, it may not be possible for chef to "discover" a file name that's been added to a directory during its run phase.

During my testing I found that if I clone the git repository before the chef run so that its contents are stored inside of the recipe's files/ directory, those files are included in chef's cache and are extracted as expected. I believe this works because the .tgz files are known to chef at this point; they aren't being made available during the run. This is a solution I can consider as a last resort, but it's not ideal as I'd like to do as little work on the end user's local machine as possible.

I'd like to know if my understanding is correct and if there's a way to achieve what I've outlined. Here's my code:

# Clone the repository
execute "Cloning the #{backup_version} from the #{backup_repository_url} repository" do
    command "su #{user} -c 'git clone --single-branch --branch #{backup_version} #{backup_repository_url} #{backup_holding_area}'"
    cwd web_root
end

# I need all three files eventually, so find their paths in the directory 
# they were cloned to and store them in a hash
backup_files = Hash.new
["code", "media", "db"].each do |type|
    backup_files[type.to_sym] = Dir["#{backup_holding_area}/*"].find{ |file| file.include?(type) }
end

# I need to use all three files eventually, but only code and media are .tgz files
# This nil check is where chef fails
unless backup_files[:code].nil? || backup_files[:media].nil? || backup_files[:db].nil?
    backup_files.slice(:code, :media).each do |key, file|
        archive_file "Restore the backup from #{file}" do
            path file
            destination web_root
            owner user
            group group
            overwrite :auto
            only_if { ::File.exist?(file) }
        end
    end
end

score 1 · Accepted Answer · answered Nov 18 '20 at 18:58

A chef-client run goes through several phases. The two that matter here are "compile" and "converge". During a run, the compile phase always comes first, then converge.

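Compile phase: "code" that is not within a Chef resource is evaluated; resources are only collected, not run
Converge phase: "code" that is within Chef resources is executed, in the order the resources were declared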

For example, the variable assignment below runs during the compile phase:

backup_files = Hash.new

By contrast, an execute block like the one below runs during converge:

execute "Cloning the #{backup_version} from the #{backup_repository_url} repository" do
    command "su #{user} -c 'git clone --single-branch --branch #{backup_version} #{backup_repository_url} #{backup_holding_area}'"
    cwd web_root
end

Because all of the variable assignments sit outside the resource blocks, they are evaluated during compile, long before the actual convergence, i.e. before the clone has put any files in the destination directory. That is why they don't hold the filenames we expect: the Dir[...] lookup finds nothing, so every entry in the hash is nil.
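The timing problem can be reproduced outside Chef with a toy model in plain Ruby. This is only a sketch: `ToyRun`, `resource`, and `converge` are hypothetical names made up for illustration and are not Chef's API, and the file name `code_20201118.tgz` is invented. The model mimics the idea that recipe code runs top to bottom while resource blocks are only queued and executed later.

```ruby
require 'tmpdir'
require 'fileutils'

# Toy stand-in for a chef-client run (NOT Chef's real API).
class ToyRun
  def initialize
    @queue = []
  end

  # "Compile": remember the block, do not run it yet.
  def resource(name, &block)
    @queue << [name, block]
  end

  # "Converge": run the queued blocks in declaration order.
  def converge
    @queue.each { |_name, block| block.call }
  end
end

def demo
  result = {}
  Dir.mktmpdir do |holding_area|
    run = ToyRun.new
    find_code = -> { Dir.children(holding_area).find { |name| name.include?('code') } }

    # Queued for later, like the execute/git resource that clones the repo.
    run.resource('clone') do
      FileUtils.touch(File.join(holding_area, 'code_20201118.tgz'))
    end

    # Plain Ruby in the recipe body runs immediately: the directory is still empty.
    result[:compile_time] = find_code.call

    # The same lookup inside a queued block runs after the "clone".
    run.resource('find') { result[:converge_time] = find_code.call }

    run.converge
  end
  result
end

result = demo
puts "compile time:  #{result[:compile_time].inspect}"
puts "converge time: #{result[:converge_time].inspect}"
```

The lookup is identical in both places; only the moment it runs differs, which is the same reason the nil check in the question fails.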

One way to ensure that we get the filenames is to assign the variables inside a Chef resource. One such resource is the ruby_block resource.

Using it, we can write a recipe like the one below:

# use execute to clone or use the git resource with properties as required
git backup_holding_area do
  repository backup_repository_url
  revision backup_version
  action :checkout
end

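# Iterating over the files in the directory is still OK, as there are only 3 files
ruby_block 'get and extract code and media tar files' do
  block do
    Dir.entries(backup_holding_area).each do |file|
      # the archives in the question are .tgz files; .tar.gz is accepted as well
      next unless file.end_with?('.tgz', '.tar.gz')
      # appropriate flags can be used for "tar" command as per requirement
      # the argument-list form of system avoids shell quoting problems with odd file names
      extracted = system('tar', 'xzf', ::File.join(backup_holding_area, file), '-C', web_root)
      raise "Failed to extract #{file}" unless extracted
    end
  end
end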
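If the three files need to be told apart, as in the question, the lookup itself can live in a plain Ruby method that is called from inside the `block`, so that it runs during converge. A minimal sketch, assuming the file names contain `code`, `media`, and `db` as the question describes; `find_backup_files` is a hypothetical helper name, and the sample file names in the usage lines are made up.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: map each backup type to the first file in +dir+
# whose name contains that type. Types with no matching file map to nil.
def find_backup_files(dir, types = %w[code media db])
  names = Dir.children(dir).sort
  types.to_h do |type|
    match = names.find { |name| name.include?(type) }
    [type.to_sym, match && File.join(dir, match)]
  end
end

# Usage with made-up file names in a temporary directory.
Dir.mktmpdir do |dir|
  %w[site_code_1605725880.tgz site_media_1605725880.tgz site_db_1605725880.sql].each do |name|
    FileUtils.touch(File.join(dir, name))
  end

  backups = find_backup_files(dir)
  backups.each { |type, path| puts "#{type}: #{File.basename(path)}" }
end
```

Assuming the method is made available to the recipe (for example, from the cookbook's libraries/ directory), calling `find_backup_files(backup_holding_area)` inside the `ruby_block` would return current paths, because the block body is not evaluated until converge.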

Thanks very much for this excellent answer. It absolutely answers the question I've asked. Please allow me to ask a follow up: Suppose the git repository was actually a zip file. (So, a zip file containing two tar files and an sql file). Would a ruby_block still work here? Meaning, we unzip the original zip into the holding area and then need to get the updated list of files that result and unzip some of those. — Steve K, Nov 18 '20 at 21:45
Yes, so if the zip file has a predictable name, you can use an `archive_file` resource prior to the `ruby_block`. Otherwise, you can have another `ruby_block` just to `unzip` the contents before the `ruby_block` for `tar`. — seshadri_c, Nov 19 '20 at 02:50
Yep, this worked well. The only frustrating thing about using a ruby_block is that I actually have a collection of custom resources that do things like unzip the tar files (using archive_file) and restore the database using an execute resource that calls mysqldump, etc. As far as I understand, using a ruby block like this essentially means I can't use any native chef resource nor any custom resource. Is that correct? In any case, thank you again for this answer and your excellent explanation. — Steve K, Nov 19 '20 at 03:52
Yes, we can only use Ruby code inside `ruby_block`, *not* native Chef resources. If it helps, you can consider converting these 2-3 activities into a custom resource as well. — seshadri_c, Nov 19 '20 at 04:15
Thanks for this. After some proper testing, I've gotten it to work using my custom resources. The real take-away for me here was the fact that we need to wrap these things inside of resources so they're taken into account during the converge phase. Extremely helpful. — Steve K, Nov 19 '20 at 04:46
