
I'm trying to extract filenames from rar packages in a directory. I'm using 7z which returns a multi-line string, and would like to search the output for "mkv", "avi", or "srt" files.

Here's my code:

ROOT_DIR = "/users/ken/extract"

# Check each directory for Rar packages
# Returns a hash of directories with the filenames from their rars
def checkdirs()
    pkgdirs = {}
    Dir.foreach(ROOT_DIR) do |d|
        if !Dir.glob("#{ROOT_DIR}/#{d}/*.rar").empty?
            rarlist = `7z l #{ROOT_DIR}/#{d}/*.rar`
            puts rarlist  # Prints the multi-line output from 7z l
            puts rarlist.scan('*.mkv').first
            pkgdirs[d] = 'filename'
        end
    end
    pkgdirs
end

I can get the 7z output but I can't figure out how to search the output for my strings. How can I search the output and return the matching lines?

This is an example of the 7z output:

7-Zip [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=utf8,Utf16=on,HugeFiles=on,64 bits,8 CPUs x64)

Scanning the drive for archives:
1 file, 15000000 bytes (15 MiB)

Listing archive: Gotham.S03E19.HDTV.x264-KILLERS/gotham.s03e19.hdtv.x264-killers.rar

--
Path = Gotham.S03E19.HDTV.x264-KILLERS/gotham.s03e19.hdtv.x264-killers.rar
Type = Rar
Physical Size = 15000000
Total Physical Size = 285988640
Characteristics = Volume FirstVolume VolCRC
Solid = -
Blocks = 1
Multivolume = +
Volume Index = 0
Volumes = 20

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2017-05-23 02:30:52 .....    285986500    285986500  Gotham.S03E19.HDTV.x264-KILLERS.mkv
------------------- ----- ------------ ------------  ------------------------
2017-05-23 02:30:52          285986500    285986500  1 files

------------------- ----- ------------ ------------  ------------------------
2017-05-23 02:30:52          285986500    285986500  1 files

Archives: 1
Volumes: 20
Total archives size: 285988640

I expect this output:

 2017-05-23 02:30:52 .....    285986500    285986500  Gotham.S03E19.HDTV.x264-KILLERS.mkv
the Tin Man
Ken J

2 Answers


You can use this:

puts rarlist.scan(/^.*\.mkv/)

The ^ anchors the match at the beginning of a line and . does not match a newline, so each match is a whole line from its start through .mkv.

To match .mkv, .avi, or .srt, you can use:

rarlist.scan(/(^.*\.(mkv|avi|srt))/) {|a,_| puts a}
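
As a rough illustration of why the original scan('*.mkv') found nothing and why the block above takes two arguments, here is a small sketch. SAMPLE_LISTING is a made-up, shortened listing (the .srt entry and its sizes are invented for the example), not real 7z output.

```ruby
# Made-up, shortened listing used only for this illustration.
SAMPLE_LISTING = <<~EOT
  Listing archive: example.rar
  2017-05-23 02:30:52 .....    285986500    285986500  Gotham.S03E19.HDTV.x264-KILLERS.mkv
  2017-05-23 02:30:52 .....        52000        52000  Gotham.S03E19.HDTV.x264-KILLERS.srt
EOT

# A String pattern is matched literally, so this searches for the text "*.mkv".
p SAMPLE_LISTING.scan('*.mkv')                              # => []

# With no capture groups, scan returns the matched text of each match.
puts SAMPLE_LISTING.scan(/^.*\.(?:mkv|avi|srt)$/)

# With capture groups, scan returns one array of captures per match,
# here [whole_line, extension], which is why the block takes |a, _|.
p SAMPLE_LISTING.scan(/(^.*\.(mkv|avi|srt))/).map(&:last)   # => ["mkv", "srt"]
```

Using a non-capturing group, (?:...), keeps the result a flat array of lines, so no block is needed.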
Adrian

The solution is much simpler than what you're making it.

Starting with:

TARGET_EXTENSIONS = %w[mkv avi srt]
TARGET_EXTENSION_RE = /\.(?:#{ Regexp.union(TARGET_EXTENSIONS).source})/
# => /\.(?:mkv|avi|srt)/

output = <<EOT
7-Zip [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=utf8,Utf16=on,HugeFiles=on,64 bits,8 CPUs x64)

Scanning the drive for archives:
1 file, 15000000 bytes (15 MiB)

Listing archive: Gotham.S03E19.HDTV.x264-KILLERS/gotham.s03e19.hdtv.x264-killers.rar

--
Path = Gotham.S03E19.HDTV.x264-KILLERS/gotham.s03e19.hdtv.x264-killers.rar
Type = Rar
Physical Size = 15000000
Total Physical Size = 285988640
Characteristics = Volume FirstVolume VolCRC
Solid = -
Blocks = 1
Multivolume = +
Volume Index = 0
Volumes = 20

    Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2017-05-23 02:30:52 .....    285986500    285986500  Gotham.S03E19.HDTV.x264-KILLERS.mkv
------------------- ----- ------------ ------------  ------------------------
2017-05-23 02:30:52          285986500    285986500  1 files

------------------- ----- ------------ ------------  ------------------------
2017-05-23 02:30:52          285986500    285986500  1 files

Archives: 1
Volumes: 20
Total archives size: 285988640
EOT

All it takes is to iterate over the lines in the output and puts the matches:

puts output.lines.grep(TARGET_EXTENSION_RE)

Which would output:

2017-05-23 02:30:52 .....    285986500    285986500  Gotham.S03E19.HDTV.x264-KILLERS.mkv
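
If only the file name is wanted rather than the whole line, something like the following might work. It is a sketch that assumes every matching line carries all five leading columns (date, time, attributes, size, compressed size) as in the sample above; a listing with a blank column would need a different approach. MEDIA_LINE_RE and filenames_from are names made up for this example.

```ruby
MEDIA_LINE_RE = /\.(?:mkv|avi|srt)$/

# Returns just the Name column of every line ending in a target extension.
# Assumption: the five columns before Name are all present, so splitting
# into at most six fields leaves the whole name (spaces included) in the last.
def filenames_from(listing)
  listing.lines(chomp: true).grep(MEDIA_LINE_RE).map do |line|
    line.split(/\s+/, 6).last
  end
end

listing = <<~EOT
     Date      Time    Attr         Size   Compressed  Name
  ------------------- ----- ------------ ------------  ------------------------
  2017-05-23 02:30:52 .....    285986500    285986500  Gotham.S03E19.HDTV.x264-KILLERS.mkv
  ------------------- ----- ------------ ------------  ------------------------
  2017-05-23 02:30:52          285986500    285986500  1 files
EOT

puts filenames_from(listing)
# prints: Gotham.S03E19.HDTV.x264-KILLERS.mkv
```

The limit argument to split is what keeps a name containing spaces in one piece.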

The above is a basic solution, but there are things that could be done to speed up the code, depending on the output being received:

TARGET_SUFFIXES = TARGET_EXTENSIONS.map { |e| ".#{e}" } # => [".mkv", ".avi", ".srt"]
puts output.split(/\r?\n/).select { |l| l.end_with?(*TARGET_SUFFIXES) }

I'd have to run benchmarks, but that should be faster, since regular expressions can drastically slow code if not written correctly.

You could try:

TARGET_EXTENSION_RE = /\.(?:#{ Regexp.union(TARGET_EXTENSIONS).source})$/
# => /\.(?:mkv|avi|srt)$/
puts output.split(/\r?\n/).grep(TARGET_EXTENSION_RE)

as anchored patterns are often faster than unanchored ones, and the $ also keeps the pattern from matching an extension in the middle of a line.
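
I haven't measured any of this, so here is a rough timing sketch for anyone who wants to check on their own machine. The lines are synthetic (made up for the test, not real 7z output), all the constant names are mine, and the numbers will vary with the Ruby version and the shape of the data, so treat the result as a hint rather than proof.

```ruby
BENCH_SUFFIXES   = %w[.mkv .avi .srt].freeze
BENCH_UNANCHORED = /\.(?:mkv|avi|srt)/
BENCH_ANCHORED   = /\.(?:mkv|avi|srt)$/

# Synthetic listing lines: every tenth one ends in .mkv, the rest in .nfo.
BENCH_LINES = Array.new(50_000) do |i|
  "2017-05-23 02:30:52 .....  1  1  file#{i}.#{(i % 10).zero? ? 'mkv' : 'nfo'}"
end.freeze

# Runs the block once and returns [elapsed_seconds, block_result].
def time_it
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  [Process.clock_gettime(Process::CLOCK_MONOTONIC) - started, result]
end

BENCH_CANDIDATES = {
  "end_with?"        => -> { BENCH_LINES.select { |l| l.end_with?(*BENCH_SUFFIXES) } },
  "unanchored regex" => -> { BENCH_LINES.grep(BENCH_UNANCHORED) },
  "anchored regex"   => -> { BENCH_LINES.grep(BENCH_ANCHORED) }
}.freeze

BENCH_CANDIDATES.each do |name, candidate|
  elapsed, matches = time_it(&candidate)
  puts format("%-18s %.4fs  %d matches", name, elapsed, matches.size)
end
```

A single run like this is noisy; repeating each candidate several times, or using a benchmarking library, would give steadier numbers.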

If the 7z archives will generate huge listings (in the MB range), it'd be better to iterate over the output line by line as it is read to avoid scalability issues. In the above example, output.lines would be akin to slurping the output, because it builds the full array of lines in memory. See "Why is "slurping" a file not a good practice?" for more information.
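
A minimal sketch of the line-by-line approach, assuming the listing comes from an IO object such as the pipe returned by IO.popen. The names MEDIA_SUFFIXES, each_media_line and media_lines_in are made up for this example, and media_lines_in is not exercised here because it needs 7z and a real archive; a StringIO stands in for the pipe.

```ruby
require "stringio"

MEDIA_SUFFIXES = %w[.mkv .avi .srt].freeze

# Yields each line of +io+ that ends with a target suffix, reading one line
# at a time instead of loading the whole listing into memory.
def each_media_line(io)
  return enum_for(:each_media_line, io) unless block_given?

  io.each_line do |line|
    line = line.chomp # strips "\n" or "\r\n"
    yield line if line.end_with?(*MEDIA_SUFFIXES)
  end
end

# Hypothetical wrapper: runs `7z l` on one archive and filters its output as
# it streams. The array form of IO.popen bypasses the shell, so spaces in
# the path need no quoting.
def media_lines_in(archive)
  IO.popen(["7z", "l", archive]) { |io| each_media_line(io).to_a }
end

# Stand-in for the 7z pipe so the example runs without 7z installed.
sample = StringIO.new(<<~EOT)
  Listing archive: example.rar
  2017-05-23 02:30:52 .....    285986500    285986500  Gotham.S03E19.HDTV.x264-KILLERS.mkv
  2017-05-23 02:30:52          285986500    285986500  1 files
EOT

each_media_line(sample) { |line| puts line }
```

Because each line is handled as it arrives, memory use stays flat no matter how long the listing is.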

the Tin Man