
I have two files: one is an index of 100 values, and the other contains a lot of information. I would like to extract from the second file only the entries that are listed in my index file. For example:

File1.txt:

-name1

-name2

-name3

File2.txt:

Read id: name1

sometext  
sometext  
Complete

Read id: name8 (not in the index)

sometext  
sometext  
Complete

Read id: name2

sometext  
sometext  
Complete

So I would like to print output like this:

Result:

Read id: name1

sometext  
sometext  
Complete

Read id: name2

sometext  
sometext  
Complete

So my code was:

f=open("file1.txt").readlines()

v=[]

for line in f
  v.push(line[0..-2])
end

reg = Regexp.new(v.join(""))

printing = false

File.open("file2.txt").each_line do |line|
printing = true if line =~ /reg/

puts line if printing

printing = false if line =~ /Complete/

end

But the each_line block never matches my /reg/. If I put /name1/ there instead, I get the output I want. What can I do? Thank you for your help.

RandomX27

1 Answer

I think your main issues are that you are including the dash from each line of the first file, that you are joining the names with an empty string instead of the regex alternation character |, and that you are not removing the empty strings. You are also matching against the literal regex /reg/, which looks for the letters "reg", rather than against the variable reg.
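To make the difference concrete, here is a small sketch. The NAMES array is a hypothetical stand-in for the cleaned-up lines of file1.txt, and the constant names are mine, not from your code:

```ruby
# Hypothetical names standing in for the cleaned-up lines of file1.txt.
NAMES = ["name1", "name2", "name3"]

JOINED  = Regexp.new(NAMES.join(""))  # one long literal: /name1name2name3/
UNION   = Regexp.union(NAMES)         # alternation: /name1|name2|name3/
LITERAL = /reg/                       # the three letters "reg", not the variable

line = "Read id: name2"

puts JOINED.match?(line)   # false
puts UNION.match?(line)    # true
puts LITERAL.match?(line)  # false
```

Only the union pattern matches the line, which is why both the join and the /reg/ literal need to change.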

With minimal changes to your code, you can get it to work by using:

f = open("file1.txt").readlines()

v=[]

for line in f
  v.push(line[1..-2]) # changed this line
end

reg = Regexp.union(v.reject(&:empty?)) # changed this line

printing = false

File.open("file2.txt").each_line do |line|
  printing = true if line =~ reg # changed this line

  puts line if printing

  printing = false if line =~ /Complete/
end

There are also several shorter and cleaner approaches, such as the following:

v = File.open("file1.txt").each_line.with_object([]) do |line, v|
  line = line[/-(\w+)/, 1]

  v << line if line
end

File.open("file2.txt").each_line do |line|
  if v.include?(line[/Read id: (\w+)/, 1])..(line.match?(/Complete/))
    puts line
  end
end

This version uses the string method #[regexp, capture] to pull out just the parts of each line we're interested in. It also uses the obscure flip-flop operator, which evaluates to false until the first condition matches, then evaluates to true up to and including the line where the second condition matches, after which it evaluates to false again (until the first condition matches again). I've also switched from matching each line against a regex to checking whether the Read id: value is included in the array. With 100 values to check, that would be a mighty long regex, and I tend to shy away from super long ones. Either approach might turn out to be the more performant one; you can compare them against your actual use case (for the small sample you've given us, there isn't going to be any meaningful difference).
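If you want to try the flip-flop without touching your files, here is a self-contained sketch of the same idea. The SAMPLE heredoc stands in for file2.txt, INDEX for the names from file1.txt, and select_blocks is a hypothetical helper name I made up for illustration:

```ruby
# In-memory stand-ins for the two files from the question.
INDEX = ["name1", "name2", "name3"]

SAMPLE = <<~TEXT
  Read id: name1
  sometext
  Complete
  Read id: name8
  sometext
  Complete
  Read id: name2
  sometext
  Complete
TEXT

# Hypothetical helper: collects the lines of every block whose id is indexed.
def select_blocks(text, index)
  kept = []
  text.each_line do |line|
    # Flip-flop: switches on at a "Read id:" line whose id is in the index
    # and stays on through the next line that contains "Complete".
    if index.include?(line[/Read id: (\w+)/, 1])..line.match?(/Complete/)
      kept << line
    end
  end
  kept
end

puts select_blocks(SAMPLE, INDEX)
```

This prints the name1 and name2 blocks and skips name8. One side effect of include? worth knowing: it compares the whole captured id, so an id such as name10 would not be picked up just because name1 is in the index, whereas an unanchored regex built from the names would match it.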

Simple Lime