
I have a name.txt file and a last.txt file. I want to generate every possible combination of first and last names. For example:

$cat name.txt
Jack
Jamie
James
Jarred
Josh
John
Jane 


$cat last.txt
doe
smith

I tried doing this by:

File.open("name.txt", "r") do |n|
  File.open("last.txt", "r") do |l|
    n.each_line do |first|
      l.each_line do |last|
        full_name = first.chomp + " " + last.chomp
        puts full_name
      end
    end
  end
end

The output shows that only the first line of the name file is processed:

Jack doe 
Jack smith

How can I make it go through the entire first file, producing full names for every name in name.txt?

user3610137

2 Answers


Consider this:

first = %w[jane john]
last = %w[doe smith]

first.product(last)
# => [["jane", "doe"], ["jane", "smith"], ["john", "doe"], ["john", "smith"]]

You can do something like this:

first = File.readlines('name.txt').map(&:rstrip)
last = File.readlines('last.txt').map(&:rstrip)
first.product(last)

product is one of Array's methods. Also look at permutation and combination.
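To turn the pairs returned by product into the "first last" strings the question asks for, each pair can be joined with a space. This is a minimal sketch; the sample arrays and the variable name full_names are illustrative only, not part of the original files:

```ruby
first = %w[Jack Jane]
last  = %w[doe smith]

# product yields [first, last] pairs; join each pair with a single space.
full_names = first.product(last).map { |pair| pair.join(' ') }

puts full_names
# Jack doe
# Jack smith
# Jane doe
# Jane smith
```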

We can use chomp instead of rstrip to remove the trailing newline that readlines leaves on each line. However, chomp only trims line endings, whereas rstrip removes all trailing whitespace, which cleans up the names a bit if any is present. (In my experience we're more likely to see whitespace after text than before it, because leading whitespace is easier to spot.)
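A small illustration of the difference, using a made-up line that has a trailing space before its newline (like the "Jane " entry in the question's name.txt):

```ruby
line = "Jane \n" # hypothetical input: trailing space, then newline

chomped  = line.chomp  # newline removed, trailing space kept
stripped = line.rstrip # newline and trailing space both removed

puts "#{chomped} doe".inspect  # "Jane  doe" (note the doubled space)
puts "#{stripped} doe".inspect # "Jane doe"
```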


Benchmarks:

require 'fruity'

FIRST_NAME = [*'a'..'z']
LAST_NAME  = [*'a'..'z']

FIRST_NAME.size # => 26
LAST_NAME.size  # => 26

def use_product
  FIRST_NAME.product(LAST_NAME) 
end

def use_loops
  output = []
  FIRST_NAME.each do |fn|
    LAST_NAME.each do |ln|
      output << [fn, ln]
    end
  end
  output
end

result = use_product
result.size  # => 676
result.first # => ["a", "a"]
result.last  # => ["z", "z"]

result = use_loops
result.size  # => 676
result.first # => ["a", "a"]
result.last  # => ["z", "z"]

Running it results in:

compare :use_product, :use_loops
# >> Running each test 64 times. Test will take about 1 second.
# >> use_product is faster than use_loops by 50.0% ± 10.0%

If the source arrays increase in size:

require 'fruity'

FIRST_NAME = [*'a1'..'z9']
LAST_NAME  = [*'a1'..'z9']

FIRST_NAME.size # => 259
LAST_NAME.size  # => 259

def use_product
  FIRST_NAME.product(LAST_NAME) 
end

def use_loops
  output = []
  FIRST_NAME.each do |fn|
    LAST_NAME.each do |ln|
      output << [fn, ln]
    end
  end
  output
end

result = use_product
result.size  # => 67081
result.first # => ["a1", "a1"]
result.last  # => ["z9", "z9"]

result = use_loops
result.size  # => 67081
result.first # => ["a1", "a1"]
result.last  # => ["z9", "z9"]

Running that returns:

compare :use_product, :use_loops
# >> Running each test once. Test will take about 1 second.
# >> use_product is faster than use_loops by 60.00000000000001% ± 10.0%

While we can write the algorithm without the built-in methods, those methods are written in C, so take advantage of them to gain the added speed.
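If the fruity gem isn't installed, a rough comparison can also be sketched with Ruby's standard Benchmark module. This is not the benchmark used above: the lambdas, the local variable names, and the 1,000-iteration count are arbitrary choices for illustration, and the timings will vary by machine, so no figures are shown here.

```ruby
require 'benchmark'

first_names = [*'a'..'z']
last_names  = [*'a'..'z']

# Built-in Cartesian product.
use_product = -> { first_names.product(last_names) }

# Hand-written nested loops building the same pairs.
use_loops = lambda do
  output = []
  first_names.each do |fn|
    last_names.each { |ln| output << [fn, ln] }
  end
  output
end

# Sanity check: both approaches should build identical arrays.
same_result = use_product.call == use_loops.call
puts "same result: #{same_result}"

# Iteration count is arbitrary; raise it for steadier numbers.
Benchmark.bm(12) do |x|
  x.report('use_product') { 1_000.times { use_product.call } }
  x.report('use_loops')   { 1_000.times { use_loops.call } }
end
```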

There is one case where I'd iterate over the lists separately instead of using the built-in product: if I had two huge lists and pulling them into memory was prohibitive because of RAM constraints, then nested loops would be the only way to deal with it. Ruby's foreach is extremely fast, so writing code around it would be a good alternative:

File.foreach('name.txt') do |first|
  File.foreach('last.txt') do |last|
    full_name = first.chomp + " " + last.chomp
    puts full_name
  end
end
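The foreach version prints every pair because File.foreach reopens last.txt on each pass of the outer loop. In the question's code, a single open handle for last.txt is read to end-of-file during the first name, so later names find nothing left to read. A minimal sketch of that effect, keeping the question's nested File.open structure and adding IO#rewind; the temporary directory and sample contents are stand-ins for the real files:

```ruby
require 'tmpdir'

pairs = []

Dir.mktmpdir do |dir|
  # Stand-in files so the sketch runs anywhere.
  name_path = File.join(dir, 'name.txt')
  last_path = File.join(dir, 'last.txt')
  File.write(name_path, "Jack\nJamie\n")
  File.write(last_path, "doe\nsmith\n")

  File.open(name_path, 'r') do |n|
    File.open(last_path, 'r') do |l|
      n.each_line do |first|
        l.rewind # without this, l stays at EOF after the first name
        l.each_line do |last|
          pairs << "#{first.chomp} #{last.chomp}"
        end
      end
    end
  end
end

puts pairs
# Jack doe
# Jack smith
# Jamie doe
# Jamie smith
```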
the Tin Man

To get each line of the text file, you have to use each like this:

File.open("name.txt", "r").each do |n|
 . . . 

end

So, using each, your code works:

File.open("name.txt", "r").each do |n|
 File.open("last.txt", "r").each do |l|
    n.each_line do |first|
       l.each_line do |last|
          full_name = first.chomp + " " + last.chomp
          puts full_name
      end
    end
  end
end

Although this works and solves your problem, it's not an efficient way of reading files.

To make it efficient, you should use readlines to read the entire file content at once and save it in an array. See this answer for more details.

So, your code can be more efficient if written this way:

names = File.readlines('name.txt')
last_names = File.readlines('last.txt')

names.each do |n|
 last_names.each do |l|
    n.each_line do |first|
       l.each_line do |last|
          full_name = first.chomp + " " + last.chomp
          puts full_name
      end
    end
  end
end
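Since readlines already returns one string per line, the inner each_line calls only ever run once per element (a point also raised in the comments). A simplified sketch of the same idea without them; the temporary files and the full_names array are illustrative stand-ins:

```ruby
require 'tmpdir'

full_names = []

Dir.mktmpdir do |dir|
  # Stand-in files so the sketch is self-contained.
  name_path = File.join(dir, 'name.txt')
  last_path = File.join(dir, 'last.txt')
  File.write(name_path, "Jack\nJane\n")
  File.write(last_path, "doe\nsmith\n")

  # Read each file once and strip the line endings up front.
  names      = File.readlines(name_path).map(&:chomp)
  last_names = File.readlines(last_path).map(&:chomp)

  names.each do |first|
    last_names.each do |last|
      full_names << "#{first} #{last}"
    end
  end
end

puts full_names
# Jack doe
# Jack smith
# Jane doe
# Jane smith
```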
K M Rakibul Islam
  • That way, `last.txt` is opened and read number-of-lines-in-`name.txt` times, which is quite inefficient. Instead, reading the file contents into a variable and looping over it will be *much* more efficient, esp. when the files become larger. – Holger Just Nov 10 '15 at 22:59
  • I agree. I did not go for the efficient code. I just updated his current code to make that work, in this particular case, I just added `.each` and his own code worked with this little change :) – K M Rakibul Islam Nov 10 '15 at 23:01
  • `foreach` is cleaner than `open("last.txt", "r").each`. – the Tin Man Nov 10 '15 at 23:05
  • @HolgerJust I have updated my answer. I was in a rush when I answered the question and did not have the time to work on the efficiency at that time. Thanks for your feedback :) – K M Rakibul Islam Nov 11 '15 at 02:44
  • The inner two loops are not needed at all, as `n` and `l` each contain a single-line string. The inner two loops are always just a single iteration long and are thus slow equivalents of `first = n` and `last = l`. In any case, the answer by @theTinMan is much more efficient than the nested loops in the first place. – Holger Just Nov 11 '15 at 09:28
  • This method will never be very efficient if either file is of a reasonable size. This is very likely a homework question, as there's no practical use for the output in real life. Being able to do this with other types of data can be useful, but iterating over two lists, especially having to iterate over the inner list repeatedly, is slow and will grow extremely slow if the lists are hundreds of elements long. I smell a benchmark coming on. – the Tin Man Nov 11 '15 at 16:32
  • @theTinMan totally agree with you. – K M Rakibul Islam Nov 11 '15 at 16:37