
Consider the following code:

    lines = Array.new() 
    File.foreach('file.csv').with_index do |line, line_num|                 
      lines.push(line.split(" ")) if line_num > 0                                 
    end                                                                                  

    indices = lines.map { |el| el.last }                                          
    duplicates = indices.select{ |e| indices.count(e) > 2 }.uniq

For anyone wondering, an example CSV file looks like this:

# Generated by tool XYZ
a b c 1
d e f 2
g h i 1
j k l 4
m n o 5
p q r 2
s t u 2
v w x 1
y z 0 5

Is it possible to chain these two method calls (the last two lines of code) together?

stephanmg
  • Seems like your file contains a header line and space separated values. It might be worth taking a look at Ruby's [CSV](http://ruby-doc.org/stdlib-2.6.3/libdoc/csv/rdoc/CSV.html) class. – Stefan May 22 '19 at 10:36
  • Yes, this might be a good idea as well. However, my file consists of a single comment string ("# Generated by tool XYZ") in the first line (so no real CSV header), which I can simply skip with the iterator above. Correct? – stephanmg May 23 '19 at 08:11
  • Sure, although `File.foreach('file.csv').drop(1)` might be cleaner in that case. – Stefan May 23 '19 at 08:49
  • Stefan: Fabulous. Thank you. – stephanmg May 23 '19 at 08:51

3 Answers


If you don't want an intermediate variable and want to do it in a single line, you can write something like this:

duplicates = lines.group_by(&:last).select{|k, v| v.count > 2}.keys

For some people this might hinder readability, though. It depends on your taste.
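
As a rough end-to-end sketch, the chain above can be combined with the drop(1) suggestion from the comments and run against the sample data from the question. The temporary file and the variable name sample are assumptions made only to keep the example self-contained:

```ruby
require 'tempfile'

# Sample data from the question: one comment line, then space-separated rows.
sample = <<~DATA
  # Generated by tool XYZ
  a b c 1
  d e f 2
  g h i 1
  j k l 4
  m n o 5
  p q r 2
  s t u 2
  v w x 1
  y z 0 5
DATA

duplicates = Tempfile.create('sample') do |f|
  f.write(sample)
  f.flush
  File.foreach(f.path)
      .drop(1)                             # skip the comment line
      .map(&:split)                        # split each row on whitespace
      .group_by(&:last)                    # key: last field of each row
      .select { |_, rows| rows.size > 2 }  # keep keys seen more than twice
      .keys
end

duplicates #=> ["1", "2"]
```

Note that drop(1) reads the remaining lines into an array, which is fine for small files.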

Babar Al-Amin
  • How do you call the &:last syntax? It's not a method reference but a shorthand notation. Pretzel colon? https://stackoverflow.com/questions/1217088/what-does-mapname-mean-in-ruby – stephanmg May 23 '19 at 08:17
  • So this should also be valid: duplicates = lines.group_by {|e| e.last}.select{|k, v| v.count > 2}.keys, if I understood the &:method syntax correctly, which basically converts the symbol to a proc and passes it as the block.
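
To illustrate the point raised in the comments, here is a minimal sketch showing that &:last and an explicit block give the same result. The three rows in lines are made-up sample values:

```ruby
# Hypothetical rows, shaped like the parsed lines in the question.
lines = [%w[a b c 1], %w[d e f 2], %w[g h i 1]]

with_symbol = lines.map(&:last)           # & calls Symbol#to_proc on :last
with_block  = lines.map { |e| e.last }    # the equivalent explicit block

# The proc built from the symbol sends :last to its argument.
via_proc = :last.to_proc.call(%w[a b c 1])

with_symbol == with_block #=> true
via_proc                  #=> "1"
```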

A single-pass solution would look like this (note that Array#delete is itself a linear scan, so the pass is only O(N) while temp stays small):

lines.each_with_object([[], []]) do |el, (result, temp)|
  (temp.delete(el) ? result : temp) << el
end.first

Here we use an intermediate array (temp) to hold elements that have been seen but not yet matched by a second occurrence.


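Also, you can always use Kernel#then (Ruby 2.6+; yield_self in Ruby 2.5), which returns the block's result (Object#tap returns its receiver instead):

duplicates =
  lines.map(&:last).then do |indices|
    indices.select { |e| indices.count(e) > 2 }.uniq
  end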
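
The difference between Object#tap and Kernel#then matters when chaining. The following is a small sketch, assuming Ruby 2.6 or later and using the last fields from the question's sample file as input:

```ruby
# Last fields of the sample rows in the question.
indices = %w[1 2 1 4 5 2 2 1 5]

# tap yields the receiver to the block but returns the receiver itself,
# so the block's result is discarded.
tapped = indices.tap { |a| a.select { |e| a.count(e) > 2 }.uniq }

# then returns whatever the block returns.
chained = indices.then { |a| a.select { |e| a.count(e) > 2 }.uniq }

tapped.equal?(indices) #=> true
chained                #=> ["1", "2"]
```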
Aleksei Matiushkin

Example

Let's apply your code to an example.

str =<<-END
Now is the
time for all
people who are
known to all
of us as the
best coders are
expected to
lead all
those who are
less experienced
to greatness
END

FName = 'temp'
File.write(FName, str)
  #=> 146

Your code

lines = Array.new() 
File.foreach(FName).with_index do |line, line_num|                 
  lines.push(line.split(" ")) if line_num > 0                                 
end                                                                                  
lines
  #=> [["time", "for", "all"], ["people", "who", "are"], ["known", "to", "all"],
  #    ["of", "us", "as", "the"], ["best", "coders", "are"], ["expected", "to"],
  #    ["lead", "all"], ["those", "who", "are"], ["less", "experienced"],
  #    ["to", "greatness"]] 
indices = lines.map { |el| el.last }                                          
  #=> ["all", "are", "all", "the", "are", "to", "all", "are", "experienced", "greatness"] 
duplicates = indices.select { |e| indices.count(e) > 2 }
  #=> ["all", "are", "all", "are", "all", "are"] 
duplicates.uniq
  #=> ["all", "are"] 

The objective, evidently, is to return an array of all words that appear more than twice as the last word of a line (other than the first line).

More Ruby-like and more efficient code

We can do that more concisely and efficiently by making a single pass through the file:

first_line = true
h = Hash.new(0)
File.foreach(FName) do |line|
  if first_line
    first_line = false
  else
    h[line[/\S+(?=\n)/]] += 1
  end
end
h.select { |_,count| count > 2 }.keys
  #=> ["all", "are"]

Steps performed

The steps are as follows.

first_line = true
h = Hash.new(0)
File.foreach(FName) do |line|
  if first_line
    first_line = false
  else
    h[line[/\S+(?=\n)/]] += 1
  end
end
h #=> {"all"=>3, "are"=>3, "the"=>1, "to"=>1, "experienced"=>1, "greatness"=>1}
g = h.select { |_,count| count > 2 }
  #=> {"all"=>3, "are"=>3} 
g.keys
  #=> ["all", "are"]

Use of Enumerator#with_object

Rather than defining the hash before File.foreach(..) is executed, it's customary to use the method Enumerator#with_object, which allows us to chain the constructed hash to the methods that follow:

first_line = true
File.foreach(FName).with_object(Hash.new(0)) do |line, h|
  if first_line
    first_line = false
  else
    h[line[/\S+(?=\n)/]] += 1
  end
end.select { |_,count| count > 2 }.keys
  #=> ["all", "are"] 
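
As a possible variation (not part of the approach above), the first_line flag can be avoided by dropping the first line lazily, which still reads the file in a single pass. This is a sketch under a few assumptions: the method name repeated_last_words is hypothetical, String#split is used in place of the regular expression, and the file has no blank lines:

```ruby
require 'tempfile'

# Hypothetical helper: last words that end more than two lines,
# ignoring the first line of the file.
def repeated_last_words(path)
  File.foreach(path)
      .lazy
      .drop(1)                                  # skip the first line lazily
      .each_with_object(Hash.new(0)) { |line, h| h[line.split.last] += 1 }
      .select { |_, count| count > 2 }
      .keys
end

text = <<~TEXT
  Now is the
  time for all
  people who are
  known to all
  of us as the
  best coders are
  expected to
  lead all
  those who are
  less experienced
  to greatness
TEXT

result = Tempfile.create('temp') do |f|
  f.write(text)
  f.flush
  repeated_last_words(f.path)
end

result #=> ["all", "are"]
```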

Use of a counting hash

I define the hash as follows.

h = Hash.new(0)

This uses the form of Hash::new that defines a default value equal to new's argument. If h = Hash.new(0) and h does not have a key k, h[k] returns the default value, zero. Ruby's parser expands the expression h[k] += 1 to:

h[k] = h[k] + 1

If h does not have a key k, the expression becomes

h[k] = 0 + 1

Note that h[k] = h[k] + 1 is shorthand for:

h.[]=(k, h.[](k) + 1)

It is the method Hash#[] that defaults to zero, not the method Hash#[]=.
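
The distinction can be checked directly. In this brief sketch (the key "all" is just an example value), reading a missing key returns the default without storing anything, while += stores the incremented value:

```ruby
h = Hash.new(0)

default_read = h["all"]       # Hash#[] returns the default, 0
stored_after_read = h.key?("all")  # false: reading did not add the key

h["all"] += 1                 # h["all"] = h["all"] + 1, i.e. 0 + 1
h["all"] += 1                 # now 1 + 1

default_read       #=> 0
stored_after_read  #=> false
h["all"]           #=> 2
```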

Using a regular expression to extract the last word of each line

One of the lines is

str = "known to all\n"

We can use the regular expression r = /\S+(?=\n)/ to extract the last word:

str[r] #=> "all"

The regular expression reads, "match one or more (+) characters that are not whitespace characters (\S), immediately followed by a newline character". (?=\n) is a positive lookahead: "\n" must be matched, but it is not part of the match returned.
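
A short sketch of the lookahead's effect follows. The second and third cases are additions for illustration: they show what happens without the lookahead, and when a line (such as a final line with no trailing newline) does not end in "\n":

```ruby
r = /\S+(?=\n)/

with_newline    = "known to all\n"
without_newline = "known to all"

lookahead_match = with_newline[r]          # "\n" is required but not returned
plain_match     = with_newline[/\S+\n/]    # without the lookahead, "\n" is included
no_match        = without_newline[r]       # nothing satisfies the lookahead
split_fallback  = without_newline.split.last

lookahead_match #=> "all"
plain_match     #=> "all\n"
no_match        #=> nil
split_fallback  #=> "all"
```

If the file might lack a final newline, String#split followed by last is one way to sidestep the issue.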

Cary Swoveland