Create arrays of words at new line points from file

Question

I have a file that contains chunks of text separated by blank lines (see https://github.com/rochford77/hw2_rochf1rt/blob/master/ClassList).

I need each chunk of text between the blank lines to be its own array, and each array should hold words, not characters.

I can read the file line by line into an array of lines with

in_file_array = IO.readlines('filename.txt')

I can see three options, but I can't figure out how to do any of them. I only need each block of text in an array briefly, so I can write some information about it to a new file.

Option 1: have the code above stop at a blank line and give me the array so I can work with it; then, on the next pass, resume after that blank line and read up to the next one, refilling the array with the new chunk.
Option 2: make a separate array for each chunk between the blank lines, each with a different name.
Option 3: read everything into one huge array, then cut it into several smaller arrays at the blank lines.
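Option 3 is the simplest to sketch. The following is a minimal example, assuming a small file whose chunks are separated by blank lines; the file name `class_list.txt` and the sample data are hypothetical. It uses `Enumerable#chunk`, which drops elements whose key is `:_separator`, and those dropped elements also end the current group. The last step covers option 2 as well, by keying each chunk on its first word in a hash instead of creating separately named variables.

```ruby
# Hypothetical sample file, shaped like the examples in this question.
path = "class_list.txt"
File.write(path, "PH03\n----\nfine1l\nhowar1s\n\nMT03\n----\nfine1l\nclega1s\n")

# Option 3: read every line into one array (fine for small files)...
lines = File.readlines(path).map(&:strip)

# ...then cut it at the blank lines. Blank lines get the :_separator key,
# so they are dropped and split the run of non-blank lines into groups.
chunks = lines.chunk { |line| line.empty? ? :_separator : true }
              .map { |_key, words| words }
# chunks #=> [["PH03", "----", "fine1l", "howar1s"],
#             ["MT03", "----", "fine1l", "clega1s"]]

# Option 2 without inventing variable names: a hash keyed by each chunk's first word.
by_header = chunks.map { |words| [words.first, words] }.to_h
# by_header["MT03"] #=> ["MT03", "----", "fine1l", "clega1s"]
```

Because `readlines` pulls the whole file into memory, this suits a small file like a class list; for large files, reading with `foreach` avoids that.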

I cannot seem to get any of the above to work. Could someone please provide some help?

To clarify, I need an array that contains something like:

array1 = ["PH03", "----", "fine1l", "howar1s", ...]
# do something to array1
array1 = ["MT03", "----", "fine1l", "clega1s", ...]

but reading from the file.
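Option 1 (fill one array, process it at each blank line, then refill it) can be sketched by reading with `File.foreach`, so only the current chunk is held in memory. This is a minimal sketch: the file names and the `handle_chunk` method are hypothetical stand-ins for "do something to array" and the new output file.

```ruby
# Hypothetical input and output file names, for illustration only.
in_path  = "class_list.txt"
out_path = "summary.txt"
File.write(in_path, "PH03\n----\nfine1l\nhowar1s\n\nMT03\n----\nfine1l\nclega1s\n")

# Hypothetical placeholder for "do something to array": write the words on one line.
def handle_chunk(words, out)
  out.puts words.join(",")
end

File.open(out_path, "w") do |out|
  array = []
  File.foreach(in_path) do |line|
    word = line.strip
    if word.empty?
      handle_chunk(array, out) unless array.empty? # a blank line ends the current chunk
      array = []                                   # start refilling for the next chunk
    else
      array << word
    end
  end
  handle_chunk(array, out) unless array.empty?     # the last chunk has no blank line after it
end
```

After this runs, `summary.txt` holds one line per chunk: `PH03,----,fine1l,howar1s` and `MT03,----,fine1l,clega1s`.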

Welcome to Stack Overflow. When asking for help with a programming question, we expect you to show us what you've tried. Stack Overflow is about helping debug specific problems with code, not about writing code for you or about advising the ways to do it prior to you writing anything. If you haven't tried, please do so before asking. If you have tried, please show us what you wrote and explain why it doesn't do what you want. — the Tin Man, Oct 05 '15 at 17:49
Welcome to StackOverflow. Please see stackoverflow.com/help/how-to-ask and stackoverflow.com/help/mcve. Most immediately, we need you to post the code you've written and the results therefrom. — Prune, Oct 05 '15 at 17:50
Also, the use of `readlines` is not scalable. A large file will result in the entire file being pulled into memory, which is very slow. Please show a small example of your input file in the question itself, rather than ask us to go to a separate site. If/when the link rots your question will be useless to future people searching for the answer to a similar question. — the Tin Man, Oct 05 '15 at 17:52
As a hint, look at [`File.foreach`](http://ruby-doc.org/core-2.2.3/IO.html#method-c-foreach) and pay close attention to the second parameter to the method, for the line separator, and imagine what'd happen if you used `"\n\n"` as a separator. — the Tin Man, Oct 05 '15 at 17:54
When you give an example, boil it down to the essentials. Also, assign all inputs to variables so that readers can reference those variables in answers and comments without having to define them. See my answer for an example. — Cary Swoveland, Oct 05 '15 at 19:17
Please see "[How to read a file by paragraphs or chunks into arrays](http://stackoverflow.com/q/32955842/128421)". — the Tin Man, Oct 05 '15 at 19:46

Cary Swoveland · Accepted Answer · 2015-10-05T23:02:47.440


Let's create a file with some data:

text = <<_
PH03
----
fine1l
howar1s

MT03
----
fine1l
clega1s
targa1d

PH05
----
howar1m

EN01
----
howar1c
fine1l
tai1db
_

FName = "my_file"

IO.write(FName, text)
  #=> 111

The most efficient way of constructing the desired array is to do it as the file is being read, rather than first reading the file into a string or array. For that, it's convenient to read the file with the form of IO::foreach that returns an enumerator (that is, calling it without a block):

IO.foreach(FName).with_object([[]]) { |line, a| (w = line.strip).empty? ? a << [] : a[-1] << w }
  #=> [["PH03", "----", "fine1l", "howar1s"],
  #    ["MT03", "----", "fine1l", "clega1s", "targa1d"],
  #    ["PH05", "----", "howar1m"],
  #    ["EN01", "----", "howar1c", "fine1l", "tai1db"]]

Edit: @theTinMan's excellent suggestion could be implemented as follows:

IO.foreach(FName, $/+$/).map { |s| s.strip.lines.map(&:strip) }

Note that IO.foreach(FName, $/+$/) also returns an enumerator.
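Because `IO.foreach(FName, $/+$/)` reads one chunk at a time, it can also be given a block so that each chunk is handled as soon as it's read, rather than collecting every chunk with `map`. A minimal sketch with abbreviated sample data; the `headers` array is a hypothetical stand-in for whatever work each chunk needs.

```ruby
path = "my_file" # same kind of sample data as above, abbreviated
File.write(path, "PH03\n----\nfine1l\nhowar1s\n\nPH05\n----\nhowar1m\n")

headers = []
# $/ is the default line separator ("\n"), so $/ + $/ is "\n\n".
IO.foreach(path, $/ + $/) do |s|
  words = s.split         # split on any whitespace, newlines included
  headers << words.first  # stand-in for the per-chunk work
end
headers #=> ["PH03", "PH05"]
```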

edited Oct 05 '15 at 23:02

answered Oct 05 '15 at 18:26

Cary Swoveland


While `$/+$/` is a nice (and difficult to quickly grok) way of defining two line-ends, using a chained `map` will negate the benefits of using `foreach` as it'll cause the file contents to be buffered before anything is returned. – the Tin Man Oct 05 '15 at 19:58
@theTinMan, in the latter solution, I don't understand why the file contents would be buffered. "Lines" (ending `"\n\n"`) are read one-by-one by `foreach` and passed to `map`, which converts the string to an element of the array it will return. Once the block variable `s` is assigned to the next line, wouldn't the previous line be available for garbage collection? – Cary Swoveland Oct 05 '15 at 20:42

score 0 · Answer 2 · edited May 23 '17 at 12:29

Ruby's IO class has several methods that let us specify the line separator: the character or string that marks the end of each "line" returned when reading a file.

Usually it's "\n", but using "\n\n" will return the file in chunks, blocks, paragraphs, or whatever you want to call them.
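Besides "\n\n", an empty-string separator puts the reader into paragraph mode, where any run of blank lines counts as a single separator. The sketch below uses a hypothetical file with three blank lines between two chunks to compare the two; `String#split` with no arguments splits each chunk on whitespace, giving an array of words.

```ruby
path = "paragraphs.txt" # hypothetical file; note the three blank lines before PH05
File.write(path, "PH03\n----\nhowar1s\n\n\n\nPH05\n----\nhowar1m\n")

# "\n\n" as separator: the extra blank lines come back as a chunk of their own.
by_two = File.foreach(path, "\n\n").map(&:split)
# by_two  #=> [["PH03", "----", "howar1s"], [], ["PH05", "----", "howar1m"]]

# "" (paragraph mode): consecutive blank lines are treated as one separator.
by_para = File.foreach(path, "").map(&:split)
# by_para #=> [["PH03", "----", "howar1s"], ["PH05", "----", "howar1m"]]
```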

See "How to read a file by paragraphs or chunks into arrays" for more information.
