-2

I have a file like this:

some content

some oterh

*********************

useful1 text

useful3 text

*********************
some other content

How do I get the content of the file between the two lines of stars into an array? For example, after processing the above file, the array should look like this:

a=["useful1 text" , "useful3 text"]
the Tin Man
user1788294

4 Answers

2

A really hacky solution is to split the content on the lines of stars, grab the middle part, and then split that, too:

content.split(/^\*+$/)[1].split(/\s+/).reject(&:empty?)
# => ["useful1", "text", "useful3", "text"]
tadman
  • I am getting the content like this content=File.open("somefilename.txt").read . This is giving me error undefined method split. What is the correct way to get the content. – user1788294 Aug 28 '14 at 19:19
  • `File.read(...)` will do the job as well. If you're getting an undefined method, maybe you're not getting anything back for `content`? – tadman Aug 28 '14 at 19:22
  • I checked content is not empty . Seems like something missing here. – user1788294 Aug 28 '14 at 19:27
  • Test it in `irb`. You should get a string back. – tadman Aug 28 '14 at 19:27
  • I checked in irb and there seems to be a problem with the regular expression. Is the regular expression /^\*+$/ correct? I am getting this: "sdfd jjkdf **** dfdf dfd ***** dsfdsf fdsfds".split(/^\*+$/) => ["sdfd jjkdf **** dfdf dfd ***** dsfdsf fdsfds"] – user1788294 Aug 28 '14 at 19:37
  • Your example has the `****` line all by itself, which is why the regular expression is composed that way. If you prefer, you can just remove the `^` and `$` anchors. – tadman Aug 28 '14 at 19:38
  • What if the content is like this some content some oterh ********************* useful1 text useful3 text ********************* some other content and I want the out put as => ["useful text" , "useful2 text"] – user1788294 Aug 28 '14 at 19:45
  • Just to mention your above suggestion to remove ^ and $ worked. Thanks for that – user1788294 Aug 28 '14 at 19:45
  • Why not making the second split with the new line separator: `a=File.read("test.txt").split(/^\*+$/)[1].split($/).reject(&:empty?)`? The solution with `split(/\s+/)` also splits the useful texts. – Roland Sep 10 '14 at 12:07
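Roland's suggestion of splitting the middle part on newlines rather than on whitespace could be sketched as follows. This is only a minimal sketch: the heredoc stands in for the result of File.read, and the variable names middle and a are illustrative assumptions, not part of the original answer.

```ruby
# Sample content standing in for File.read("test.txt") (assumed file layout).
content = <<~TEXT
  some content

  some oterh

  *********************

  useful1 text

  useful3 text

  *********************
  some other content
TEXT

# Split on lines consisting only of stars and take the part between them.
middle = content.split(/^\*+$/)[1]

# Split that part into lines (not words), trim each line, drop blank ones.
a = middle.split("\n").map(&:strip).reject(&:empty?)

p a
```

Splitting on "\n" keeps each useful line intact, whereas split(/\s+/) also breaks the lines apart at every space.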
1
f = File.open('test_doc.txt', 'r')
content = [] 
f.each_line do |line|
  content << line.rstrip unless !!(line =~ /^\*(\*)*\*$/)
end
f.close

The regex pattern /^\*(\*)*\*$/ matches lines that consist only of asterisks (two or more of them). !!(line =~ /^\*(\*)*\*$/) always returns a boolean value. So if the pattern does not match, the line is added to the array.

  • 1
    That will add all lines that don't match the star pattern, even the header and the trailer. This will probably not meet the OP's expectations. – Magnus Bodin Sep 11 '14 at 08:35
  • Well..I think the question is quite vague and my answer was an attempt to point out to the OP, ways in which regex can be used to filter out useless patterns.. – Karthik Mallavarapu Sep 11 '14 at 20:37
0

What about this:

# Return the elements strictly between the first and last occurrence of separator.
def values_between(array, separator)
  array.slice((array.index(separator) + 1)..(array.rindex(separator) - 1))
end

filepath  = '/tmp/test.txt'
lines     = %w(trash trash separator content content separator trash)
separator = "separator\n"

# Write the sample file, then read it back as an array of lines.
File.write filepath, lines.join("\n")
values_between File.readlines(filepath), separator
#=> ["content\n", "content\n"]
mdesantis
  • When I am using lines="trash trash separator content content separator trash" , why I am not getting the correct result. – user1788294 Aug 28 '14 at 19:40
  • 2
    @user1788294 Because you took his post completely literally instead of adapting it to what you actually need? – Dave Newton Aug 28 '14 at 19:53
0

I'd do it like this:

lines = []
File.foreach('./test.txt') do |li|
  lines << li if (li[/^\*{5}/] ... li[/^\*{5}/])
end

lines[1..-2].map(&:strip).select{ |l| l > '' }
# => ["useful1 text", "useful3 text"]

/^\*{5}/ means "a string that starts with at least five '*' characters".

... is one of two uses of .. and ... and, in this use, is commonly called a "flip-flop" operator. It isn't used often in Ruby because most people don't seem to understand it. It's sometimes mistaken for the Range delimiters .. and ....

In this use, Ruby watches for the first test, li[/^\*{5}/], to return true. Once it does, .. or ... will return true until the second condition returns true. In this case we're looking for the same delimiter, so the same test will work, li[/^\*{5}/], and this is where the difference between the two versions, .. and ..., comes into play.

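.. checks the second condition on the same line that satisfied the first, so it would toggle back to false immediately, whereas ... waits until the next line before checking the second condition. That avoids the problem of the first test seeing a delimiter and then the second test seeing the same line and triggering.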
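The difference between the two forms can be seen in a small experiment. This is a minimal sketch using a made-up array of lines in place of a file; the names sample, two_dot and three_dot are assumptions for illustration only.

```ruby
# Made-up input standing in for the lines of a file.
sample = ["a\n", "*****\n", "x\n", "*****\n", "b\n"]

# Two-dot flip-flop: the end test runs on the same line that turned it on,
# so each delimiter line switches it on and straight back off.
two_dot = sample.select { |li| true if (li[/^\*{5}/] .. li[/^\*{5}/]) }

# Three-dot flip-flop: the end test is first checked on the following line,
# so it stays on from the opening delimiter through the closing one.
three_dot = sample.select { |li| true if (li[/^\*{5}/] ... li[/^\*{5}/]) }

p two_dot
p three_dot
```

With the two-dot form only the delimiter lines themselves are selected; with the three-dot form the lines between the delimiters are selected too, which is what the extraction needs.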

That lets the test assign to lines, which, prior to the [1..-2].map(&:strip).select{ |l| l > '' } looks like:

# => ["*********************\n",
#     "\n",
#     "useful1 text\n",
#     "\n",
#     "useful3 text\n",
#     "\n",
#     "*********************\n"]

[1..-2].map(&:strip).select{ |l| l > '' } cleans that up. [1..-2] slices the array to remove the first and last elements, which are the delimiter lines. strip removes leading and trailing whitespace, effectively getting rid of the trailing newlines and leaving empty strings and strings containing the desired text. select{ |l| l > '' } picks up the strings that are greater than the empty string, i.e., are not empty.

See "When would a Ruby flip-flop be useful?" and its related questions, and "What is a flip-flop operator?" for more information and some background. (Perl programmers use .. and ... often, for just this purpose.)

One warning though: If the file has multiple blocks delimited this way, you'll get the contents of them all. The code I wrote doesn't know how to stop until the end-of-file is reached, so you'll have to figure out how to handle that situation if it could occur.
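If multiple delimited blocks can occur, one possible way to keep them apart is to track explicitly whether the current line is inside a block. The following is only a sketch under that assumption: the method name delimited_blocks and the sample data are hypothetical, and it replaces the flip-flop with a plain state variable rather than extending the code above.

```ruby
# Collect the non-blank lines of each star-delimited block separately.
# `lines` is any enumerable of strings, e.g. File.foreach('./test.txt').
def delimited_blocks(lines)
  blocks  = []
  current = nil                 # nil while outside a block
  lines.each do |li|
    if li =~ /^\*{5}/
      if current
        blocks << current       # closing delimiter: finish this block
        current = nil
      else
        current = []            # opening delimiter: start a new block
      end
    elsif current
      text = li.strip
      current << text unless text.empty?
    end
  end
  blocks                        # a block left unclosed at end-of-file is dropped
end

# Made-up input with two blocks.
sample = ["head\n", "*****\n", "\n", "useful1 text\n", "*****\n",
          "between\n", "*****\n", "useful3 text\n", "*****\n", "tail\n"]

blocks = delimited_blocks(sample)
p blocks
```

Taking blocks.first would then give only the first block, and blocks.flatten would give the contents of all blocks without the delimiter lines in between.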

Community
the Tin Man