Processing lines of file in Ruby

Question

I have a file like this:

 file alldataset; append next;
 if file.first? do line + "\n";
 if !file.last? do line.indent(2);
 end;
 end;

I am trying to write a Ruby program that moves whatever follows a semicolon onto a new line. In addition, if a line contains a 'do', the text after the 'do' should go on the following line, indented by two spaces; the body of an inner 'do' should be indented by four spaces, and so on.

I am very new to Ruby and my code so far is quite far from what I want. This is what I have:

 def indent(text, num)
   " "*num+" " + text
 end

 doc = File.open('newtext.txt')
 doc.to_a.each do |line|
 if line.downcase =~ /^(file).+(;)/i
   puts line+"\n"
 end
 if line.downcase.include?('do')
  puts indent(line, 2)
 end
end

This is the desired output

file alldataset;
  append next;
  if file.first? do 
    line + "\n";
    if !file.last? do
      line.indent(2);
    end;
  end;

Any help would be appreciated.

Is that first chunk Ruby? If so, what's with all the `;` characters? — tadman, Oct 22 '17 at 19:23
To be sure we understand correctly which result you expect, please add the desired output after transforming the given input. — BernardK, Oct 22 '17 at 19:23
There's also some bitter irony here in that code that's supposed to indent things properly is not indented properly. — tadman, Oct 22 '17 at 19:24
Can I assume that a line with more than one semicolon has to be split at each semicolon, regardless of its content? What is the indent then? — BernardK, Oct 22 '17 at 19:52
@BernardK, semi-colon is the natural end of each line. Each line after a semi-colon should be indented by 2 blanks except the line starting with "file". In addition, if a line is inside a 'do' then it takes additional indents to make 4 blanks and the deeper the 'do', the more indents. — John Doe, Oct 22 '17 at 19:59
@PJProudhon, your mention of a parser caught my interest and I want to learn more. All my Google searches so far have only turned up results on how to parse HTML and extract text. Do you have any material or website that I could use to understand how a parser can be used for this kind of problem? — John Doe, Oct 23 '17 at 06:20
@JohnDoe, I'm only learning myself, but you will find some very valuable information here on SO. — PJProudhon, Oct 23 '17 at 06:23
@PJProudhon I have added another answer with an ANTLR grammar. — BernardK, Oct 23 '17 at 10:28

score 1 · Answer 1 · answered Oct 23 '17 at 09:12

As you are interested in parsing, here is a quickly made example, just to give you a taste. I have learned Lex/Yacc, Flex/Bison, ANTLR v3 and ANTLR v4, and I strongly recommend ANTLR4, which is very powerful.

The following grammar can parse only the input example you have provided.

File Question.g4 :

grammar Question;

/* Simple grammar example to parse the following code :

    file alldataset; append next; xyz;
    if file.first? do line + "\n";
    if !file.last? do line.indent(2);
    end;
    end;
    file file2; xyz;
*/

start
@init {System.out.println("Question last update 1048");}
    :   file* EOF
    ;

file
    :   FILE ID ';' statement_p*
    ;

statement_p
    :   statement
        {System.out.println("Statement found : " + $statement.text);}
    ;

statement
    :   'append' ID ';'
    |   if_statement
    |   other_statement
    |   'end' ';'
    ;

if_statement
    :   'if' expression 'do' expression ';'
    ;

other_statement
    :   ID ';'
    ;

expression
    :   receiver=( ID | FILE ) '.' method_call # Send
    |   expression '+' expression   # Addition
    |   '!' expression              # Negation
    |   atom                        # An_atom
    ;

method_call
    :   method_name=ID arguments?
    ;

arguments
    :   '(' ( argument ( ',' argument )* )? ')'
    ;

argument
    :   ID | NUMBER
    ;

atom
    :   ID
    |   FILE
    |   STRING
    ;

FILE   : 'file' ;
ID     : LETTER ( LETTER | DIGIT | '_' )* ( '?' | '!' )? ;
NUMBER : DIGIT+ ( ',' DIGIT+ )? ( '.' DIGIT+ )? ;
STRING : '"' .*? '"' ;

NL  : ( [\r\n] | '\r\n' ) -> skip ;

WS  : [ \t]+ -> channel(HIDDEN) ;

fragment DIGIT  : [0-9] ;
fragment LETTER : [a-zA-Z] ;

File input.txt :

 file alldataset; append next; xyz;
 if file.first? do line + "\n";
 if !file.last? do line.indent(2);
 end;
 end;
 file file2; xyz;

Execution :

$ export CLASSPATH=".:/usr/local/lib/antlr-4.6-complete.jar"
$ alias
alias a4='java -jar /usr/local/lib/antlr-4.6-complete.jar'
alias grun='java org.antlr.v4.gui.TestRig'
$ a4 Question.g4
$ javac Q*.java
$ grun Question start -tokens -diagnostics input.txt 
[@0,0:0=' ',<WS>,channel=1,1:0]
[@1,1:4='file',<'file'>,1:1]
[@2,5:5=' ',<WS>,channel=1,1:5]
[@3,6:15='alldataset',<ID>,1:6]
[@4,16:16=';',<';'>,1:16]
[@5,17:17=' ',<WS>,channel=1,1:17]
[@6,18:23='append',<'append'>,1:18]
[@7,24:24=' ',<WS>,channel=1,1:24]
[@8,25:28='next',<ID>,1:25]
[@9,29:29=';',<';'>,1:29]
[@10,30:30=' ',<WS>,channel=1,1:30]
[@11,31:33='xyz',<ID>,1:31]
[@12,34:34=';',<';'>,1:34]
[@13,36:36=' ',<WS>,channel=1,2:0]
[@14,37:38='if',<'if'>,2:1]
[@15,39:39=' ',<WS>,channel=1,2:3]
[@16,40:43='file',<'file'>,2:4]
[@17,44:44='.',<'.'>,2:8]
[@18,45:50='first?',<ID>,2:9]
[@19,51:51=' ',<WS>,channel=1,2:15]
[@20,52:53='do',<'do'>,2:16]
[@21,54:54=' ',<WS>,channel=1,2:18]
[@22,55:58='line',<ID>,2:19]
[@23,59:59=' ',<WS>,channel=1,2:23]
[@24,60:60='+',<'+'>,2:24]
[@25,61:61=' ',<WS>,channel=1,2:25]
[@26,62:65='"\n"',<STRING>,2:26]
[@27,66:66=';',<';'>,2:30]
...
[@59,133:132='<EOF>',<EOF>,7:0]
Question last update 1048
Statement found : append next;
Statement found : xyz;
Statement found : if file.first? do line + "\n";
Statement found : if !file.last? do line.indent(2);
Statement found : end;
Statement found : end;
Statement found : xyz;

One advantage of ANTLR4 over previous versions and other parser generators is that the code is no longer scattered among the parser rules but gathered in a separate listener. The listener is where you do the actual processing, such as producing a new reformatted file. A complete example would be too long to show here. Today you can write the listener in C++, C#, Python and other languages. As I don't know Java, I use a setup based on JRuby; see my forum answer.
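
To give a rough feel for the listener idea without ANTLR, here is a minimal Ruby sketch. Everything in it is hypothetical: `Node`, `walk` and `IndentingListener` are names made up for the illustration, the tree is built by hand rather than produced by a parser, and none of this is the API that ANTLR generates. It only shows how a walker can drive callbacks while the formatting logic stays in one separate class.

```ruby
# Hypothetical hand-built tree node; ANTLR would generate its own context classes.
Node = Struct.new(:kind, :text, :children)

# Depth-first walk that notifies the listener on the way in and on the way out.
def walk(node, listener)
  listener.enter_node(node)
  node.children.each { |child| walk(child, listener) }
  listener.exit_node(node)
end

# All formatting decisions live in the listener, none in the tree itself.
class IndentingListener
  attr_reader :lines

  def initialize
    @lines = []
    @depth = 0
  end

  # Emit the node text at the current depth; a block indents what follows.
  def enter_node(node)
    @lines << ('  ' * @depth) + node.text
    @depth += 1 if node.kind == :block
  end

  # Leaving a block dedents first, then emits the closing 'end;'.
  def exit_node(node)
    return unless node.kind == :block

    @depth -= 1
    @lines << ('  ' * @depth) + 'end;'
  end
end

tree = Node.new(:block, 'if file.first? do', [
  Node.new(:statement, 'line + "\n";', []),
  Node.new(:block, 'if !file.last? do', [
    Node.new(:statement, 'line.indent(2);', [])
  ])
])

listener = IndentingListener.new
walk(tree, listener)
puts listener.lines
```

With a real parser, the tree would come from the generated parser and the callbacks would be named after the grammar rules, but the division of labour would be similar.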

Thanks a lot for this @BernardK, this is very helpful. I will learn more about this. Cheers! — John Doe, Oct 23 '17 at 09:23
@BernardK, many thanks for this addition. This is exactly what I need to learn and your answer will definitely be useful to me. — PJProudhon, Oct 23 '17 at 10:36
@PJProudhon [Here a complete Java example](https://stackoverflow.com/questions/46872931/can-an-element-contain-attribute-as-parsed-by-parser-generated-by-antlr-if-so/46916429#46916429) with `1.` a small grammar, `2.` a listener `3.` a program to run the parser and walk the tree, calling events in the listener. — BernardK, Oct 26 '17 at 08:20
@BernardK, your example looks like a good base for me to start. Thanks again. — PJProudhon, Oct 26 '17 at 10:39

score 0 · Accepted Answer · answered Oct 22 '17 at 22:16

In Ruby there are many ways to do things, so my solution is only one among others.

File t.rb :

def print_indented(p_file, p_indent, p_text)
    p_file.print p_indent
    p_file.puts  p_text
end

# Recursively split the line at each semicolon, as long as the rest is not empty.
def partition_on_semicolon(p_line, p_answer, p_level)
    puts "in partition_on_semicolon for level #{p_level} p_line=#{p_line} / p_answer=#{p_answer}"
    first_segment, semi, rest = p_line.partition(';')
    p_answer << first_segment + semi
    partition_on_semicolon(rest.lstrip, p_answer, p_level + 1) unless rest.empty?
end

lines = IO.readlines('input.txt')

# Compute initial indentation, the indentation of the first line.
# This is to preserve the spaces which are in the input.
m = lines.first.match(/^( *)(.*)/)
initial_indent = ' ' * m[1].length
# initial_indent = '' # uncomment if the initial indentation does not need to be preserved
puts "initial_indent=<#{initial_indent}> length=#{initial_indent.length}"
level       = 1
indentation = '  '

File.open('newtext.txt', 'w') do | output_file |
    lines.each do | line |
        line        = line.chomp
        line        = line.lstrip # remove leading spaces
        puts "---<#{line}>"
        next_indent = initial_indent + indentation * (level - 1)

        case
        when line =~ /^file/ && line.count(';') > 1
            level = 1 # restore, remove this if files can be indented
            next_indent = initial_indent + indentation * (level - 1)
            # split into one fragment per semicolon
            fragments = []
            partition_on_semicolon(line, fragments, 1)
            puts '---fragments :'
            puts fragments.join('/')
            print_indented(output_file, next_indent, fragments.first)

            fragments[1..-1].each do | fragment |
                print_indented(output_file, next_indent + indentation, fragment)
            end

            level += 1
        when line.include?(' do ')
            fragment1, _fdo, fragment2 = line.partition(' do ')
            print_indented(output_file, next_indent, "#{fragment1} do")
            print_indented(output_file, next_indent + indentation, fragment2)
            level += 1
        else
            level -= 1 if line =~ /end;/
            print_indented(output_file, next_indent, line)
        end
    end
end
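
As a side note, the recursive split can probably be written without recursion. The sketch below is only an alternative, not part of t.rb: `split_on_semicolon` is a hypothetical name, and it assumes that no semicolon appears inside a string literal. It splits at the zero-width position right after each `;` and trims the pieces.

```ruby
# Hypothetical helper, not part of t.rb above.
# Splits right after every ';' (lookbehind keeps the ';' with its fragment),
# then strips surrounding whitespace and drops empty pieces.
def split_on_semicolon(line)
  line.split(/(?<=;)/).map(&:strip).reject(&:empty?)
end

p split_on_semicolon('file alldataset; append next; xyz;')
# prints ["file alldataset;", "append next;", "xyz;"]
```

Like the recursive version, it keeps a trailing piece that has no closing semicolon.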

File input.txt :

 file alldataset; append next; xyz;
 if file.first? do line + "\n";
 if !file.last? do line.indent(2);
 end;
 end;
 file file2; xyz;

Execution :

$ ruby -w t.rb 
initial_indent=< > length=1
---<file alldataset; append next; xyz;>
in partition_on_semicolon for level 1 p_line=file alldataset; append next; xyz; / p_answer=[]
in partition_on_semicolon for level 2 p_line=append next; xyz; / p_answer=["file alldataset;"]
in partition_on_semicolon for level 3 p_line=xyz; / p_answer=["file alldataset;", "append next;"]
---fragments :
file alldataset;/append next;/xyz;
---<if file.first? do line + "\n";>
---<if !file.last? do line.indent(2);>
---<end;>
---<end;>
---<file file2; xyz;>
in partition_on_semicolon for level 1 p_line=file file2; xyz; / p_answer=[]
in partition_on_semicolon for level 2 p_line=xyz; / p_answer=["file file2;"]
---fragments :
file file2;/xyz;
---<>

Output file newtext.txt :

 file alldataset;
   append next;
   xyz;
   if file.first? do
     line + "\n";
     if !file.last? do
       line.indent(2);
       end;
     end;
 file file2;
   xyz;
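
Note that in newtext.txt each `end;` stays at the indentation of the body it closes, whereas the desired output in the question aligns it with the matching `if`. A minimal sketch of one way to get that alignment follows. It is not a drop-in replacement for t.rb: `reformat` is a hypothetical name, the initial indentation of the input is not preserved, and it assumes that neither `;` nor the word `do` occurs inside a string literal and that every `file` statement resets the nesting.

```ruby
# Hypothetical helper: one statement per output line, 'end;' aligned with its 'if'.
def reformat(text, unit = '  ')
  depth = 0
  out   = []
  # Split right after every ';', trim the pieces, ignore blank leftovers.
  text.split(/(?<=;)/).map(&:strip).reject(&:empty?).each do |stmt|
    if stmt.start_with?('file ')
      out << stmt          # a 'file' statement always starts at column 0
      depth = 1            # everything after it is indented once
      next
    end

    # Dedent before printing, so that 'end;' lines up with the opening 'if'.
    depth -= 1 if stmt == 'end;' && depth > 0

    head, sep, body = stmt.partition(/\bdo\s+/)
    if sep.empty?
      out << (unit * depth) + stmt
    else
      out << (unit * depth) + head + 'do'
      depth += 1           # the body of the 'do' goes one level deeper
      out << (unit * depth) + body
    end
  end
  out.join("\n")
end

input = <<~'TEXT'
  file alldataset; append next;
  if file.first? do line + "\n";
  if !file.last? do line.indent(2);
  end;
  end;
TEXT

puts reformat(input)
```

Run on the input from the question, this should print the eight lines of the desired output, without the trailing space after the first `do`.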
