-1

I would like to match all lines (including the first line) between two lines that start with 'SLX-', convert them to a single comma-separated line, and then append that line to a text file.

A truncated version of the original text file looks like:

SLX-9397._TC038IV_L_FLD0214.Read1.fq.gz
Sequences: 1406295
With index: 1300537
Sufficient length: 1300501
Min index: 0
Max index: 115
0       1299240
1       71
2       1
4       1
Unique: 86490
# reads processed: 86490
# reads with at least one reported alignment: 27433 (31.72%)
# reads that failed to align: 58544 (67.69%)
# reads with alignments suppressed due to -m: 513 (0.59%)
Reported 27433 alignments to 1 output stream(s)
SLX-9397._TC044II_D_FLD0197.Read1.fq.gz
Sequences: 308905
With index: 284599
Sufficient length: 284589
Min index: 0
Max index: 114
0       284290
1       16
Unique: 32715
# reads processed: 32715
# reads with at least one reported alignment: 13114 (40.09%)
# reads that failed to align: 19327 (59.08%)
# reads with alignments suppressed due to -m: 274 (0.84%)
Reported 13114 alignments to 1 output stream(s)
SLX-9397._TC047II_D_FLD0220.Read1.fq.gz

I imagine the Ruby would look like

  1. Convert all \n between two lines starting with SLX- to commas
  2. Save the original text file as a new text file (or, even better, a CSV file).

I think I specifically have a problem with how to find and replace between two specific lines.

I guess I could do this without using ruby, but seeing as I'm trying to get into Ruby...

Yu Hao
Sebastian Zeki
  • I basically want a comma instead of a newline for every line between the lines that start with the characters 'SLX'. This should include the first line with SLX but not the last one – Sebastian Zeki Aug 07 '15 at 06:47
  • [Read the file line by line](http://stackoverflow.com/questions/6012930/read-lines-of-a-file-in-ruby) and if a line starts with `SLX-` start building a new comma-separated string. Once the line starts with `SLX-`, stop building the string, [save it into the new file](http://stackoverflow.com/questions/2777802/how-to-write-to-file-in-ruby), and start over building the output string. – Wiktor Stribiżew Aug 07 '15 at 06:55
  • OK, but the bit that I can't do is 'if a line starts with SLX- start building a new comma-separated string. Once the line starts with SLX-, stop building the string'. – Sebastian Zeki Aug 07 '15 at 06:57
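
The line-by-line approach described in the comments could be sketched roughly as follows. This is only a minimal illustration, not code from the thread: the method name `slx_records` and the shortened sample text are assumptions made for the example. A record is emitted only once the next `SLX-` line is seen, so the final, unterminated `SLX-` line is dropped, as requested in the first comment.

```ruby
# Group lines into records that each begin with an "SLX-" line,
# then join every record into one comma-separated line.
# NOTE: slx_records and the sample text are hypothetical, for illustration only.
def slx_records(lines)
  records = []
  current = nil
  lines.each do |raw|
    line = raw.chomp
    next if line.empty?
    if line.start_with?('SLX-')
      records << current if current   # the previous record is now complete
      current = [line]                # start building a new record
    elsif current
      current << line                 # still inside the current record
    end
  end
  # The trailing record is never closed by another SLX- line, so it is not emitted.
  records.map { |fields| fields.join(',') }
end

sample = "SLX-A.fq.gz\nSequences: 10\nUnique: 5\n" \
         "SLX-B.fq.gz\nSequences: 20\n" \
         "SLX-C.fq.gz\n"

rows = slx_records(sample.each_line)
puts rows
```

For a real file, `File.foreach` with the input path should work in place of `sample.each_line`, and the rows could be appended with `File.open(path, 'a') { |f| f.puts rows }`; both paths are placeholders to fill in.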

2 Answers

1

Assuming that you have the contents of the file in a string str:

require 'csv'

# str holds the whole input text
CSV.open("/tmp/file.csv", "wb") do |csv|
  str.scan(/^(SLX-.*?)(?=\R+SLX-)/m).map do |s| # one match per SLX- block
    s.first.split($/).map do |el|               # split on the line separator
      "'#{el}'"                                 # wrap each value in single quotes
    end
  end.each do |line|                            # iterate over the rows
    csv << line                                 # write the row to the CSV
  end
end
Aleksei Matiushkin
0

I don't know much about Ruby, but this should work. Read the entire file into a string, then use the regex (\RSLX-) to match every SLX- that follows a line break (that is, all but the first one) and replace it with ,SLX-. For an explanation of the regex, see https://regex101.com/r/pP3pP3/1

This question - Ruby replace string with captured regex pattern - might help you understand how to do the replacement in Ruby.
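
As a rough sketch of the regex-replacement idea in Ruby, one possible variant (not the exact regex above) replaces every line break that is not followed by `SLX-` with a comma, using `String#gsub` and a negative lookahead. The method name `join_slx_blocks` and the sample text are made up for illustration, and the sketch assumes the whole file fits in memory and contains no blank lines. Unlike the requirement in the question, it keeps the final `SLX-` line as a row of its own.

```ruby
# Replace each line break with a comma unless the next line starts with "SLX-"
# or the line break is the last thing in the string.
# NOTE: join_slx_blocks and the sample text are hypothetical, for illustration only.
def join_slx_blocks(text)
  text.gsub(/\R(?!SLX-|\z)/, ',')
end

sample = "SLX-A.fq.gz\nSequences: 10\nUnique: 5\n" \
         "SLX-B.fq.gz\nSequences: 20\n" \
         "SLX-C.fq.gz\n"

joined = join_slx_blocks(sample)
puts joined
```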

Community
Harsh Poddar