I have been using SmarterCSV to convert BED-format files to CSV files and to rename the columns.

Now I have collected several CSV files and want to combine them into one big CSV file.

In test3.csv, three columns (chromosome, start_site and end_site) will be kept, and the other three (binding_site_pattern, score and strand) will be removed.

Three new columns will also be added to the test3.csv rows, each holding the same value in every row: Cmyc in the transcription_factor column, PWM in the cell_type column, and JASPAR in the project_name column.

Does anyone have any ideas on this one?

test1.csv

transcription_factor,cell_type,chromosome,start_site,end_site,project_name
Cmyc,GM12878,11,809296,809827,ENCODE
Cmyc,GM12878,11,6704236,6704683,ENCODE

test2.csv

transcription_factor,cell_type,chromosome,start_site,end_site,project_name
Cmyc,H1ESC,19,9710417,9710587,ENCODE
Cmyc,H1ESC,11,541754,542137,ENCODE

test3.csv

chromosome,start_site,end_site,binding_site_pattern,score,strand
chr1,21942,21953,AAGCACGTGGT,1752,+
chr1,21943,21954,AACCACGTGCT,1335,-

Desired combined result:

transcription_factor,cell_type,chromosome,start_site,end_site,project_name
Cmyc,GM12878,11,809296,809827,ENCODE
Cmyc,GM12878,11,6704236,6704683,ENCODE
Cmyc,H1ESC,19,9710417,9710587,ENCODE
Cmyc,H1ESC,11,541754,542137,ENCODE
Cmyc,PWM,1,21942,21953,JASPAR
Cmyc,PWM,1,21943,21954,JASPAR
the Tin Man
Michael

1 Answer

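require 'csv'

# Target column order for the combined file.
hs = %w{ transcription_factor cell_type chromosome start_site end_site project_name }

CSV.open('result.csv','w') do |csv|
  csv << hs
  # test1.csv and test2.csv already have the target columns; copy them in order.
  CSV.foreach('test1.csv', headers: true) {|row| csv << row.values_at(*hs) }
  CSV.foreach('test2.csv', headers: true) {|row| csv << row.values_at(*hs) }
  # test3.csv: add the constant columns, reduce "chr1" to "1", and drop the unused columns.
  CSV.foreach('test3.csv', headers: true) do |row|
    csv << ['Cmyc', 'PWM', row['chromosome'].match(/\d+/).to_s] + row.values_at('start_site', 'end_site') + ['JASPAR']
  end
end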
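
The test3.csv step can also be tried in isolation, without any files on disk. The following is only a sketch under stated assumptions: `jaspar_row` is a hypothetical helper name that does not appear in the code above, and the sample rows are the two test3.csv rows from the question, parsed from a string.

```ruby
require 'csv'

# Hypothetical helper (not part of the code above): maps one test3-style row
# onto the six target columns.
def jaspar_row(row)
  # "chr1" -> "1"; a name with no digits (e.g. "chrX") would give nil here.
  chromosome = row['chromosome'][/\d+/]
  ['Cmyc', 'PWM', chromosome, row['start_site'], row['end_site'], 'JASPAR']
end

# The two sample rows from test3.csv in the question.
sample = <<~ROWS
  chromosome,start_site,end_site,binding_site_pattern,score,strand
  chr1,21942,21953,AAGCACGTGGT,1752,+
  chr1,21943,21954,AACCACGTGCT,1335,-
ROWS

converted = CSV.parse(sample, headers: true).map { |row| jaspar_row(row) }
converted.each { |fields| puts fields.join(',') }
```

If the real files contain chromosome names without digits, the extraction rule would need to be adjusted; the question only shows numeric ones.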
Jacob Brown
    Thanks, it really helps. By the way, what does * mean? – Michael Jul 25 '14 at 02:41
  • @user3239006, it's the so-called ["splat" operator](http://endofline.wordpress.com/2011/01/21/the-strange-ruby-splat/), which expands an array into a list of separate values. It is used here to "unpack" the `hs` array into a number of separate arguments for `values_at`. – Jacob Brown Jul 25 '14 at 10:55
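
To make the splat concrete, here is a small sketch. It assumes a row built by hand with `CSV::Row.new` rather than read from one of the files in the question, and the three-column `hs` is shortened for illustration.

```ruby
require 'csv'

hs  = %w[chromosome start_site end_site]
# A row built by hand for illustration, using values from test3.csv.
row = CSV::Row.new(hs, %w[chr1 21942 21953])

# With the splat, each element of hs is passed as its own argument...
with_splat = row.values_at(*hs)
# ...so the call is equivalent to writing the headers out by hand.
by_hand = row.values_at('chromosome', 'start_site', 'end_site')

p with_splat
p with_splat == by_hand
```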