0

I have a CSV file containing many rows and columns 2 of which are similar to:

 Horizontal-1 Acc. Filename      Horizontal-2 Acc. Filename
 RSN88_SFERN_FSD172.AT2          RSN88_SFERN_FSD262.AT2 
 RSN164_IMPVALL.H_H-CPE147.AT2   RSN164_IMPVALL.H_H-CPE237.AT2 
 RSN755_LOMAP_CYC195.AT2         RSN755_LOMAP_CYC285.AT2 
 RSN1083_NORTHR_GLE170.AT2       RSN1083_NORTHR_GLE260.AT2 
 RSN1614_DUZCE_1061-N.AT2        RSN1614_DUZCE_1061-E.AT2 
 RSN1633_MANJIL_ABBAR--L.AT2     RSN1633_MANJIL_ABBAR--T.AT2 
 RSN3750_CAPEMEND_LFS270.AT2     RSN3750_CAPEMEND_LFS360.AT2 
 RSN3757_LANDERS_NPF090.AT2      RSN3757_LANDERS_NPF180.AT2
 RSN3759_LANDERS_WWT180.AT2      RSN3759_LANDERS_WWT270.AT2 
 RSN4013_SANSIMEO_36258021.AT2   RSN4013_SANSIMEO_36258111.AT2 
 RSN4841_CHUETSU_65004NS.AT2     RSN4841_CHUETSU_65004EW.AT2 
 RSN4843_CHUETSU_65006NS.AT2     RSN4843_CHUETSU_65006EW.AT2 
 RSN4844_CHUETSU_65007NS.AT2     RSN4844_CHUETSU_65007EW.AT2 
 RSN4848_CHUETSU_65011NS.AT2     RSN4848_CHUETSU_65011EW.AT2 

In the CSV file I wanna look for the headers "Horizontal-1 Acc. Filename and Horizontal-2 Acc. Filename" and then line by line get the names of each row under these headers one at a time ?

Any suggestion ?

Thanks RG.

Donal Fellows
  • 133,037
  • 18
  • 149
  • 215
Reza
  • 19
  • 1
  • 6
  • Do you want to write this in TCL - is that why you tagged it TCL? What have you tried so far? Once you have these rows what do you want to do with them? display them? do a calc? – Nick.Mc Aug 09 '16 at 05:33
  • Hi Nick Yes it is gonna be in tcl. – Reza Aug 09 '16 at 06:33

2 Answers2

1
package require csv
package require struct::matrix

::struct::matrix m
m add columns 2

set chan [open data.csv]
::csv::read2matrix $chan m
close $chan

lassign [m get row 0] header1 header2

for {set r 1} {$r < [m rows]} {incr r} {
    puts -nonewline [format {%s = %-30s  } $header1 [m get cell 0 $r]]
    puts [format {%s = %s} $header2 [m get cell 1 $r]]
}

m destroy

I find that the easiest way to deal with csv data sets is by using a matrix. A matrix is sort of a two-dimensional vector with built-ins for searching, sorting and rearranging columns and rows.

First, create a matrix and call it m. It will have two columns from the beginning, but no rows yet.

::struct::matrix m
m add columns 2

Open a channel to read the data file. Pass the channel and the matrix name to the ::csv::read2matrix command. This command will read the csv data and create a matrix row for each data row. The data fields are stored in the columns.

set chan [open data.csv]
::csv::read2matrix $chan m
close $chan

To get the header strings, retrieve row 0.

lassign [m get row 0] header1 header2

To iterate over the data rows, go from 1 (if we didn't have headers, 0) to just under m rows, which is the number of rows in the matrix.

There is a handy report facility that works well with matrices, but I'll just use a for loop here. I'm guessing how you want the data presented:

for {set r 1} {$r < [m rows]} {incr r} {
    puts -nonewline [format {%s = %-30s  } $header1 [m get cell 0 $r]]
    puts [format {%s = %s} $header2 [m get cell 1 $r]]
}

If you're done with the matrix, you might as well destroy it.

m destroy

Solution for the specific problem in the comments.

package require csv
package require struct::matrix

::struct::matrix m

set chan [open foo.csv]
::csv::read2matrix $chan m , auto
close $chan

set f1 [m search column 0 "Result ID"]
set headerRow [lindex $f1 0 1]
set f2 [m search rect 0 $headerRow 0 [expr {[m rows] - 1}] ""]
set f3 [m search row $headerRow "Horizontal-1 Acc. Filename"]
set f4 [m search row $headerRow "Horizontal-2 Acc. Filename"]

set top [expr {$headerRow + 1}]
set bottom [expr {[lindex $f2 0 1] - 1}]
set left [lindex $f3 0 0]
set right [lindex $f4 0 0]

puts [format {Vector=[ %s ]} [concat {*}[m get rect $left $top $right $bottom]]]
m destroy

Obviously, you need to change the filename to the correct name. There is no error handling: in such a simple script it's better to just have the script fail and correct whatever went wrong.


Solution to the second problem, comments below:

package require csv
package require struct::matrix

::struct::matrix m

set chan [open _SearchResults.csv]
::csv::read2matrix $chan m , auto
close $chan

set f1 [m search column 0 {Result ID}]
set headerRow [lindex $f1 0 1]

set f2 [m search -glob rect 0 $headerRow 0 [expr {[m rows] - 1}] { These*}]
set numofRow [lindex $f2 0 1]

set headercol1 [m search row $headerRow { Horizontal-1 Acc. Filename}]
set headercol2 [m search row $headerRow { Horizontal-2 Acc. Filename}]  

set indexheaderH1col [lindex $headercol1  0 0]
set indexheaderH2col [lindex $headercol2  0 0]

set rows [m get rect $indexheaderH1col [expr {$headerRow+1}] $indexheaderH2col [expr {$numofRow-1}]]

set rows [lmap row $rows {
    lassign $row a b
    list [string trim $a] [string trim $b]
}]

foreach row $rows {
    puts [format {%-30s   %s} {*}$row]
}

puts [format {Vector=[ %s ]} [concat {*}$rows]]

Comments:

  • You don't need to set the number of columns if you use read2matrix with auto
  • In this file, there is no empty cell after the table. Instead, we need to search for a string beginning with " These"
  • Since each cell holds a space character followed by the value, we need to trim off space around the value, otherwise the concatenation will go wrong. The part with the lmap command fixes that
  • Always brace your expressions

Documentation: + (operator), - (operator), < (operator), chan, close, concat, csv (package), expr, for, format, incr, lassign, lindex, lmap (for Tcl 8.5), lmap, open, package, puts, set, struct::matrix (package), {*} (syntax)

CL.
  • 173,858
  • 17
  • 217
  • 259
Peter Lewerin
  • 13,140
  • 1
  • 24
  • 27
  • Thanks peter ! This perfectly works for this example problem ! However, When I tried to apply it to my actual CSV file i got some problem . I was wondering if there is any way that I can share that data file with you so that u can comprehend what I am referring too ? Basically my data file contains a few different sets of matrix with various m*n and i am looking for specific data similar to what I gave as an example in my pervious post that you answered! Thanks for your help Peter ! – Reza Aug 09 '16 at 07:28
  • @Reza: can you post the file somewhere? I don't have any practical way to receive files (my son could probably set something up, but he is working). – Peter Lewerin Aug 09 '16 at 07:56
  • Please download it through this link (https://www.dropbox.com/s/li2ghbzrhd1ngxv/_SearchResults.csv?dl=0) Please let me know if you cant ? Basically I just wanna read the rows under "Horizontal-1 Acc. Filename and Horizontal-2 Acc. Filename" one at the time ! – Reza Aug 09 '16 at 08:07
  • @Reza: I have the file, but it's going to take a bit of time before I have a solution. – Peter Lewerin Aug 09 '16 at 08:27
  • No worries thanks very much ! If I can get the 2 columns under "Horizontal-1 Acc. Filename and Horizontal-2 Acc. Filename" and then covert the 2 columns to produce a vector as the final output it would be great. like Vector=[ RSN88_SFERN_FSD172.AT2 RSN88_SFERN_FSD262.AT2 RSN164_IMPVALL.H_H-CPE147.AT2 RSN164_IMPVALL.H_H-CPE237.AT2 RSN755_LOMAP_CYC195.AT2 RSN755_LOMAP_CYC285.AT2 RSN1083_NORTHR_GLE170.AT2 RSN1083_NORTHR_GLE260.AT2 ..................] – Reza Aug 09 '16 at 08:36
  • @Reza: added a tentative solution. – Peter Lewerin Aug 09 '16 at 11:06
  • Thanks very much Peter . When running I am getting this error which is related to the last line which is puts [format {Vector=[ %s ]} [concat {*}[m get rect $left $top $right $bottom]]] the error is : bad column index "", syntax error – – Reza Aug 09 '16 at 17:12
  • @Reza: it works fine for me: are you using exactly the same file as the one I downloaded from you? That error implies that it couldn't find the relevant column headers. – Peter Lewerin Aug 09 '16 at 18:06
  • @Reza: there was a difference in the files: I opened and saved the file in a spreadsheet, and the spreadsheet normalized the csv data. I downloaded your file again and 1) changed the line endings to Windows line endings, and 2) removed all blanks after commas (`s/, /,/g`). Another way to go about it would be to insert space chars at the beginning of each search pattern. – Peter Lewerin Aug 09 '16 at 19:23
  • Hi Peter I used the following script by using your advises and codes and it works now for the file I had shared with you. But now I am using it for another similar data file that you may find here (https://www.dropbox.com/s/sjdp7eza9u3v7y3/_SearchResults.csv?dl=0) but it gets me trouble . Could u please take a look while i am working on it as well ? I am adding the script I used below ! – Reza Aug 09 '16 at 19:28
  • @Reza: I've modified your script a bit, now it works for me at least. I'm not going to work on more files now, sorry. If you have other kinds of problems, please post them as new questions. But this procedure for extracting data has been demonstrated now, and you'll have to figure the rest out yourself. – Peter Lewerin Aug 09 '16 at 21:27
  • thanks peter ! I really appreciate your help ! Best regards ! RG. – Reza Aug 09 '16 at 21:53
0

wipe all

package require csv
package require struct::matrix

::struct::matrix m
m add columns 2

set chan [open _SearchResults.csv]
::csv::read2matrix $chan m  , auto
close $chan

set f1 [m search column 0 {Result ID}]
set headerRow [lindex $f1 0 1]

set f2 [m search rect 0 $headerRow 0 [expr {[m rows] - 1}] {}]
set  numofRow [lindex [lindex $f2 0 1]]

set headercol1 [m search row  $headerRow { Horizontal-1 Acc. Filename}]
set headercol2 [m search row  $headerRow  { Horizontal-2 Acc. Filename}]  

set indexheaderH1col [lindex $headercol1  0 0]
set indexheaderH2col [lindex $headercol2  0 0]

set header1 [m get cell $indexheaderH1col $headerRow]
set header2 [m get cell $indexheaderH2col $headerRow]

for {set r [expr $headerRow+1]} {$r < [expr $numofRow-1]} {incr r} {
    puts [format {%-30s   %s}  [m get cell $indexheaderH1col $r]  [m get cell $indexheaderH2col $r]]
}

set  vector   [concat {*}[m get rect $indexheaderH1col  [expr $headerRow+1] $indexheaderH2col [expr $numofRow-1]]]

puts [format {Vector=[ %s ]} [concat {*}[m get rect $indexheaderH1col  [expr $headerRow+1] $indexheaderH2col [expr $numofRow-1]]]]
Peter Lewerin
  • 13,140
  • 1
  • 24
  • 27
Reza
  • 19
  • 1
  • 6