Is there a function in gnuplot that returns the number of columns in a CSV file? I can't find anything in the docs. Maybe someone can propose a custom-made function for this?
3 Answers
As of gnuplot 4.6, you can write a little hack script to do this. It is certainly not the most efficient approach, but it is pure gnuplot:
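#script col_counter.gp
col_count=1
good_data=1
# keep testing columns until one contains no valid numeric data
while (good_data){
   # valid(n) is 1 where column n holds a valid number, so STATS_max is
   # nonzero if at least one row has valid data in column col_count
   stats "$0" u (valid(col_count))
   if ( STATS_max ){
      col_count = col_count+1
   } else {
      col_count = col_count-1
      good_data = 0
   }
}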
Now in your main script,
call "col_counter.gp" "my_datafile_name"
print col_count #number of columns is stored in col_count.
This has some limitations. For example, it will choke if the datafile has a completely non-numeric column followed by more valid columns, but I think it should work for many typical use cases.
As a final note, you can add the script's directory to the environment variable GNUPLOT_LIB, and then you don't even need to have col_counter.gp in the current directory.

I borrowed your idea for my answer here: [accessing the nth datapoint in a datafile using gnuplot](http://stackoverflow.com/questions/13986596/accessing-the-nth-datapoint-in-a-datafile-using-gnuplot/17416567#17416567) – syockit Jul 02 '13 at 02:39
Assuming this is related to this question, and that the content of infile.csv is:
n,John Smith stats,Sam Williams stats,Joe Jackson stats
1,23.4,44.1,35.1
2,32.1,33.5,38.5
3,42.0,42.1,42.1
You could do it like this:
plot.gp
nc = "`awk -F, 'NR == 1 { print NF; exit }' infile.csv`"
set key autotitle columnhead
set datafile separator ','
plot for [i=2:nc] "< sed -r '1 s/,([^ ]+)[^,]+/,\\1/g' infile.csv" using 1:i with lines
Note that the \1 must be escaped as \\1 when used inside a double-quoted string in Gnuplot.
Output: (plot not reproduced here)
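To see what the awk and sed commands in the script above actually produce, they can be run on their own in a shell. The following is only a sketch: it recreates the sample infile.csv in a temporary directory (the `tmpdir`, `nc` and `header` shell variables are just names chosen here), and it uses `sed -E` rather than `sed -r`, on the assumption that your sed accepts `-E` for extended regular expressions. In the shell, inside single quotes, the backreference is written as a plain \1.

```shell
# Recreate the sample file from the answer in a temporary directory.
tmpdir=$(mktemp -d)
cat > "$tmpdir/infile.csv" <<'EOF'
n,John Smith stats,Sam Williams stats,Joe Jackson stats
1,23.4,44.1,35.1
2,32.1,33.5,38.5
3,42.0,42.1,42.1
EOF

# Number of comma-separated fields on the first line only.
nc=$(awk -F, 'NR == 1 { print NF; exit }' "$tmpdir/infile.csv")
echo "columns: $nc"        # columns: 4

# On line 1 only, keep the first word of each header after a comma,
# so that the plot keys become John, Sam and Joe.
header=$(sed -E '1 s/,([^ ]+)[^,]+/,\1/g' "$tmpdir/infile.csv" | head -n 1)
echo "header: $header"     # header: n,John,Sam,Joe
```

The awk call exits after the first line, so it assumes that the header row has the same number of fields as the data rows.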
Here is an update and an alternative, extended retro-workaround (gnuplot-only, of course):
Update: (gnuplot>=5.0.0, Jan 2015)
Since gnuplot 5.0.0, there is the variable STATS_columns, which tells you the number of columns in the first uncommented row.
stats FILE u 0 nooutput
print STATS_columns
Extended retro-workaround: (gnuplot>=4.6.0, March 2012)
Some time ago, I learnt that a correct CSV file should have the same number of columns (i.e. commas) in all rows. So it should be sufficient to "count" the commas in the first uncommented row. That's apparently what gnuplot>=5.0.0 is doing more or less.
However, in case you have an "incorrect CSV" with a varying number of columns and you are interested in the minimum and maximum number of columns, you can use the following script, assuming that no (double-quoted) string contains a comma. Note that row indices are 0-based.
Data: SO13373206.dat
11, 12, 13, 14, 15, 16, 17
21, 22, 23, 24, 25, 26, 27, 28
31, 32, 33, 34, 35, 36, 37, 38, 39
41, 42, 43, 44, 45, 46
Script:
### count number of columns (gnuplot>=4.6.0)
reset
FILE = "SO13373206.dat"
countCommas(s) = sum[i=1:strlen(s)] ( s[i:i] eq ',' ? 1 : 0)
set datafile separator "\t" # in order to read a row as one string
stats FILE u (colCount=countCommas(strcol(1))+1,0) every ::0::0 nooutput
print sprintf("number of columns in first row: %d", colCount)
colMin = colMax = rMin = rMax = NaN
stats FILE u (c=countCommas(strcol(1))+1, \
              c<colMin || colMin!=colMin ? (colMin=c,rMin=$0) : 0, \
              c>colMax || colMax!=colMax ? (colMax=c,rMax=$0) : 0 ) nooutput
print sprintf("minimum %d columns in row %d",colMin, rMin)
print sprintf("maximum %d columns in row %d",colMax, rMax)
set datafile separator "," # restore separator
# ... plot something
### end of script
Result:
number of columns in first row: 7
minimum 6 columns in row 3
maximum 9 columns in row 2
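If you want to cross-check these numbers outside gnuplot, a rough awk equivalent could look like the sketch below. It is not part of the gnuplot-only approach above, and it makes its own assumptions: it counts every line of the file, including any comment or blank lines that gnuplot would skip, and like the script above it assumes that no quoted string contains a comma. The `tmpdir` and `result` shell variables are just names chosen here.

```shell
# Recreate the sample data file in a temporary directory.
tmpdir=$(mktemp -d)
cat > "$tmpdir/SO13373206.dat" <<'EOF'
11, 12, 13, 14, 15, 16, 17
21, 22, 23, 24, 25, 26, 27, 28
31, 32, 33, 34, 35, 36, 37, 38, 39
41, 42, 43, 44, 45, 46
EOF

# Track the smallest and largest field count (NF) and the 0-based row
# where each one was seen; NR is 1-based, hence NR - 1.
result=$(awk -F, '
    NR == 1 || NF < min { min = NF; rmin = NR - 1 }
    NR == 1 || NF > max { max = NF; rmax = NR - 1 }
    END {
        printf "minimum %d columns in row %d\n", min, rmin
        printf "maximum %d columns in row %d\n", max, rmax
    }' "$tmpdir/SO13373206.dat")
echo "$result"
```

For the sample data this should report the same minimum and maximum as the gnuplot script.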
