I have a csv file with headers:
a,b,c,d,e,f,g,h
I would like to do something
cat abc.csv | sed "something to split them" | grep "e"
#position of "e"
Can someone guide me how do I get the column idx of which header 'e' is at?
I have a csv file with headers:
a,b,c,d,e,f,g,h
I would like to do something
cat abc.csv | sed "something to split them" | grep "e"
#position of "e"
Can someone guide me how do I get the column idx of which header 'e' is at?
Assuming your goal is to say "which column is this value in", you have a number of options, but this works:
sed -n $'1s/,/\\\n/gp' abc.csv | grep -nx 'e'
#output: 5:e
If you want to get just the number out of that:
sed -n $'1s/,/\\\n/gp' abc.csv | grep -nx 'e' | cut -d: -f1
#output: 5
Explanation:
Since the headers are on the first line of the file, we use the -n
option to tell sed
not to print out all the lines by default. We then give it an expression that starts with 1
, meaning it is only executed on the first line, and ends with p
, meaning that line gets printed out afterward.
The expression uses ANSI quotes ($'
...'
) simply so it's easier to read: you can put a newline in it with \n
instead of having to include a literal newline. Regardless, by the time the shell is done with it, the expression $'1s/,/\\\n/gp'
gets passed to sed as 1s/,/\
/gp
, which tells it to replace every comma on the first line with a newline and then print out the result. The output of just the sed on your example would be this:
a
b
c
d
e
f
g
h
(If your CSV file has many lines, you may want to add ;q
to the end of the sed
command so that it quits after that first line instead of continuing to read and do nothing with the rest of the lines.)
We then pipe that output through a grep
command looking for e
. We pass the -x
option so that it only matches lines consisting of exactly 'e', not just any line containing an 'e' (Thanks @Marcel and @Sundeep), plus the -n
option that tells it to include the line number of matching lines in its output. In the example, it outputs 5:e
, where the 5:
says that the rest of the output is from the 5th line of the input.
We can then pipe that through cut
with a field delimiter (-d
) of :
to extract just the first field (-f1
), which is the line number in the sed output - which is the field number in the original file.
head -n1 abc.csv | tr "," "\n" | grep -nx e | cut -d":" -f1
File content:
a,b,c,d,e,f,g,h
String that you want:
e
Output:
5
This is a bit of hack, but it will give you the index of e
:
head -n1 abc.csv | grep -oE '^.*(,|^)e(,|$)' | tr -Cd , | wc -c
It works by extracting the part of the top row up to the e
, then it removes all characters except for the commas, and finally it counts the number of commas.
$ cat ip.txt
a,b,c,d,e,f,g,h
1,2,3,4,5,6,7,8
With perl
$ # can also use: perl -F, -lane 'print grep {$F[$_-1] eq "e"} 1..$#F+1; exit'
$ perl -F, -lane 'foreach (0..$#F){ print $_+1 if $F[$_] eq "e" } exit' ip.txt
5
With awk
$ awk -F, '{ for(i=1; i<=NF; i++) if($i=="e"){print i} exit}' ip.txt
5
In both cases:
,
and comparison made against each element to find matching indexexit
is used to quit immediately after reading first lineFollowing command will loop through all the fields and check for the string "e". If Found then print the index of that location.
For all the lines:
awk -F, '{for(i=1;i<=NF;i++) if($i=="e") print i}' input.csv
IF it's limited to header/ first row then :
awk -F, 'NR==1{for(i=1;i<=NF;i++) if($i=="e") {print i;exit}}' input.csv
Example:
echo "a,b,c,d,e,f,g,h" |awk -F, '{for(i=1;i<=NF;i++) if($i=="e") {print i}'
5
$ awk '{print (index($0,"e")+1)/2}' file
5