Length comparison of one specific field in Linux

Question

I was trying to check the length of the second field of a TSV file (hundreds of thousands of lines), but my script runs very slowly. I suspect the problem is the use of "echo", but I am not sure how to fix it.

Input file:

prob    name
1.0     Claire
1.0     Mark
...     ...
0.9     GFGKHJGJGHKGDFUFULFD

I need to print the names that look wrong, such as the overly long one in the last line. I tested my script on a small sample using "head -100" and it worked, but it cannot cope with the original file.

This is what I ran:

for line in `cat filename | cut -f2`;do
length=`echo -n $line | wc -m`
if [ "$length" -gt 10 ];then
echo $line
fi
done

add a testable fragment of your `filename` and the final expected output — RomanPerekhrest, Mar 22 '18 at 10:37
Have a look [there: Length of string in bash](https://stackoverflow.com/a/31009961/1765658) — F. Hauri - Give Up GitHub, Mar 22 '18 at 12:42

Igor S.K. · Answer 1 · 2018-03-22T10:51:46.447

1

You could try this:

cat file.tsv | awk '{if (length($2) > 10) print $0;}'

This should be faster, since all the processing is done by a single awk process, whereas your solution starts two processes per loop iteration just to make that comparison.
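For comparison, the shell loop itself can also avoid the per-line processes by using the shell's own string-length expansion. This is only a minimal sketch, assuming the file is tab-separated with two columns; the function name long_names is made up for illustration and is not part of the question or of any tool:

```shell
# Hypothetical helper: print the lines of file "$1" whose second
# tab-separated field is longer than 10.
# Note: prob, name and tab are left as global variables in this sketch.
long_names() {
    tab=$(printf '\t')          # a literal tab, computed once per call
    while IFS=$tab read -r prob name; do
        # ${#name} expands to the length of $name inside the shell,
        # so nothing is started per line (no echo | wc pipeline)
        if [ "${#name}" -gt 10 ]; then
            printf '%s\t%s\n' "$prob" "$name"
        fi
    done < "$1"
}
```

Call it as `long_names filename`. In bash the length is counted in characters of the current locale; some other shells count bytes. On a file of this size the awk one-liner is still likely to be the faster option.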

edited Mar 22 '18 at 10:51

answered Mar 22 '18 at 10:44


It will be a lot faster. But you should remove the useless use of `cat`, and like I said to Shravan, everything except `length($2) > 10` is unnecessary. – Tom Fenech Mar 22 '18 at 11:03
@TomFenech Sure, just to be perfect. :) – Igor S.K. Mar 22 '18 at 11:07

Shravan Yadav · Answer 2 · 2018-03-22T10:53:46.783

1

You can use awk for this:

awk '{if(length($2) > 10){print}}' filename

Here $2 is the second field of each line in filename; awk evaluates the condition once per line. This will be faster than the shell loop.

edited Mar 22 '18 at 10:53

answered Mar 22 '18 at 10:45


Everything except `length($2) > 10` is unnecessary in that awk script. – Tom Fenech Mar 22 '18 at 11:04

oliv · Accepted Answer · 2018-03-22T11:39:51.823

1

awk to the rescue:

awk 'length($2)>10' file

This prints every line whose second field is longer than 10 characters.

Note that no action block {...} is required: when the condition is true, awk prints the line by default.
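One caveat: by default awk splits fields on any run of blanks, so a name containing a space would be cut short in $2. The following is a sketch under the assumption that the file is tab-separated and has a header row; the sample data, including the name with spaces, is made up for illustration:

```shell
# Made-up sample data modelled on the question's input
sample=$(mktemp)
printf 'prob\tname\n1.0\tClaire\n1.0\tMark\n0.9\tGFGKHJGJGHKGDFUFULFD\n0.8\tAnna Maria Smith\n' > "$sample"

# -F'\t' splits on tabs only, so "Anna Maria Smith" stays one field;
# NR > 1 skips the header row
result=$(awk -F'\t' 'NR > 1 && length($2) > 10' "$sample")
printf '%s\n' "$result"
# prints the last two sample rows: the 20-letter name and "Anna Maria Smith"

rm -f "$sample"
```

With the default separator, the last row would be missed, because $2 would be just "Anna".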

edited Mar 22 '18 at 11:39

answered Mar 22 '18 at 10:57


Your script is correct but a few words to explain would be useful. Otherwise, your answer may as well be "`awk` to the rescue, and ask again on Stack Overflow next time you want to do anything". – Tom Fenech Mar 22 '18 at 11:08
Thanks! I don't really know about awk, but it seems quite useful. I'll learn about it. – Luca Mar 22 '18 at 16:24
