247

I'm looking for a simple way to find the length of the longest line in a file. Ideally, it would be a simple bash shell command instead of a script.

Andrew Prock

14 Answers

323

Using wc (GNU coreutils) 7.4:

wc -L filename

gives:

101 filename
Daniel
  • 61
    Note that only the `-c -l -m -w` options are POSIX. `-L` is a GNUism. – Jens Aug 30 '11 at 07:24
  • 5
    Note also that the result of `-L` depends on the locale. Some characters (both in the byte and in the multibyte sense) may even not be counted at all! – Walter Tross Jul 18 '14 at 09:13
  • 2
    Note that for Windows users, the wc.exe that comes with Cygwin supports -L. – yoyo May 11 '15 at 18:09
  • 12
    OS X: `wc: illegal option -- L usage: wc [-clmw] [file ...]` – Hugo Feb 11 '16 at 20:32
  • 15
    OS X: using Homebrew, use `gwc` for GNU Word Count: `gwc -L filename` – kaycoder Jul 12 '16 at 16:39
  • 4
    @xaxxon `gwc` is in the `coreutils` formula, which install all of the GNU coreutils with a `g` prefix. – gsnedders Feb 13 '17 at 15:10
  • Just applied this to read the max-length of a single stream line in a big PDF (=binary file). Results in 2330 instantly while the "pure posix script" (one-liner) leads to 2226 with the expected waiting... So be aware that you may have bigger differences in binary files. – Simon Sobisch Jun 09 '17 at 06:33
  • 1
    OS X: Rather than install `gwc`, `awk`, mentioned in other answers, works just fine. `awk '{print length}' filename | sort -rn | head -1`. If you need the actual line's content too, then `awk '{print length,$0}' filename | sort -k1 -rn| head -1` – kakoma Nov 30 '17 at 03:59
  • omg, I was expecting an awkward awk command, this is really neat – Nakor Aug 28 '21 at 01:07
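Because `-L` is a GNU extension, one way to combine the comments above is a small wrapper that probes for it and falls back to `awk`. This is only a sketch under assumptions: the function name `longest_line_length` is hypothetical and not from this thread, and the two branches can disagree on lines that contain tabs or multibyte characters.

```shell
#!/bin/sh
# Hypothetical helper: print the length of the longest line read from stdin.
longest_line_length() {
    # Probe whether this wc accepts -L (GNU coreutils does; some others do not).
    if wc -L </dev/null >/dev/null 2>&1; then
        wc -L
    else
        # Fallback: track the maximum in awk; "+ 0" prints 0 for empty input.
        awk '{ if (length($0) > max) max = length($0) } END { print max + 0 }'
    fi
}
```

It can be used as `longest_line_length < filename` or at the end of a pipeline.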
135
awk '{print length, $0}' Input_file |sort -nr|head -1

For reference: Finding the longest line in a file

RavinderSingh13
Ravindra S
  • 13
    Why the extra cat command? Just give the file name directly as an argument to awk. – Thomas Padron-McCarthy Oct 31 '09 at 21:40
  • 21
    @Thomas. Expressing it as a pipe is more general than specifying a file as an option. In my case, I'll be using output piped from a database query. – Andrew Prock Oct 31 '09 at 23:31
  • 2
    this one is the best answer because it is more POSIX (well, works on OS X) – MK. Dec 08 '14 at 17:23
  • 5
    @MK. However, this approach is O(n*log(n)) in the number of lines, whereas Ramon's approach is O(n). – jub0bs Sep 04 '15 at 19:05
  • 3
    Sorting a large file can take hours to complete and consume gigabytes, even terabytes of temp space depending on input file size. Consider storing the longest length and its associated record, then printing it from an `END{}` block. – Luv2code Feb 20 '19 at 23:57
77
awk '{ if (length($0) > max) {max = length($0); maxline = $0} } END { print maxline }'  YOURFILE 
Ramon
  • 3
    ```awk '{ if (length($0) > max) max = length($0) } END { print max }' YOURFILE``` – ke20 Sep 02 '13 at 13:53
  • 7
    `awk 'length>max{max=length}END{print max}' file` – Chris Seymour Dec 25 '13 at 20:38
  • 10
    This answer gives the *text* of the longest line in the file rather than its length. I'm leaving it as-is even though the question asks for the length because I suspect it will be useful for people who come to this page just looking at the title. – Ramon Jan 03 '14 at 13:51
  • 3
    Easy to get the count using WC.. `awk '{ if (length($0) > max) {max = length($0); maxline = $0} } END { print maxline }' YOURFILE | wc -c` – Nick Apr 15 '14 at 23:10
  • 1
    Would you please explain how this works? – Lnux Mar 09 '17 at 08:45
  • @Lnux `awk` is a language that supports looping. @Ramon's answer compares the length of each line to the previous, stores the highest value in `max` and the line contents in `maxline` – SaxDaddy Jul 27 '17 at 19:45
  • 2
    @Nick Better yet, `... END { print length(maxline) + 1}` The `wc -c` will still come up one byte short of the longest record because awk strips the line feed off. – Luv2code Feb 21 '19 at 00:04
  • My variation on @ramon's fine answer, I had a file with a 1GB line (ouch), I just needed the line number to find it: `awk '{ if (length($0) > maxlen) {maxnr = NR; maxlen = length($0)} } END { print maxnr, maxlen }'` – chrisinmtown Oct 16 '22 at 10:46
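For readers asking how the single-pass approach works, here is an annotated sketch that merges the variations in the comments above. The function name `longest_line` and the output format (line number, length, text) are assumptions made for illustration; on ties, the first longest line is reported.

```shell
#!/bin/sh
# Hypothetical helper: print "line_number length text" for the longest line.
# Reads the files given as arguments, or stdin when there are none.
longest_line() {
    awk '
        # Runs once per input line; $0 is the line without its newline.
        length($0) > max {
            max = length($0)   # longest length seen so far
            maxnr = NR         # its line number
            maxline = $0       # its contents
        }
        # Runs once after the last line has been read.
        END { if (NR > 0) print maxnr, max, maxline }
    ' "$@"
}
```

Memory use stays constant apart from the one stored line, and no sort is needed.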
25

Just for fun and educational purpose, the pure POSIX shell solution, without useless use of cat and no forking to external commands. Takes filename as first argument:

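#!/bin/sh

MAX=0
# IFS= preserves leading and trailing whitespace; the || test also
# counts a final line that has no trailing newline.
while IFS= read -r line || [ -n "$line" ]; do
  if [ "${#line}" -gt "$MAX" ]; then MAX=${#line}; fi
done < "$1"
printf '%s\n' "$MAX"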
Jens
  • 6
    not being able to read from std in (via cat) actually reduces the utility of this, not enhances it. – Andrew Prock Aug 30 '11 at 03:21
  • 4
    Well, the OP explicitly said "file" and without the `< "$1"` it can easily read from stdin. With a test for `$#` it could even do both, depending on the number of args. There just is no need for useless cats in this world. Newbies should be taught accordingly right from the beginning. – Jens Aug 30 '11 at 07:18
  • 7
    This should be rated higher; it's what the user asked for. Add `function longest () { MAX=0 IFS=; while read -r line; do if [ ${#line} -gt $MAX ]; then MAX=${#line}; fi; done; echo $MAX; }` to your .bashrc and you can run `longest < /usr/share/dict/words` – skierpage Dec 12 '12 at 01:10
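The comments above suggest testing `$#` so that the same code handles both a file argument and standard input. A minimal sketch of that idea follows; the function name `longest` is borrowed from the comment above, while the argument handling is an assumption about what was meant.

```shell
#!/bin/sh
# Hypothetical helper: longest line length of file "$1", or of stdin if no argument.
longest() {
    if [ "$#" -gt 0 ]; then
        longest < "$1"      # re-run with the file as stdin
        return
    fi
    max=0
    # IFS= keeps leading/trailing blanks; -r keeps backslashes literal.
    # The "|| [ -n ... ]" part also counts a last line with no trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        if [ "${#line}" -gt "$max" ]; then max=${#line}; fi
    done
    printf '%s\n' "$max"
}
```

Both `longest filename` and `some_command | longest` should then work without an extra `cat`.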
14
wc -L < filename

gives

101
Cairnarvon
Anonymous
12
perl -ne 'print length()."  line $.  $_"' myfile | sort -nr | head -n 1

Prints the length, line number, and contents of the longest line

perl -ne 'print length()."  line $.  $_"' myfile | sort -n

Prints a sorted list of all lines, with line numbers and lengths

. is the concatenation operator - it is used here after length()
$. is the current line number
$_ is the current line

Chris Koknat
  • Requires sorting a file .. performance would be terrible even for moderately sized files and will not work for larger files. `wc -L` is best solution I saw so far. – Tagar Mar 11 '17 at 21:09
  • Using a 550MB 6,000,000 line text file as the source (British National Corpus), the perl solution took 12 seconds, while `wc -L` took 3 seconds – Chris Koknat Sep 26 '17 at 17:58
  • `wc -L` just count number records - this Q was about to find *longest* line - not quite the same, so this isn't accurate comparison. – Tagar Sep 26 '17 at 19:44
9

It looks like none of the answers give the line number of the longest line. The following command gives the line number and an approximate length (the length includes the line-number prefix added by `cat -n`):

$ cat -n test.txt | awk '{print "longest_line_number: " $1 " length_with_line_number: " length}' | sort -k4 -nr | head -3
longest_line_number: 3 length_with_line_number: 13
longest_line_number: 4 length_with_line_number: 12
longest_line_number: 2 length_with_line_number: 11
Paul Rooney
wangf
  • There we go. That finds my obnoxiously long comments. Thanks dude. – Philip Nov 09 '15 at 18:04
  • 2
    You could take this a step further and eliminate cat. `awk '{print length}' test.txt | sort -rn | head -1`. If you need the actual line's content too, then `awk '{print length,$0}' test.txt | sort -k1 -rn| head -1` – kakoma Nov 30 '17 at 04:00
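If exact lengths are wanted together with line numbers, awk's built-in `NR` can replace `cat -n`, so the reported length no longer includes the number prefix. The sketch below assumes a hypothetical function name `top_longest` and a "line_number length" output format; neither comes from the answer above.

```shell
#!/bin/sh
# Hypothetical helper: list the N longest lines as "line_number length".
# Usage: top_longest N [file...]; reads stdin when no file is given.
top_longest() {
    n=$1
    shift
    # NR is the line number in awk, so cat -n is not needed and the
    # reported length does not include a line-number prefix.
    awk '{ print NR, length($0) }' "$@" | sort -k2,2nr | head -n "$n"
}
```

For example, `top_longest 3 test.txt` lists the three longest lines. This still sorts every line, so it shares the O(n log n) cost mentioned in other comments.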
6

Important overlooked point in the above examples.

The following 2 examples count expanded tabs

  wc -L  <"${SourceFile}" 
# or
  expand --tabs=8 "${SourceFile}" | awk '{ if (length($0) > max) {max = length($0)} } END { print max }'

The following 2 count non-expanded tabs.

  expand --tabs=1 "${SourceFile}" | wc -L 
# or
  awk '{ if (length($0) > max) {max = length($0)} } END { print max }' "${SourceFile}"

so

              Expanded    nonexpanded
$'nn\tnn'       10            5
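The difference can be reproduced with two small functions. This is a sketch: the names `max_len_raw` and `max_len_expanded` are hypothetical, and `expand -t 8` is used as the POSIX spelling of `--tabs=8`.

```shell
#!/bin/sh
# Hypothetical helpers reading stdin; both print the longest line length.
max_len_raw() {
    # A tab counts as one character.
    awk '{ if (length($0) > max) max = length($0) } END { print max + 0 }'
}
max_len_expanded() {
    # Tabs are first expanded to spaces, with tab stops every 8 columns.
    expand -t 8 | max_len_raw
}
```

Feeding both the line `nn<TAB>nn` (for example with `printf 'nn\tnn\n'`) should reproduce the 5 and 10 from the table.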
John Kearney
3

Here is a reference for this answer:

cat filename | awk '{print length, $0}'|sort -nr|head -1

http://wtanaka.com/node/7719

Nadir SOUALEM
3

In perl:

perl -ne 'print ($l = $_) if (length > length($l));' filename | tail -1

This prints only the line, not its length.

rsp
3

Variation on the theme.

This one will show all lines having the length of the longest line found in the file, retaining the order they appear in the source.

FILE=myfile
grep "$(tr -c "\n" "." < "$FILE" | sort | tail -1)" "$FILE"

So myfile

x
mn
xyz
123
abc

will give

xyz
123
abc
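A possible alternative that avoids building a regular expression out of dots is to read the file twice with `awk`. This is a sketch rather than part of the answer above: the function name `longest_lines` is hypothetical, and it needs a real file (not a pipe) because the input is read two times.

```shell
#!/bin/sh
# Hypothetical helper: print every line of file "$1" that has the maximum length.
longest_lines() {
    # The file is read twice: the first pass (NR == FNR) finds the maximum,
    # the second pass prints the lines whose length equals it.
    awk 'NR == FNR { if (length($0) > max) max = length($0); next }
         length($0) == max' "$1" "$1"
}
```

Run as `longest_lines myfile`; the original line order is kept.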
martin clayton
3

Just for fun, here's the Powershell version:

cat filename.txt | sort length | select -last 1

And to just get the length:

(cat filename.txt | sort length | select -last 1).Length
Eddie Groves
  • 4
    So even the powershell programmers must use useless cats? – Jens Aug 30 '11 at 07:19
  • 1
    @Jens Not sure I understand you, cat in Powershell is just an alias for Get-Content, whose behaviour depends on the context and provider. – Eddie Groves Sep 22 '11 at 06:49
  • Can `sort` take filename.txt as argument? Then the cat is useless because `sort length filename.txt | select -last 1` avoids a pipe and a process that just copies data around. – Jens Sep 22 '11 at 07:58
  • As a sidenote what exactly is powershell? I thought the powershell utility was used for windows machines? – franklin Mar 16 '12 at 18:32
  • 4
    @Jens, data frequently comes from a stream instead of a filename. This is a standard unix tools idiom. – Andrew Prock Jun 08 '12 at 19:35
3

I'm in a Unix environment, and work with gzipped files that are a few GBs in size. I tested the following commands using a 2 GB gzipped file with record length of 2052.

  1. zcat <gzipped file> | wc -L

and

  2. zcat <gzipped file> | awk '{print length}' | sort -u

The average times were:

  1. 117 seconds

  2. 109 seconds

Here is the script I used; I ran it about 10 times.

START=$(date +%s) ## time of start

zcat "$1" | wc -L

END=$(date +%s) ## time of end
DIFF=$(( $END - $START ))
echo "It took $DIFF seconds"

START=$(date +%s) ## time of start

zcat "$1" | awk '{print length}' | sort -u

END=$(date +%s) ## time of end
DIFF=$(( $END - $START ))
echo "It took $DIFF seconds"
Jon
  • I am not sure this is a valid comparison, I would be worried that the `awk` version benefits from disk block caching of the `wc` version that is running first (and seeds the disk cache). You would have to randomize the order of who gets called first over the ten runs to make this argument stick. – Canonical Chris Feb 06 '18 at 18:36
1

If you are using macOS and get the error `wc: illegal option -- L`, you don't need to install GNU coreutils; simply do the following.

If all you want is the number of characters in the longest line of the file and you are using OS X, run:

awk '{print length}' "$file_name" | sort -rn | head -1

Something like this:

echo "The longest line in the file $file_name has $(awk '{print length}' "$file_name" | sort -rn | head -1) characters"

Outputs:

The longest line in the file my_file has 117 characters

Ivansito87