
I want to split a 400k-line log file at a particular line number.

For this question, let's make that line number an arbitrary 300k.

Is there a Linux command that allows me to do this (within a script)?

I know `split` lets me split the file into equal parts either by size or by line count, but that's not what I want. I want the first 300k lines in one file and the last 100k in a second file.

Any help would be appreciated. Thanks!

On second thought, this would be more suited to the Super User or Server Fault site.

denormalizer
    I think this question is fine here. You have a programming task that you're trying to solve with a shell script; if it's a one-liner using widely available Unix tools, so much the better! – Jim Lewis Jun 18 '10 at 03:10
  • I thought the same. But then again I wasn't writing a shell script :) oh well, found my answer anyways. Thanks – denormalizer Jun 18 '10 at 04:25
  • 5
    This question is imho fine, without a doubt, it is a programming question and it is not too localized either – Peter Sep 17 '13 at 11:11
  • 9
    why is this an off topic question? the thought police is more crazy than ever. – Karel Bílek Nov 20 '13 at 01:02
  • I think I will edit the question to appease the Gods :) – denormalizer Nov 20 '13 at 22:56
  • 7
    Though this question might be a bit off-topic, it is highly voted and is the first result in search engines with such queries "linux split file at line". Thus, I'd suggest to reopen this question, so that other valuable answers can be added. Or at least make a link to the most relevant question on SU. – Antoine Pinsard May 30 '17 at 14:45

1 Answer

file_name=test.log

# number of lines to keep in the top file:
K=1000

# total line count (N):
N=$(wc -l < "$file_name")

# number of lines left for the bottom file:
L=$(( N - K ))

# create the top file:
head -n "$K" "$file_name" > "top_$file_name"

# create the bottom file:
tail -n "$L" "$file_name" > "bottom_$file_name"
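As a quick sanity check, the same steps can be run on a small sample file (the 10-line file and the value of `K` here are illustrative stand-ins for the 400k-line log):

```shell
# Demo of the head/tail approach on a 10-line sample file.
seq 1 10 > test.log         # stand-in for the real log

file_name=test.log
K=7                         # keep the first 7 lines in the top file
N=$(wc -l < "$file_name")   # total line count (10)
L=$(( N - K ))              # lines left for the bottom file (3)

head -n "$K" "$file_name" > "top_$file_name"
tail -n "$L" "$file_name" > "bottom_$file_name"

wc -l < "top_$file_name"     # 7
wc -l < "bottom_$file_name"  # 3
```

The bottom file starts at line `K + 1` of the original, so the two pieces concatenate back to the original file.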

Also, on second thought, split will work in your case, since the first split is larger than the second. Split puts the balance of the input into the last split, so

split -l 300000 file_name

will output xaa with 300k lines and xab with 100k lines, for an input with 400k lines.
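A scaled-down sketch of that behaviour (a 10-line file split after line 7; `xaa`/`xab` are `split`'s default output names):

```shell
# Demonstrate that split puts the remainder into the last output file.
seq 1 10 > file_name     # 10-line stand-in for the 400k-line log
split -l 7 file_name     # xaa gets 7 lines, xab gets the remaining 3
wc -l xaa xab            # shows 7 (xaa) and 3 (xab)
```

This only produces the desired two-way split when the first piece is at least half the file; otherwise `split` emits more than two output files.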

rubo77
academicRobot
  • Thanks. I found a similarly answered question over at Super User (i.e. use tail, etc.). And yes, split will work with my example, but it wouldn't have had my example been 100k. – denormalizer Jun 18 '10 at 04:24
  • 2
    If you're trying to do this on Windows and don't want to use Cygwin, this project provides all the needed utils as native win32 binaries - http://unxutils.sourceforge.net/ – Jonathon Hill Dec 30 '11 at 03:27
  • 17
    I would use `tail -n +L file_name > bottom_file` where simply `L=K+1` with no need to run `wc` first – Hashbrown Sep 14 '15 at 08:15
  • 7
    I would rather use `sed -n '1,1000p' test.log > top_test.log ; sed '1,1000d' test.log > bottom_test.log`. IHMO, this is more straightforward, and does not require to calculate the total number of lines. Also, it still works if lines were appended between the execution of each command. – Antoine Pinsard May 30 '17 at 14:39
  • For some files this answer leaves a line out, but the edit suggested by Hashbrown fixes the issue. – scharette Nov 24 '17 at 02:33
  • There's a nice write-up of different scripting approaches here: https://www.baeldung.com/linux/split-file-at-line-numbers , comparing timings of head-and-tail, sed, and awk. Head-and-tail is, surprisingly, much faster than the others – jmullee Mar 19 '21 at 13:46
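The `tail -n +M` and `sed` variants suggested in the comments above can be sketched together; both avoid the initial `wc -l` pass (the 10-line sample file and output names here are illustrative):

```shell
# Single-pass variants from the comments, demonstrated on a 10-line sample.
seq 1 10 > test.log
K=7

# tail -n +M prints from line M onward, so no prior line count is needed:
head -n "$K" test.log > top_test.log
tail -n +"$(( K + 1 ))" test.log > bottom_test.log

# sed equivalent: print lines 1..K, then delete lines 1..K:
sed -n "1,${K}p" test.log > top2.log
sed "1,${K}d"   test.log > bottom2.log
```

Both pairs of outputs are identical; the `sed` version reads the file twice but, as noted in the comments, tolerates lines being appended between the two commands.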