
Preferably as a one-liner, how can I delete a range of lines from the beginning of a large (3 MB+) text file in a timely fashion (a few seconds at most)? I've seen solutions using for /f together with findstr, but the for loop makes them extremely slow, and the more tool cannot handle larger files without hanging.

@echo off &setlocal
set "testing.txt=%~1"
(for /f "delims=" %%i in ('findstr /n "^" "testing.txt"') do (
    set "line=%%i"
    for /f "delims=:" %%a in ("%%i") do set "row=%%a"
    setlocal enabledelayedexpansion
    set "line=!line:*:=!"
    if !row! gtr 100 echo(!line!
    endlocal
))>output.txt

Here is an attempt. It is incredibly slow. Any recommendations would be appreciated.

aschipfl
user1995565
  • Without fully knowing what you mean by a range of lines, _especially as I would expect `findstr` to be more suited to string matching_, it would be impossible to advise you at this time. You're effectively asking for the quickest and shortest code to do something very specific, but withholding the specific information. – Compo Jul 19 '19 at 01:09
  • As well as fully explaining the task, as this is a programming help site, we expect to see the code that you've tried and what you tried it against, in order to attempt to improve upon it. – Compo Jul 19 '19 at 01:12
  • @Compo. Review the edits and provide advice, please. – user1995565 Jul 19 '19 at 01:52
  • If I read your code correctly, you are looking for something like `more +100` (if `more` were able to handle such large files)? – Stephan Jul 19 '19 at 06:43
  • Exactly, Stephan. The problem with this code, though, is that it takes a very long time on large files. I suppose it's a step in the right direction, since `more` would just hang. Do you have any thoughts on improving this? – user1995565 Jul 19 '19 at 11:08
  • What about the `skip=100` option of `for /F`? Do you need empty lines to be preserved? Nevertheless, in your code I'd remove the inner `for /F` loop and replace it with `set /A "row=line"` (to convert everything up to the first `:` to a number, limited to 2^31 - 1); this should speed things up a bit... – aschipfl Jul 19 '19 at 11:43

2 Answers


As far as I understand, you want to skip the first 100 lines of your huge text file and return the rest.

Looking at your code, the first thing I notice is that you have two nested for /F loops, which might slow things down.

The inner loop just splits off a preceding line number that is separated by a colon from the rest.

For this purpose you could (mis-)use set /A, which is capable of converting a string into a numeric value, when you use its implicit variable expansion (hence no % or !); this process stops when the first non-numeric character is encountered, which is the : in our situation. So just replace the inner for /F loop with:

set /A "row=line"

This should speed things up a bit. However, note that this limits the input text file to 2^31 - 1 lines. Note also that the number of characters/bytes per line is still limited to about 8190.

By the way, you do not have to do set "line=!line:*:=!" as a separate step; just remove it and replace echo(!line! with echo(!line:*:=!.
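
Putting both suggestions together, the script from the question might look like the sketch below. It assumes the file names testing.txt and output.txt from the question and a fixed threshold of 100 lines. It has not been timed against a large file, so treat any speed-up as expected rather than measured.

```batch
@echo off
setlocal
(for /f "delims=" %%i in ('findstr /n "^" "testing.txt"') do (
    rem Store the numbered line while delayed expansion is still off, so "!" survives
    set "line=%%i"
    setlocal enabledelayedexpansion
    rem set /A reads the variable "line" itself and stops parsing at the first ":"
    set /A "row=line"
    rem Strip everything up to the first ":" only when the line is actually printed
    if !row! gtr 100 echo(!line:*:=!
    endlocal
))>output.txt
```

The loop still runs once per line, so this variant stays noticeably slower than approaches that avoid a per-line loop body.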


If you do not need to preserve empty lines, the whole approach is as simple as this:

@echo off
for /F usebackq^ skip^=100^ delims^=^ eol^= %%i in ("testing.txt") do echo(%%i

This does not limit the file size, but the line lengths must not exceed 8191 characters/bytes.
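
To write the result to a file instead of the console, the loop can be wrapped in parentheses and redirected once, as in this sketch. The output name output.txt is taken from the question. The carets escape the spaces and equals signs so that the option string can stay unquoted, which is what allows both delims and eol to be set to nothing.

```batch
@echo off
rem Redirect the whole loop once instead of reopening the file for every line
(for /F usebackq^ skip^=100^ delims^=^ eol^= %%i in ("testing.txt") do echo(%%i) > "output.txt"
```

Because for /F skips empty lines, any blank lines after line 100 are missing from output.txt.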

aschipfl
  • The OP's code uses the `findstr /n "^"` command _just to insert a line number in each line_ and, after that, uses the inner `for /f` command to split off the line number as you indicated. However, this is not necessary. A simple `set /A row+=1` would give the same result... – Aacini Jul 19 '19 at 13:22
  • @Aacini, true, but I guess the OP also wants to preserve blank lines, so they'll need `findstr /N` anyway... – aschipfl Jul 19 '19 at 14:06
  • I actually did not need to preserve blank lines. Thank you both very much for contributing your answers! – user1995565 Jul 19 '19 at 22:09

This is the fastest way to eliminate the first lines of a large file. It is written as a one-liner, as you requested:

@echo off
< testing.txt ( (for /L %%i in (1,1,100) do set /P "=") & findstr "^" ) > output.txt

However, be aware that this method can only handle lines up to 1023 characters long, because it uses the set /P command to read and discard the unwanted lines...

For a description of this method, see this answer.
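
For readers who prefer to see the steps spelled out, the one-liner can be unrolled roughly as follows. This is only a sketch: the variable name skip is a hypothetical addition for illustration, and the file names are those used in the question.

```batch
@echo off
setlocal
rem Hypothetical variable: number of leading lines to drop
set "skip=100"
< "testing.txt" (
    rem Each set /P consumes one line from the redirected input and discards it
    for /L %%i in (1,1,%skip%) do set /P "="
    rem findstr inherits the same input handle and copies the remaining lines
    findstr "^"
) > "output.txt"
```

The input file is opened only once for the whole block, so findstr continues reading where the last set /P stopped.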

Aacini
  • Very nice, but this method requires the very last line to be terminated with a line-break; otherwise, `findstr` will hang indefinitely... – aschipfl Jul 20 '19 at 10:13
  • @aschipfl: You are right! I think that a very simple `>>testing.txt echo/` line before would solve this point, if such a point was a problem... – Aacini Jul 20 '19 at 15:47
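
The workaround suggested in the last comment could be combined with the one-liner as sketched below. Note the assumptions: this modifies testing.txt in place, and if the file already ends with a line break it gains an extra empty line, which then also appears at the end of output.txt.

```batch
@echo off
rem Append a line break so the final line is terminated (changes testing.txt)
>>testing.txt echo/
< testing.txt ( (for /L %%i in (1,1,100) do set /P "=") & findstr "^" ) > output.txt
```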