I have one folder with thousands of .txt files. I am using a Windows batch script to delete the headers (lines 1 to 82) from every .txt file in that folder. The script works well for relatively small files, but now I need to run it on big files, and it simply stops responding.

Can someone help me write in Python what this Windows batch script does? Thank you in advance.

@echo off
for %%f in (*.txt) do (
    more +82 "%%f" > "%TEMP%\%%f"
    move /y "%TEMP%\%%f" "%%f" > nul
)
echo Done.
Compo
  • https://stackoverflow.com/questions/55620008/looking-for-more-move-solutions-that-can-handle-files-with-more-than-65534-rows/55623108#55623108 `cut t x 82 < inputfile.txt > outputfile.txt` – Noodles Apr 17 '19 at 23:47
  • As above, changing `17` to `82`, and `csv` to `txt` and you've got effectively the same issue. You'll note that there are answers given which utilise built-in languages, which is better than relying on something which needs specifically installing and configuring. Please also note that this site doesn't provide a free language conversion service, we expect that you attempt that yourself. – Compo Apr 18 '19 at 00:31
  • You could also check [this](https://stackoverflow.com/q/2064184), for a python solution for a single file. As your chosen alternative language, I'm sure you should be able to expand it to loop through all `.txt` files in a directory. – Compo Apr 18 '19 at 00:55
  • Just so you know, MORE normally does not pause when you redirect output to a file. But for some reason, when redirected MORE reaches 64k lines, it pauses and asks for a key press to continue. That is why your original batch script seems to hang with large files. – dbenham Apr 18 '19 at 03:59

2 Answers

Probably overkill, but this might work:

import os
import shutil
import tempfile

file_path = r'C:\Users\...\...'

# Set the number of lines you'd like to exclude
header_end = 82


### Process every .txt file in the directory (untested for directory read!)
for i in os.listdir(file_path):
    if i.endswith('.txt'):
        target = os.path.join(file_path, i)

        ### Read this file's lines, keeping their newlines
        with open(target, 'r') as f:
            lines = f.readlines()

        ### Create a tempfile with '.txt' suffix; print(path) to find out file location (should be in temp folder)
        fd, path = tempfile.mkstemp(suffix='.txt')
        try:
            ### Write everything after the header, i.e. from index header_end onward
            with os.fdopen(fd, 'w') as tmp:
                tmp.writelines(lines[header_end:])
        except IOError:
            print('Error writing temp file.')
            ### Clean up the temp file and leave the original untouched
            if os.path.isfile(path):
                os.remove(path)
            continue

        ### Replace the original with the header-free copy
        shutil.move(path, target)
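
The code above reads each file fully into memory, which may be a problem for very large files. A streaming variant might look like the sketch below. It assumes Python 3 and opens the files in binary mode so line endings pass through untouched; `strip_header` and `strip_headers_in_dir` are hypothetical helper names, not part of any library.

```python
import os
import shutil
import tempfile
from itertools import islice


def strip_header(path, header_lines=82):
    """Drop the first `header_lines` lines of `path`, streaming the rest."""
    folder = os.path.dirname(os.path.abspath(path))
    # Temp file lives beside the original so the final replace stays on one drive.
    fd, tmp_path = tempfile.mkstemp(suffix='.tmp', dir=folder)
    try:
        with os.fdopen(fd, 'wb') as dst, open(path, 'rb') as src:
            # Read and discard the header lines one at a time.
            for _ in islice(src, header_lines):
                pass
            # Copy whatever is left in fixed-size chunks.
            shutil.copyfileobj(src, dst)
        # Overwrite the original with the header-free copy.
        os.replace(tmp_path, path)
    except Exception:
        # On any failure, remove the temp file and leave the original as it was.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def strip_headers_in_dir(folder, header_lines=82):
    """Apply strip_header to every .txt file directly inside `folder` (hypothetical helper)."""
    for name in os.listdir(folder):
        if name.lower().endswith('.txt'):
            strip_header(os.path.join(folder, name), header_lines)
```

Calling `strip_headers_in_dir(r'D:\txtfiles')` would then process every `.txt` file in that folder. A file with 82 lines or fewer ends up empty.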
Mark Moretto
  • Thank you so much for your code, but I'm getting this error: `D:\>eraseheaders.py Traceback (most recent call last): File "D:\eraseheaders.py", line 11, in <module> with open(file_path, 'r') as f: IOError: [Errno 13] Permission denied: 'D:\\txtfiles\\'` – AlbRodriguez Apr 18 '19 at 09:51
  • Okay, I added a little section to process only text files. For your directory, are you using `'D:\\txtfiles\\'`? If so, you can leave off the backslashes at the end or nest the regular path in a raw string: `'D:\\txtfiles'` or `r'D:\txtfiles'` – Mark Moretto Apr 18 '19 at 14:08

A PowerShell script that does not write to temp. Instead it moves the original to a .bak.txt file, then writes everything after the first 82 lines back under the original name.

foreach ($File in (Get-ChildItem *.txt)){
  $BakFile = $File.FullName -replace 'txt$','bak.txt'
  Move-Item $File $BakFile -Force
  Get-Content $BakFile | Select-Object -Skip 82 | Set-Content $File
}

To stay on topic, here is the same thing wrapped in a batch command/file:

powershell -NoP -C "foreach ($File in (Get-ChildItem *.txt)){$BakFile = $File.FullName -replace 'txt$','bak.txt';Move-Item $File $BakFile -Force;Get-Content $BakFile | Select-Object -Skip 82 | Set-Content $File}"
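
For the Python route the question asks about, the same backup-then-rewrite idea might be sketched as follows. This assumes Python 3; `strip_header_keep_backup` and `strip_headers_with_backup` are hypothetical helper names, and the `.bak.txt` naming simply mirrors the PowerShell script.

```python
from itertools import islice
from pathlib import Path


def strip_header_keep_backup(path, header_lines=82):
    """Rename `path` to a .bak.txt backup, then rewrite it without its header."""
    path = Path(path)
    # a.txt -> a.bak.txt, matching the -replace 'txt$','bak.txt' step.
    backup = path.with_name(path.stem + '.bak.txt')
    # Like Move-Item -Force: an existing backup is overwritten.
    path.replace(backup)
    with backup.open('rb') as src, path.open('wb') as dst:
        # Lazily skip the header lines and stream the remaining ones.
        dst.writelines(islice(src, header_lines, None))
    return backup


def strip_headers_with_backup(folder, header_lines=82):
    """Process every .txt file in `folder`, skipping backups from an earlier run."""
    # Snapshot the listing first, because new .bak.txt files appear as we go.
    for path in list(Path(folder).glob('*.txt')):
        if not path.name.endswith('.bak.txt'):
            strip_header_keep_backup(path, header_lines)
```

The originals stay available as `.bak.txt` files, so disk usage roughly doubles until they are deleted.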