-3

I am completely new to Python and have one specific task I want to complete. I have a large dataset of .XY files (essentially .txt files), each of which has a 23-line header. I want to use Python (Python 3.7, through Visual Studio Code) to remove the header from every file, either by deleting it from the original files or by writing new files, while keeping the rest of each file in its original format. An example of the top of a file I wish to edit is shown below:

# Distance Sample to Detector: 0.3004918066158592 m
# PONI: 1.261e-01, 1.147e-01 m
# Rotations: 0.000061 0.000011 -0.000000 rad
# 
# == Fit2d calibration ==
# Distance Sample-beamCenter: 300.492 mm
# Center: x=1529.147, y=1680.772 pix
# Tilt: 0.004 deg  TiltPlanRot: 169.652 deg
# 
# Detector Detector  Spline= None    PixelSize= 7.500e-05, 7.500e-05 m
#    Detector has a mask: False 
#    Detector has a dark current: False 
#    detector has a flat field: False 
# 
# Wavelength: 4.1069000000000004e-11 m
# Mask applied: None
# Dark current applied: None
# Flat field applied: None
# Polarization factor: None
# Normalization factor: None
#
# 2th_deg    I
1.441032378E+00  -3.563451171E-01
1.447230367E+00  1.410741210E-01
1.453428356E+00  6.531007886E-01
1.459626345E+00  1.176007986E+00
1.465824333E+00  1.784591913E+00
matsmcfly
  • Check out: [How to delete first n lines from text file](https://stackoverflow.com/questions/2064184/remove-lines-from-a-textfile) – DarrylG Oct 14 '21 at 21:12

2 Answers

1

Open the file, read it in, and then only use the lines you need.

Using with

Using with opens the file and automatically closes the file object when the block finishes, even if an error occurs inside it.

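try:
    with open('filename.txt', 'r') as input_file:
        lines = input_file.readlines()
    input_you_need = lines[23:]  # everything after the 23-line header
    # do something with input_you_need
except OSError as err:
    # handle the error (file missing, unreadable, ...)
    print(f"Could not read file: {err}")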

Using open and close

Using open keeps the file open for the remainder of the script, or until you close it. ALWAYS CLOSE YOUR FILES.

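# Using readlines()
try:
    file1 = open('myfile.txt', 'r')
    try:
        lines = file1.readlines()
        lines_needed = lines[23:]  # drop the 23-line header
    except (OSError, UnicodeDecodeError) as err:
        # some sort of error handling
        print(f"Could not read file: {err}")
    finally:
        file1.close()  # always runs, even if reading fails
except OSError as err:
    # more error handling
    print(f"Could not open file: {err}")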

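# writing to file (this overwrites myfile.txt in place)
try:
    file1 = open('myfile.txt', 'w')
    try:
        file1.writelines(lines_needed)
    except OSError as err:
        # error handling
        print(f"Could not write file: {err}")
    finally:
        file1.close()  # always runs, even if writing fails
except OSError as err:
    # more error handling
    print(f"Could not open file for writing: {err}")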

As you can tell, the open and close version takes a lot more lines of code. This is why (for simpler scripts) it's usually preferable to use the with statement.
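To tie this back to the question, here is a minimal sketch that uses with for both reading and writing, so the original file is left untouched. The function name strip_header, its parameters, and any file names you pass in are hypothetical; the default of 23 header lines is taken from the question. It uses itertools.islice to skip lines while streaming, rather than loading the whole file with readlines().

```python
from itertools import islice


def strip_header(src_path, dst_path, n_header=23):
    """Copy src_path to dst_path, skipping the first n_header lines.

    Hypothetical helper: names and the default header length are assumptions.
    """
    with open(src_path, 'r') as src, open(dst_path, 'w') as dst:
        # islice(src, n_header, None) yields every line from index n_header on
        dst.writelines(islice(src, n_header, None))
```

Something like strip_header('scan.XY', 'scan_cut.XY') (both names made up) would then write a header-free copy next to the original.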

CVerica
  • The open and close version does not follow your own injunction to close the file when an exception occurs. You need to use `try`-`except` for that to happen. – Mad Physicist Oct 14 '21 at 21:27
  • You're not wrong. I was just trying to get the basics across. I'll make a correction. – CVerica Oct 15 '21 at 03:11
  • It's a common beginner mistake to say things like "EDIT" or "I've made an edit" to bring attention to previous work. Please don't do that. The previous version was wrong and you shouldn't bring attention to it. – Mad Physicist Oct 15 '21 at 10:35
  • Yes, I just noticed the edit summary entry box. I am still kinda new here. Thanks for your guidance. – CVerica Oct 15 '21 at 18:08
  • No problem. Thanks for keeping up with it. – Mad Physicist Oct 15 '21 at 18:52
-1

Your program will need to go through the following steps:

  1. Iterate through all the files within your dataset
  2. Read in each file's contents and
  3. Write only the required lines into a new file.

For the first step, the os module provides a handy walk() function. It takes a root path and yields one tuple for the root and for each subdirectory beneath it. The first element of each tuple is the path to that directory, the second is a list of the folders it contains, and the third is a list of the files it contains.
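As a small sketch of how those tuples can be unpacked, the function below collects the full path of every file under a root folder. The name list_files and the variable names are only illustrative.

```python
import os


def list_files(root):
    """Return the full path of every file under root, found via os.walk()."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # dirpath: the directory being visited
        # dirnames: names of its subfolders, filenames: names of its files
        for filename in filenames:
            found.append(os.path.join(dirpath, filename))
    return found
```

Calling list_files(".") would list everything below the current directory, including files in nested folders.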

While iterating over the directories, a nested loop over the file names in the third tuple element lets you visit every file in each directory. For each file, you can read its contents by opening it with Python's with statement. (The second parameter of the open function tells it that you want to read.)

with open("path/to/file", "r") as f:

    lines = f.readlines()

This allows you to read all lines as a list of strings into the variable lines.

Similarly, you can write to a file in much the same way, but this time you need to specify "w" as the second parameter, since you want write access to the new file.

with open("path/to/other_file", "w") as f:
    # writelines() does not add line endings, so include "\n" yourself
    f.writelines(["line1\n", "line2\n"])

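This code writes a given list of lines into the file. Assuming you have already read all lines from the existing file into lines, you can take the lines you need using list slicing: lines[22:] skips the first 22 lines and returns everything from index 22 (the 23rd line) onward. The sample in the question shows 22 header lines; if your files have 23, as the question states, use lines[23:] instead.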

You can therefore write the remaining lines to the new file with f.writelines(lines[22:]).
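To make the slicing concrete, here is a tiny example with a made-up two-line header (the data and the name n_header are invented for illustration):

```python
# Four lines as readlines() would return them, line endings included
lines = ["# header 1\n", "# header 2\n", "1.0  2.0\n", "3.0  4.0\n"]

n_header = 2                   # number of header lines to skip
data_lines = lines[n_header:]  # index 2 is the third line

print(data_lines)  # ['1.0  2.0\n', '3.0  4.0\n']
```

Because the slice starts at index n_header, exactly n_header lines are dropped.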

Something similar to this should work for you:

import os

# iterate over all files under the current directory
# change the path (".") to suit your needs
for path, folders, files in os.walk("."):
    for filename in files:
        name, extension = os.path.splitext(filename)

        # make sure only .XY files are affected, and skip
        # "_cut" files produced by an earlier run of this script
        if extension != ".XY" or name.endswith("_cut"):
            continue

        # read lines from the existing .XY file
        with open(os.path.join(path, filename), "r") as xy_file:
            lines = xy_file.readlines()

        # write all but the first 22 lines into a new file
        new_path = os.path.join(path, name + "_cut" + extension)
        with open(new_path, "w") as new_file:
            new_file.writelines(lines[22:])