Python: Copying lines that meet requirements

Question

So, basically, I need a program that opens a .dat file, checks each line to see if it meets certain prerequisites, and if it does, copies it into a new CSV file.

The prerequisites are that the line must 1) contain "$W" or "$S" and 2) end with a value that is one of a long list of acceptable terms. (I can simply make up a list of terms and hardcode them into a list.)

For example, if the DAT file were a list of purchase information and the last item was what was purchased, I would only want to include fruit. In this case, the last item is an ID tag, and I only want to accept a handful of ID tags; there is a list of about 5 acceptable tags. The tags vary a lot in length, but they are always the last item on the line (and always the 4th item).

Let me give a better example, again with the fruit.

My original .DAT might be:

DGH$G$H $2.53 London_Port Gyro

DGH.$WFFT$Q5632 $33.54 55n39 Barkdust

UYKJ$S.52UE $23.57 22#3 Apple

WSIAJSM_33$4.FJ4 $223.4 Ha25%ek Banana

Only the line "UYKJ$S.52UE $23.57 22#3 Apple" would be copied, because only it has both 1) $W or $S (in this case $S) and 2) a fruit as the last item. Once the .csv file is made, I will need to go back through it and replace all the spaces with commas, but that is not nearly as problematic for me as figuring out how to scan each line for the requirements and copy only the lines that are wanted.

I am making a few programs, all very similar to this one, that open .dat files, check each line to see if it meets the requirements, and then decide whether to copy it to the new file. But sadly, I have no idea what I am doing. They are all similar enough that once I figure out how to make one, the rest will be easy.

EDIT: The .DAT files are a few thousand lines long, if that matters at all.

EDIT2: Some of my current code snippets

Right now, my current version is this:

def main():
    #NewFile_Loc = C:\Users\J18509\Documents
    OldFile_Loc=raw_input("Input File for MCLG:")
    OldFile = open(OldFile_Loc,"r")
    OldText = OldFile.read()
#   for i in range(0, len(OldText)):
#       if (OldText[i] != " "):
#           print OldText[i]
    i = split_line(OldText)
    if u'$S' in i:
        # $S is in the line
        print i
main()

But it's very choppy still. I'm just learning python.

Brief update: the server I am working on is down, and might be for the next few hours, but I have my new code, which has syntax errors in it, but here it is anyways. I'll update again once I get it working. Thanks a bunch everyone!

import os
NewFilePath = "A:\test.txt"
Acceptable_Values = ('Apple','Banana')
#Main
def main():
    if os.path.isfile(NewFilePath):
        os.remove(NewFilePath)
    NewFile = open (NewFilePath, 'w')
    NewFile.write('Header 1,','Name Header,','Header 3,','Header 4)
    OldFile_Loc=raw_input("Input File for Program:")
    OldFile = open(OldFile_Loc,"r")
    for line in OldFile:
        LineParts = line.split()
        if (LineParts[0].find($W)) or (LineParts[0].find($S)):
            if LineParts[3] in Acceptable_Values:
                print(LineParts[1], ' is accepted')
                #This Line is acceptable!
                NewFile.write(LineParts[1],',',LineParts[0],',',LineParts[2],',',LineParts[3])
    OldFile.close()
    NewFile.close()
main()

It sounds like you want someone to write the program for you. Have you tried anything yet? — FastTurtle, Jul 23 '13 at 21:40
Could you post one of your programs so we can have a better look at your problem? — Ketouem, Jul 23 '13 at 21:41
Well, so far all I have figured out is how to open the file and look at each character one at a time. I made a counter that counts how many times $S or $W happen, but I'm not sure how to even copy the specific line I am on into the new file. I have managed to copy the whole text into a new file... So basically, I have a few commands down, but I haven't gotten very far into it yet. — CamelopardalisRex, Jul 23 '13 at 21:44
You don't have to use `.read()`, you can use `.readline()`, `.readlines()`, or my favorite: `for line in file:`. — 2rs2ts, Jul 23 '13 at 21:54
Okay, I'll give you a couple of pointers to help you get started and have a first go at this. First of all go through the file one line at a time e.g. 'for line in file_object:' then you can split the line into a list of four parts 'parts = line.split()' - default separator for split is spaces, you can then do checks on if given strings are in the line or 'if parts[3] in static_list' - checking the 4th item in the parts list against your list of values. Have a go with this info and post your attempt above highlighting any specifics you are stuck on. — ChrisProsser, Jul 23 '13 at 21:56
Alright, @ChrisProsser, I'll post back tomorrow once I've tried that and read a bit more into this. The way Python works is strange to me, but just from reading your post, I think I may have figured out the rest of my problems. If not, I'll be back. — CamelopardalisRex, Jul 23 '13 at 22:00

score 1 · Answer 1 · edited May 23 '17 at 12:21

There are two parts you need to implement. First, read the file line by line and write out the lines that meet specific criteria. This is done by

with open('file.dat') as f:
    for line in f:
        stripped = line.strip() # remove '\n' from the end of the line
        if test_line(stripped):
            print stripped # Write to stdout

The criteria you want to check for are implemented in the function test_line. To check for the occurrence of "$W" or "$S", you can simply use the in operator, like

if not '$W' in line and not '$S' in line:
    return False
else:
    return True

To check whether the last item in the line is contained in a fixed list, first split the line using split(), then take the last item using the index notation [-1] (negative indices count from the end of a sequence), and then use the in operator again against your fixed list. This looks like

items = line.split() # items is an array of strings
last_item = items[-1] # take the last element of the array
if last_item in ['Apple', 'Banana']:
    return True
else:
    return False

Now, you combine these two parts into the test_line function like

def test_line(line):
    if not '$W' in line and not '$S' in line:
        return False
    items = line.split() # items is an array of strings
    last_item = items[-1] # take the last element of the array
    if last_item in ['Apple', 'Banana']:
        return True
    else:
        return False

Note that the program writes the result to stdout, which you can easily redirect. If you want to write the output to a file, have a look at the question "Correct way to write line to file in Python".

score 1 · Answer 2 · answered Jul 23 '13 at 22:25

inlineRequirements = ['$W','$S']
endlineRequirements = ['Apple','Banana']

inputFile = open(input_filename,'rb')
outputFile = open(output_filename,'wb')
for line in inputFile.readlines():
    line = line.strip()
    #trailing and leading whitespace has been removed
    if any(req in line for req in inlineRequirements):
        #passed inline requirement
        lastWord = line.split(' ')[-1]
        if lastWord in endlineRequirements:
            #passed endline requirement
            outputFile.write(line.replace(' ',',') + '\n')
            #replaced spaces with commas and wrote to file;
            #strip() removed the newline, so it is added back here
inputFile.close()
outputFile.close()
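Replacing spaces with commas is fine as long as no field ever contains a comma itself. If that could happen, one alternative is to let the standard csv module do the quoting. A minimal sketch, assuming Python 3 (filter_to_csv and its parameter names are made up for illustration, and the requirement tuples are the same placeholder values as above):

```python
import csv

INLINE_REQUIREMENTS = ('$W', '$S')
ENDLINE_REQUIREMENTS = ('Apple', 'Banana')  # placeholder values


def filter_to_csv(input_lines, output_file):
    """Write each qualifying line of input_lines to output_file as a CSV row."""
    writer = csv.writer(output_file, lineterminator='\n')
    for line in input_lines:
        fields = line.split()  # split on any run of whitespace
        if not fields:
            continue  # skip blank lines
        if (any(req in line for req in INLINE_REQUIREMENTS)
                and fields[-1] in ENDLINE_REQUIREMENTS):
            writer.writerow(fields)  # quotes a field only if it needs it
```

Here input_lines can be an open file object and output_file a file opened for writing; in Python 3 the csv documentation recommends opening the output with newline=''.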

score 1 · Answer 3 · answered Jul 23 '13 at 22:26

tags = ['apple', 'banana']
match = ['$W', '$S']
OldFile_Loc=raw_input("Input File for MCLG:")
OldFile = open(OldFile_Loc,"r")
for line in OldFile.readlines(): # Loop through the file
    line = line.strip() # Remove the newline and whitespace
    if line and not line.isspace(): # If the line isn't empty
        lparts = line.split() # Split the line
        if any(tag.lower() == lparts[-1].lower() for tag in tags) and any(c in line for c in match):
            # $S or $W is in the line AND the last section is in tags(case insensitive)
            print line
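The any(...) comparison above lowercases every tag again for each line. With a longer tag list, one option is to lowercase the tags once into a set and test membership directly. A small sketch of that idea (line_matches, TAGS and MARKERS are names I picked, and the tags are placeholders):

```python
# Lowercase the accepted tags once, up front, for case-insensitive lookups.
TAGS = frozenset(tag.lower() for tag in ('Apple', 'Banana'))  # placeholder tags
MARKERS = ('$W', '$S')


def line_matches(line):
    """True if line contains a marker and its last field is an accepted tag."""
    parts = line.split()
    if not parts:
        return False  # blank or whitespace-only line
    return parts[-1].lower() in TAGS and any(m in line for m in MARKERS)
```

The loop body then reduces to a single check such as "if line_matches(line):" before printing or writing the line.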

score 0 · Answer 4 · answered Jul 23 '13 at 22:22

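import re
list_of_fruits = ["Apple", "Banana"]  # ... plus the other accepted values
with open('some.dat') as f:
    for line in f:
        parts = line.split()
        # raw string keeps the backslash for the regex; "parts and" skips blank lines
        if parts and re.search(r"\$[SW]", line) and parts[-1] in list_of_fruits:
            print "Found:%s" % line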

answered by Joran Beasley

score 0 · Accepted Answer · answered Jul 26 '13 at 21:40

import os
NewFilePath = r"A:\test.txt"  # raw string, so "\t" is not read as a tab
Acceptable_Values = ('Apple','Banana')
#Main
def main():
    if os.path.isfile(NewFilePath):
        os.remove(NewFilePath)
    NewFile = open(NewFilePath, 'w')
    # write() takes a single string and does not add a newline itself
    NewFile.write('Header 1,Name Header,Header 3,Header 4\n')
    OldFile_Loc = raw_input("Input File for Program:")
    OldFile = open(OldFile_Loc, "r")
    for line in OldFile:
        LineParts = line.split()
        if len(LineParts) < 4:
            continue  # skip blank or short lines
        # "in" tests for a substring; find() returns -1 (truthy) when absent
        if '$W' in LineParts[0] or '$S' in LineParts[0]:
            if LineParts[3] in Acceptable_Values:
                print('%s is accepted' % LineParts[1])
                #This Line is acceptable!
                NewFile.write(','.join([LineParts[1], LineParts[0], LineParts[2], LineParts[3]]) + '\n')
    OldFile.close()
    NewFile.close()
main()

This worked great, and has all the capabilities I needed. The other answers are good, but none of them do 100% of what I needed like this one does.
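One thing worth knowing for the $W/$S check: str.find returns an index rather than a boolean, -1 when the text is not found and 0 when it is found at the very start. Tested directly for truth, it therefore gives the opposite answer in both of those cases, which is why the in operator is the safer test. A small illustration (the two helper names are made up purely for the comparison):

```python
def has_marker_find(field, marker):
    """Truthiness test on str.find, shown for comparison only (misleading)."""
    return bool(field.find(marker))


def has_marker_in(field, marker):
    """Substring test using the in operator."""
    return marker in field


# Marker at index 0: find() returns 0, which is falsy.
print(has_marker_find('$S.52UE', '$S'), has_marker_in('$S.52UE', '$S'))
# Marker absent: find() returns -1, which is truthy.
print(has_marker_find('DGH', '$S'), has_marker_in('DGH', '$S'))
```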
