I'm trying to split a large text file into smaller text files, using a word as the delimiter. I searched, but I've only seen posts about breaking files apart after x lines. I'm fairly new to programming, but I've made a start. I want to go through all the lines, and when a line starts with hello, put that line and all the lines after it into one file until the next line that starts with hello. The first word in the file is hello. Ultimately, I'm trying to get the text into R, but I think it would be easier if I split it up like this first. Any help is appreciated, thanks.

text_file = open("myfile.txt","r")
lines = text_file.readlines()
print len(lines)
for line in lines :
    print line
    if line[0:5] == "hello":
Saul Alarcon

2 Answers

If you are looking for very simple logic, try this.

text_file = open("myfile.txt", "r")
lines = text_file.readlines()
print(len(lines))
target = open("filename.txt", "a")  # "a" appends, "w" overwrites
hello1Found = False

for line in lines:
    if hello1Found:
        if line[0:5] == "hello":
            # Second hello found: stop looping and saving to the file.
            # (break is not great practice, but it suffices for this simple requirement.)
            break
        else:
            print(line)  # echo the line, then write it to the new file
            target.write(line)
    elif line[0:5] == "hello":  # first occurrence of hello
        hello1Found = True
        print(line)
        # Write this line, and the lines that follow, to the new file
        # until the second hello.
        target.write(line)
pro- learner
  • Also close the opened files at the end of the program: target.close(); text_file.close() – pro- learner Mar 10 '15 at 17:05
  • This worked! Thanks! I did make a few small changes, though, most likely because I wasn't clear enough in my question (sorry). I got rid of the break and replaced it with: target = open("filename" + str(k) + ".txt". Here k is an integer that increases by 1 after each loop. This way it splits into more than 2 files (I ended up with 6009 of them) – Saul Alarcon Mar 14 '15 at 01:14
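
For readers who want the multi-file version described in the last comment, here is a minimal sketch of one way it could look. It is not the asker's actual code: the function name split_on_prefix, the out_dir parameter, and the "w" open mode are assumptions made for illustration, and it is written for Python 3. Only the "filename" + str(k) + ".txt" naming comes from the comment.

```python
import os


def split_on_prefix(src_path, out_dir, prefix="hello"):
    """Write each block that begins with a `prefix` line to its own numbered file.

    Hypothetical helper: the name and parameters are illustrative only.
    Returns the list of output paths in the order they were created.
    """
    paths = []
    target = None
    k = 0  # counter used in the output file names
    with open(src_path, "r") as text_file:
        for line in text_file:
            if line.startswith(prefix):
                # A new block starts here: close the previous file, open the next.
                if target is not None:
                    target.close()
                k += 1
                path = os.path.join(out_dir, "filename" + str(k) + ".txt")
                target = open(path, "w")  # "w" is an assumption; it overwrites
                paths.append(path)
            if target is not None:  # lines before the first prefix are skipped
                target.write(line)
    if target is not None:
        target.close()
    return paths
```

Calling split_on_prefix("myfile.txt", ".") should write filename1.txt, filename2.txt, and so on into the current directory, one per hello block. Iterating over the file object instead of calling readlines() avoids loading the whole input into memory, which may matter for a large file.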

I am new to Python. I just finished a Python for Geographic Information Systems class at Northeastern University. This is what I came up with.

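def files():
    # Yield a new numbered output file each time next() is called.
    n = 0
    while True:
        n += 1
        yield open('/output/dir/%d.txt' % n, 'w')

pattern = 'hello'
fs = files()
outfile = next(fs)
filename = r'C:\output\dir\filename.txt'

with open(filename) as infile:
    for line in infile:
        if pattern not in line:
            outfile.write(line)
        else:
            # Text before the first delimiter belongs to the current file.
            items = line.split(pattern)
            outfile.write(items[0])
            # Each remaining piece starts a new file. The delimiter itself
            # is not written; use pattern + item to keep it.
            for item in items[1:]:
                outfile.close()
                outfile = next(fs)
                outfile.write(item)

# infile is closed by the with block; only the last output file is still open.
outfile.close()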
MJ Wood