1

I am trying to make a script which finds everything between a symbol {} in a text document. It takes the .txt documents specific part in the {} and organizes it alphabetically, then writing it inplace back to the text document. Example of text document..

bla bla bla 
bla ba bl bla ba bl {apple:banana, this: something else, airplane:hobby}
bla bla bla 
bla bla bla 

Desired output(sorted alphabetically)..

bla bla bla 
bla ba bl bla ba bl {airplane:hobby, apple:banana, this: something else}
bla bla bla 
bla bla bla 

What its still printing..

    bla bla bla 
    bla ba bl bla ba bl {apple:banana, this: something else, airplane:hobby}
    bla bla bla 
    bla bla bla 

My code..

def openFind():
    f = open(inFile, 'r')
    lines = f.read()
    match = re.findall(r'{(.*?)}', lines)
    before = str(match)
    n=1
    for i in xrange(0, len(match), n):
        mydict =  match[i:i+n]
        for x in sorted(mydict):
            c = x.split(',')
            newmatch = sorted(c)
            final =  str(newmatch)
            print final

            # NOT WORKING BELOW!!! Stuck in loop?
            with open(outFile,'w') as new_file:
                with open(inFile) as old_file:
                    for line in old_file:
                        new_file.write(line.replace(before, after))

It prints the sorted/alphabetical list as [airplane:hobby, apple:banana, this: something else], but how do I get it to replace the original text in the text document? Has to be inplace, but can make a new txt.

Anekdotin
  • 1,531
  • 4
  • 21
  • 43
  • 2
    Are you looking for something like [this](http://stackoverflow.com/questions/4719438/editing-specific-line-in-text-file-in-python) – idjaw Apr 11 '16 at 15:45
  • Ive tried the fileinput function and it drastically changes my text – Anekdotin Apr 11 '16 at 15:49

4 Answers4

2

This should work:

import re

def openFind():
    with open("test.txt", "r") as in_file:
        data = in_file.read()

    def sub(m):
        l = [s.strip() for s in m.group(1).split(",")]
        l.sort()
        return "{%s}" % (", ".join(l),)

    replacement = re.sub(r'{(.*?)}', sub, data)
    with open("out.txt", "w") as out_file:
        out_file.write(replacement)

I have used re.sub() in order to replace with the sorted match in-place.

Bharel
  • 23,672
  • 5
  • 40
  • 80
1

Following code will sort items between { & } and write the result to same file:

import re

with open('test.txt', 'r+') as f:
    s = f.read()
    r = list(s)
    for mo in re.finditer('{(.*?)}', s):
        d = sorted(mo.group(1).split(', '))
        r[mo.start(1):mo.end(1)] = list(', '.join(d))

    f.seek(0)
    f.write(''.join(r))
niemmi
  • 17,113
  • 7
  • 35
  • 42
1

I would approach this problem in pieces. First, you want to be able to read from one file and write to a new file. You could do this a multitude of ways. If your file is small you can just use readlines(), truncate your original file, and then write it back out.

But I'm going to assume the possibility of huge files (i.e. larger than will easily fit in RAM/swap space. Currently several GB in size).

import os
import tempfile

with tempfile.NamedTemporaryFile(delete=False) as temp:
    with open(filename) as infile:
        for line in infile:
            temp.write(line)
    os.unlink(infile)
    os.rename(temp.name, infile.name)

Now we're reading each line and writing it out to the destination. Now all you need to do is intercept the line and change it up if that's necessary:

 for line in infile:
     match = re.search('{{.*?}}')
     if match:
          # Assumes you only have one "dictionary" per line
          first_part, rest = line.split('{', maxsplit=1)
          # allows for trailing data
          data, last_part = rest.split('}', maxsplit=1)
          data = [_.split(':') for _ in data.split(',')]
          data.sort()
          line = '{}{{{}}}{}'.format(first_part, ', '.join(':'.join(_) for _ in data))
     temp.write(line)

You might have to tweak with the exact algorithm, but that's the approach that I would take when confronted with a problem like this.

Wayne Werner
  • 49,299
  • 29
  • 200
  • 290
1

The entire program can be written succinctly as follows,

with open("file.txt") as fr:
    content = fr.read()

matches = (match.group(1) for match in re.finditer(r"{(.*?)}", content))
for match in matches:
    repl = ", ".join(sorted(match.split(", ")))
    content = content.replace(match, repl)

with open("file.txt", "w") as f:
    fw.write(content)
C Panda
  • 3,297
  • 2
  • 11
  • 11