0

I have the following input file structure, with text on each line:

line1
line2
line3
line3
line4
line5
line6

When two consecutive lines are exactly the same (here, line3), I want to keep the second one and change the content of the first to "SECTION MISSING". I cannot get the marker into the right place. The closest I have come is the code below, but the output I get is:

line1
line2
line3
SECTION MISSING
line4
etc.

While I want:

line1
line2
SECTION MISSING
line3 
line4

Code:

def uniq(iterator):
    previous = float("NaN")  # Not equal to anything
    section=("SECTION : MISSING\n")
    for value in iterator:
        if previous == value:
            yield section
        else:
            yield value
            previous = value
    return;

with open('infile.txt','r') as file:
    with open('outfile.txt','w') as f:
        for line in uniq(file):
            f.write(line)
Zong
user1562471
  • You can apply a sliding window iterator to solve this problem http://stackoverflow.com/questions/6822725/rolling-or-sliding-window-iterator-in-python – M4rtini May 13 '14 at 14:20
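The sliding-window idea from the comment above could be sketched roughly as follows. The helper names `sliding_window` and `mark_first_duplicate` are made up for illustration and are not taken from the linked question, and the sketch assumes the input is small enough to hold in memory.

```python
from collections import deque
from itertools import islice


def sliding_window(iterable, size=2):
    """Yield successive overlapping tuples of length `size`."""
    iterator = iter(iterable)
    window = deque(islice(iterator, size), maxlen=size)
    if len(window) == size:
        yield tuple(window)
    for item in iterator:
        window.append(item)  # the oldest element drops out automatically
        yield tuple(window)


def mark_first_duplicate(lines, marker="SECTION : MISSING\n"):
    """Replace the first line of each adjacent equal pair with `marker`."""
    lines = list(lines)  # assumption: the whole input fits in memory
    for current, following in sliding_window(lines, 2):
        yield marker if current == following else current
    if lines:
        yield lines[-1]  # the last line has no successor, so emit it as-is


sample = ["line1\n", "line2\n", "line3\n", "line3\n", "line4\n"]
marked = list(mark_first_duplicate(sample))
```

Note that with three or more identical consecutive lines, this sketch replaces every line except the last of the run.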

4 Answers

5

I think you want to yield previous, rather than value:

def uniq(iterator):
    previous = None
    section = ("SECTION : MISSING\n")
    for value in iterator:
        if previous == value:
            yield section
        elif previous is not None:
            yield previous
        previous = value
    if previous is not None:
        yield previous

Example usage:

>>> list(uniq([1, 2, 2, 3, 4, 5, 6, 6]))
[1, 'SECTION : MISSING\n', 2, 3, 4, 5, 'SECTION : MISSING\n', 6]
jonrsharpe
2

Something like:

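prev = None
with open('infile.txt','r') as fi:
    with open('outfile.txt','w') as fo:
        for line in fi:
            # Write the previous line, or the marker if the current line repeats it
            if prev is not None:
                fo.write(prev if prev != line else "SECTION : MISSING\n")
            prev = line
        # Flush the final line (skipped when the input file is empty)
        if prev is not None:
            fo.write(prev)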

Will give you the output file you're looking for:

line1
line2
SECTION : MISSING
line3
line4
line5
line6
jedwards
  • You should really check for `None` by identity - `if prev is not None:` (see [PEP-008](http://legacy.python.org/dev/peps/pep-0008/#programming-recommendations)) – jonrsharpe May 13 '14 at 14:46
  • @jonrsharpe, I thought about that too, but then thought that the EOL would resolve this. But your comment made me reconsider (e.g. the last line in the file). Updating answer. – jedwards May 13 '14 at 14:48
0

As a personal preference for tasks like this, I use two cursors instead of one:

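from itertools import tee

with open(infile) as r, open(outfile, 'w') as w:
    p, c = tee(r)
    last = next(c, None)  # advance the lookahead cursor by one line
    for prev, cur in zip(p, c):  # on Python 2, use itertools.izip
        # Replace the first line of an identical pair, keep the second
        w.write(prev if prev != cur else 'SECTION : MISSING\n')
        last = cur
    if last is not None:
        w.write(last)  # the final line has no successor to compare against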
roippi
0

In case you ever have to handle the situation with three consecutive lines (well, two or more) where you only want to replace the first one, you could use groupby:

from itertools import groupby, islice, chain

def detect_missing(source):
    grouped = groupby(source)
    section = "SECTION: MISSING\n"
    for _, group in grouped:
        first_two = list(islice(group, 2))
        if len(first_two) > 1:
            first_two[0] = section
        yield from chain(first_two, group)

(This is Python 3, but you could replace the `yield from` with an explicit loop if you needed Python 2.)
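
As a quick check of the three-consecutive-lines case, here is a small usage sketch. It repeats the `detect_missing` function from above so that it runs on its own; the sample `lines` list is invented for illustration.

```python
from itertools import groupby, islice, chain


def detect_missing(source):
    section = "SECTION: MISSING\n"
    for _, group in groupby(source):
        # Pull at most two items from the run of identical lines
        first_two = list(islice(group, 2))
        if len(first_two) > 1:
            first_two[0] = section  # only the first line of the run is replaced
        # Emit the (possibly modified) first two, then the rest of the run
        yield from chain(first_two, group)


# Hypothetical input with "line3" appearing three times in a row
lines = ["line1\n", "line2\n", "line3\n", "line3\n", "line3\n", "line4\n"]
result = list(detect_missing(lines))
```

Only the first of the three identical lines is replaced; the remaining two pass through unchanged.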

DSM