
I have a list of tuples, each one containing a word to be replaced together with its line and column number in a given text file (e.g. [('word1', 1, 1), ('word2', 1, 9), ... ]). I want to go through the text file and replace the word at each of those positions with a character.

In other words, given a specific word and its line and column numbers inside a text file, I am trying to find that word and replace it with a character. For example:

given that the text file contains the following (with the line breaks exactly as displayed here):

Excited him now natural saw passage offices you minuter. At by stack being court hopes. Farther
so friends am to detract. Forbade concern do private be. Offending residence but men engrossed
shy. Pretend am stack earnest arrived company so on. Felicity informed yet had to is admitted
strictly how stack you.

and given that the word to replace is `stack` at line 3, column 16, and that it should be replaced with the character `*`, then after the replacement the text file would contain:

Excited him now natural saw passage offices you minuter. At by stack being court hopes. Farther
so friends am to detract. Forbade concern do private be. Offending residence but men engrossed
shy. Pretend am * earnest arrived company so on. Felicity informed yet had to is admitted
strictly how stack you.

I have considered linecache, but it seems very inefficient for large text files. Also, since I already have the line and column numbers, I was hoping there is a way to go directly to that position and perform the replacement.

Does anyone know a way to do this in Python?

EDIT

The initial solution proposed, which used numpy's genfromtxt, is most likely not suitable, following the discussion in the follow-up question: every line of the text file must be kept and none may be skipped (e.g. empty lines, strings beginning with 'w' and strings inside '/*.. /').

  • Take a look at this [answer](http://stackoverflow.com/questions/2081836/reading-specific-lines-only-python/2081880#2081880). It can help you with reading specific lines, but you'll have to traverse the whole file. – JRajan Apr 27 '16 at 16:48

2 Answers


Try a recipe like this:

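import numpy as np

def changethis(lines, pos):
    # pos = (word, line, column): line is 1-based; column is used here as the
    # 0-based index of the word's first character within that line
    word, line, col = pos
    text = lines[line-1]
    # Keep everything before and after the word, with '*' in between
    lines[line-1] = text[:col] + '*' + text[col+len(word):]

pos = ('stack', 3, 16)
with open('in.txt', 'r') as f:
    lines = np.array(f.readlines())  # BEFORE EDIT: np.genfromtxt('in.txt', dtype='str', delimiter=os.linesep)
changethis(lines, pos)
print(lines)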

The result is this:

[ 'Excited him now natural saw passage offices you minuter. At by stack being court hopes. Farther'
 'so friends am to detract. Forbade concern do private be. Offending residence but men engrossed'
 'shy. Pretend am * earnest arrived company so on. Felicity informed yet had to is admitted'
 'strictly how stack you.']

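Note that putting a bunch of long strings into a numpy array and modifying them is a bit of a hack, but it should still work well when called in a longer loop over many position tuples. One caveat: a numpy string array has a fixed item width (the length of the longest line read), so a replacement that made a line longer would be silently truncated. Replacing a word with a single `*` only ever shortens the line, so it is safe here.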
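For comparison, here is a minimal sketch of the same slicing idea using a plain Python list instead of a numpy array, which sidesteps the fixed-width issue entirely. The helper name `replace_at` is hypothetical, and, as in the code above, the column is taken as the 0-based index of the word's first character. It also checks that the expected word really is at that position before changing anything.

```python
def replace_at(lines, word, line, col, repl='*'):
    """Replace `word` on 1-based `line`, starting at 0-based index `col`.

    `lines` is a list of strings (e.g. from f.readlines()) and is modified
    in place. Raises ValueError if `word` is not found at that position.
    """
    text = lines[line - 1]
    # Refuse to edit if the word is not where we expect it
    if text[col:col + len(word)] != word:
        raise ValueError(f"{word!r} not found at line {line}, column {col}")
    lines[line - 1] = text[:col] + repl + text[col + len(word):]


lines = [
    "Excited him now natural saw passage offices you minuter. At by stack being court hopes. Farther\n",
    "so friends am to detract. Forbade concern do private be. Offending residence but men engrossed\n",
    "shy. Pretend am stack earnest arrived company so on. Felicity informed yet had to is admitted\n",
    "strictly how stack you.\n",
]
replace_at(lines, 'stack', 3, 16)
print(lines[2], end='')
# shy. Pretend am * earnest arrived company so on. Felicity informed yet had to is admitted
```

Unlike an element of a numpy `str` array, a list element can be replaced by a string of any length.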

EDIT: As @user2357112 made me realize, my choice of file reader was not the most appropriate (although it worked for the example in the question), so I've edited this answer to provide the same solution I gave in the follow-up question.

armatita
  • Struggling to understand what this does; could you please explain the `changethis` method? –  Apr 27 '16 at 17:36
    @hask.duk That function reads the position you give and breaks the string in two: everything before the word and everything after it (that is what the slicing with the positions does). Then it builds a new string by joining those two parts with a `*` in the middle. Finally, the element in the numpy array is replaced with the new string. – armatita Apr 27 '16 at 19:46
  • If you are interested, please take a look in this [follow-up question](http://stackoverflow.com/questions/36924519/python-numpy-ndarray-skipping-lines-from-text) –  Apr 28 '16 at 20:16
  • @hask.duk Sorry for the use of genfromtxt. I just tried to give a solution based on the example you provided in the question. In any case, I've noticed the other question got a lot of attention. I'll follow up with a (somewhat numpy-based) version of this solution in case you are interested. – armatita Apr 29 '16 at 08:40
  • Thanks for looking at this again. I should have been more explicit in my description to begin with. –  Apr 29 '16 at 09:10

Consider a single line:

word1 a word2 a word3 a word4

If you have these changes:

[('word1', 1, 1), ('word2', 1, 9), ... ]

and you process them in order, then after the first replacement the line becomes:

* a word2 a word3 a word4

the second change will fail: replacing 'word1' with the shorter string '*' shifts every word after it four columns to the left, so 'word2' is no longer at column 9.
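A small sketch that makes the problem concrete: applying the changes left to right breaks the second one, while applying the same changes right to left keeps every original column valid. `apply_in_order` is a hypothetical helper; columns are 1-based, as in the question's tuples, and refer to the original text.

```python
def apply_in_order(text, changes, repl='*'):
    """Apply (word, line, column) changes to a single line, in the given order.

    Columns are 1-based and refer to the ORIGINAL text. A change whose word
    is not found at its column is skipped and reported.
    """
    for word, _line, col in changes:
        start = col - 1  # convert to a 0-based index
        if text[start:start + len(word)] == word:
            text = text[:start] + repl + text[start + len(word):]
        else:
            print(f"OOPS! {word!r} not found at column {col}")
    return text


line = "word1 a word2 a word3 a word4"
changes = [('word1', 1, 1), ('word2', 1, 9)]

# Left to right: prints an OOPS for 'word2', then "* a word2 a word3 a word4"
print(apply_in_order(line, changes))
# Right to left: prints "* a * a word3 a word4"
print(apply_in_order(line, list(reversed(changes))))
```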

Instead, sort the list of changes by line number, and within each line by column in descending order, so that each line is edited from right to left:

changes = sorted(changes, key=lambda t: (t[1], -t[2]))
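To see what the key does: the tuple `(line, -column)` sorts by line number ascending, and negating the (integer) column makes the changes within a line come out rightmost first. The tuples below are made-up examples.

```python
# Hypothetical changes, deliberately out of order
changes = [('word4', 2, 5), ('word1', 1, 1), ('word5', 2, 1),
           ('word2', 1, 9), ('word3', 1, 17)]

# Sort by line ascending, then by column descending (note the minus sign)
changes = sorted(changes, key=lambda t: (t[1], -t[2]))
print(changes)
# [('word3', 1, 17), ('word2', 1, 9), ('word1', 1, 1), ('word4', 2, 5), ('word5', 2, 1)]
```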

You can then apply the changes while iterating through the file only once, as in the link referenced by @JRajan:

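import sys

# `changes` must already be sorted as above; lines and columns are 1-based.
# The edited text is printed to stdout (each line keeps its own '\n').
with open("file", "r") as fp:
    fpline_text = enumerate(fp)
    fpline, text = next(fpline_text)

    for edit in changes:
        word, line, offset = edit
        line -= 1    # 0-based, to match enumerate()

        # Echo unchanged lines until we reach the line of this edit
        while fpline < line:
            print(text, end='')
            fpline, text = next(fpline_text)

        offset -= 1    # 0-based
        cand = text[offset:offset+len(word)]

        if cand != word:
            # Report problems on stderr so they don't mix with the file text
            print("OOPS! Word '{}' not found at ({}, {})".format(*edit), file=sys.stderr)
        else:
            text = text[:offset] + '*' + text[offset+len(word):]

    # Rest of file
    try:
        while True:
            print(text, end='')
            fpline, text = next(fpline_text)
    except StopIteration:
        pass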
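The loop above prints the edited text to standard output. Since the question asks for the file itself to change, here is a minimal sketch (not part of the original answer) that streams the file once, applies the sorted changes, writes the result to a temporary file in the same directory, and then swaps it in with `os.replace`. The function name `replace_words` is hypothetical; lines and columns are 1-based, as in the answer above.

```python
import os
import shutil
import tempfile


def replace_words(path, changes, repl='*'):
    """Replace each (word, line, column) in the file at `path` with `repl`.

    Lines and columns are 1-based and refer to the original file contents.
    Changes pointing past the end of the file are ignored. Returns the list
    of changes whose word was not found at its position.
    """
    # Group changes by line; within a line apply them right to left so
    # earlier replacements do not shift the columns of later ones.
    by_line = {}
    for word, line, col in sorted(changes, key=lambda t: (t[1], -t[2])):
        by_line.setdefault(line, []).append((word, col))

    missed = []
    dir_name = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name, text=True)
    try:
        with open(path, 'r') as src, os.fdopen(fd, 'w') as dst:
            for lineno, text in enumerate(src, start=1):
                for word, col in by_line.get(lineno, ()):
                    start = col - 1
                    if text[start:start + len(word)] == word:
                        text = text[:start] + repl + text[start + len(word):]
                    else:
                        missed.append((word, lineno, col))
                dst.write(text)
        # mkstemp creates the file with restrictive permissions; copy the original ones
        shutil.copymode(path, tmp_path)
        # Same directory, hence same filesystem: atomic on POSIX
        os.replace(tmp_path, path)
    except BaseException:
        os.remove(tmp_path)
        raise
    return missed
```

For example, `replace_words('in.txt', [('stack', 3, 17)])` would replace the `stack` at the 0-based index 16 of line 3. Note that replacing a word with a shorter string changes the position of every following byte, so the rest of the file has to be rewritten anyway; a single pass like this is about as direct as it gets.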
aghast
  • Having trouble implementing the sorting part: `changes = [('word1', 1, 1), ('word2', 1, 9), ('word2', 1, 12)]` and `changes = sorted(changes, key=lambda t: return (t[0], t[1], -t[2]))` gives me `return outside of function error`. Am I doing something wrong? –  Apr 30 '16 at 09:56
  • Suggested an edit to fix a code error and a modification to match the intended description. – Yannis Apr 30 '16 at 10:37
  • I modified the example. The lambda should just contain an expression, not a return: lambda t: (t[1],-t[2]) – aghast Apr 30 '16 at 17:01