
I am attempting to use the fileinput module's inplace filtering feature to rewrite an input file in place.

I need to set the encoding (for both reading and writing) to latin-1, so I tried passing openhook=fileinput.hook_encoded('latin-1') to fileinput.input, but was thwarted by this error:

ValueError: FileInput cannot use an opening hook in inplace mode

Upon closer inspection, I see that the fileinput documentation clearly states this: "You cannot use inplace and openhook together."

How can I get around this?

iruvar

5 Answers


As far as I know, there is no way around this with the fileinput module. You can accomplish the same task with a combination of the codecs module, os.rename(), and os.remove():

import os
import codecs

input_name = 'some_file.txt'
tmp_name = 'tmp.txt'

with codecs.open(input_name, 'r', encoding='latin-1') as fi, \
     codecs.open(tmp_name, 'w', encoding='latin-1') as fo:

    for line in fi:
        new_line = do_processing(line) # do your line processing here
        fo.write(new_line)

os.remove(input_name) # remove original
os.rename(tmp_name, input_name) # rename temp to original name

You can also specify a different encoding when opening the output file if you want to change it, or keep latin-1 there if you don't want it to change.

I know this isn't the in-place modification you were looking for, but it will accomplish the task you were trying to do and is very flexible.

skrrgwasme
  • I had to implement something similar recently because I couldn't get fileinput to handle encodings properly. Just wanted to mention that your call to `os.remove()` is redundant and creates a race condition (the file ceases to exist for a brief period). If you just call `os.rename()`, it will atomically replace the original file with the new file, and thus any other processes that may attempt to read that file won't ever error out due to the file missing. – robru Feb 22 '15 at 01:01
  • @robru I know this is waaay late, but I just stumbled on this answer again. Your comment isn't accurate. [The docs for os.rename](https://docs.python.org/2/library/os.html#os.rename) state that if the destination name exists, OSError will be raised (at least on Windows). So the original file does need to be removed before the rename. And I think the race condition risk is minimal. You're already modifying the file anyway, so I think it's safe to assume that other processes consuming the file have been stopped or you're risking them getting corrupted data anyway from the in-place modifications. – skrrgwasme Oct 13 '17 at 14:35
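On the remove-versus-rename question raised in these comments: on Python 3.3 and later, os.replace() is documented to overwrite an existing destination on both POSIX and Windows, so the separate os.remove() step is not needed there. A minimal sketch, assuming Python 3; the function name rewrite_file, the '.tmp' suffix, and the transform callback are hypothetical and not part of the original answer:

```python
import os


def rewrite_file(path, transform, encoding="latin-1"):
    """Rewrite `path` line by line, then swap the result in with os.replace()."""
    tmp_path = path + ".tmp"  # hypothetical temp name, same directory as `path`
    # newline="" keeps the original line endings untouched on read and write.
    with open(path, "r", encoding=encoding, newline="") as src, \
         open(tmp_path, "w", encoding=encoding, newline="") as dst:
        for line in src:
            dst.write(transform(line))
    # Unlike os.rename() on Windows, os.replace() overwrites an existing file.
    os.replace(tmp_path, path)
```

Something like rewrite_file('some_file.txt', do_processing) would then stand in for the loop plus the remove/rename pair. Keeping the temporary file in the same directory matters, because a rename across filesystems can fail.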

If you don't mind installing a third-party package with pip, the in_place library supports an encoding argument.

import in_place

with in_place.InPlace(filename, encoding="utf-8") as fp:
  for line in fp:
    fp.write(line)
Jason

Starting with Python 3.10, fileinput.input() accepts an encoding parameter.

iruvar

This is very similar to the other answer, just done in function form so that it can be called multiple times with ease:

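import codecs
import os

def inplace(orig_path, encoding='latin-1'):
    """Modify a file in-place, with a consistent encoding."""
    new_path = orig_path + '.modified'
    with codecs.open(orig_path, encoding=encoding) as orig:
        with codecs.open(new_path, 'w', encoding=encoding) as new:
            for line in orig:
                # Hand each line and the output handle to the caller.
                yield line, new
    # Runs once the caller has consumed every line; on Windows, os.rename()
    # raises OSError if orig_path still exists.
    os.rename(new_path, orig_path)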

And this is what it looks like in action:

for line, new in inplace(path):
    line = do_processing(line)  # Use your imagination here.
    new.write(line)

This is valid in both Python 2 and Python 3, and Does The Right Thing with your data as long as you specify the correct encoding (in my case I actually needed utf-8 everywhere, but your needs will obviously vary).

robru
  • I like it. Note that the two `with` statements can be [combined](http://stackoverflow.com/a/3025119/753731) – iruvar Feb 22 '15 at 01:28

I'm not crazy about the existing solutions using rename/remove, because they oversimplify some of the file handling that the inplace flag does - for example handling the file mode, handling a chmod attribute, etc.
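For the file mode concern specifically, a rename-based approach can copy the permission bits onto the temporary file before swapping it in. A rough sketch, assuming Python 3; replace_preserving_mode is a hypothetical helper, and it carries over permission bits only, not ownership or extended attributes:

```python
import os
import shutil
import tempfile


def replace_preserving_mode(path, new_text, encoding="latin-1"):
    """Replace the contents of `path`, keeping its permission bits."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the original so the final rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w", encoding=encoding) as tmp:
            tmp.write(new_text)
        shutil.copymode(path, tmp_path)  # copy permission bits from original
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything failed before the swap.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```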

In my case, because I control the environment that my code is going to run in, I decided the only reasonable solution was to set my locale to one that uses UTF-8:

export LC_ALL=en_US.UTF-8

Here is the behavior before and after setting the locale:

sh-4.2> python3.6 -c "import fileinput;
for line in fileinput.FileInput('DESCRIPTION', inplace=True): print(line.rstrip() + 'hi')
print('done')"
Traceback (most recent call last):
  File "<string>", line 2, in <module>
  File "/usr/lib64/python3.6/fileinput.py", line 250, in __next__
    line = self._readline()
  File "/usr/lib64/python3.6/fileinput.py", line 364, in _readline
    return self._readline()
  File "/usr/lib64/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 227: ordinal not in range(128)

sh-4.2> export LC_ALL=en_US.UTF-8
sh-4.2> python3.6 -c "import fileinput;
for line in fileinput.FileInput('DESCRIPTION', inplace=True): print(line.rstrip() + 'hi')
print('done')"
done

sh-4.2# 

The potential side-effects are changes to other file input & output, but I'm not worried about that here.

Ken Williams
  • Interesting approach. I guess you could contain the impact of changing `LC_ALL` to just the `python` process in question by inlining it, witness `LC_ALL=en_US.latin-1 python -c 'import locale; print locale.getdefaultlocale()'` – iruvar May 09 '19 at 18:21