
Basically, I want to copy the comments from one file and prepend them to another data file.

The file 'data_with_comments.txt' can be obtained from pastebin: http://pastebin.com/Tixij2yG

And it looks like this:

# coating file for detector A/R
# column 1 is the angle of incidence (degrees)
# column 2 is the wavelength (microns)
# column 3 is the transmission probability
# column 4 is the reflection probability
      14.2000     0.300000  8.00000e-05     0.999920
      14.2000     0.301000  4.00000e-05     0.999960
      14.2000     0.302000  2.00000e-05     0.999980
      14.2000     0.303000  2.00000e-05     0.999980
      14.2000     0.304000  2.00000e-05     0.999980
      14.2000     0.305000  3.00000e-05     0.999970
      14.2000     0.306000  5.00000e-05     0.999950

Now, I have another data file 'test.txt', which looks like this:

300.0 1.53345164121e-32
300.1 1.53345164121e-32
300.2 1.53345164121e-32
300.3 1.53345164121e-32
300.4 1.53345164121e-32
300.5 1.53345164121e-32

Required output:

# coating file for detector A/R
# column 1 is the angle of incidence (degrees)
# column 2 is the wavelength (microns)
# column 3 is the transmission probability
# column 4 is the reflection probability
300.0 1.53345164121e-32
300.1 1.53345164121e-32
300.2 1.53345164121e-32
300.3 1.53345164121e-32
300.4 1.53345164121e-32

One way to do this is:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Author    : Bhishan Poudel
# Date      : Jun 18, 2016


# Imports
from __future__ import print_function
import fileinput


# read in comments from the file
infile = 'data_with_comments.txt'
comments = []
with open(infile, 'r') as fi:
    for line in fi.readlines():
        if line.startswith('#'):
            comments.append(line)

# reverse the list
comments = comments[::-1]
print(comments[0])
#==============================================================================


# prepend the list of comments to a file, one comment per pass
filename = 'test.txt'

for i in range(len(comments)):
    # open() works in both Python 2 and 3; file() exists only in Python 2
    with open(filename, 'r') as original: data = original.read()
    with open(filename, 'w') as modified: modified.write(comments[i] + data)

With this method the data file is opened, read, and rewritten once per comment line, which is inefficient when the data file is very large.

Is there any better way of doing this?

Related links:
Appending a list to the top of Pandas DataFrame output
Prepend line to beginning of a file
Python f.write() at beginning of file?
How can I add a new line of text at top of a file?
Prepend a line to an existing file in Python

BhishanPoudel
  • Those comments in the first file... are they all at the top or do you want all comments through the entire file? – tdelaney Jun 18 '16 at 17:18
  • @tdelaney I want only the comments (no data) from input1 and put those comments on top of input2 to create output (same or different from input2). – BhishanPoudel Jun 19 '16 at 19:01

4 Answers

Especially if the data file (test.txt here) is large, as the OP states, I would suggest the following approach, in which the data file is opened only once for reading and one other file is opened for writing:

  1. create a temp folder,
  2. prefill a temp file in there with the stripped(!) comment lines,
  3. add the lines from the data file,
  4. rename the temp file to the data file,
  5. remove the temp folder and voila.

Like so:

#! /usr/bin/env python
from __future__ import print_function

import os
import shutil
import tempfile


infile = 'data_with_comments.txt'
with open(infile, 'r') as f_i:
    # keep only the comment lines, stripped of surrounding whitespace
    comments = [t.strip() for t in f_i if t.startswith('#')]

file_name = 'test.txt'
file_path = file_name  # simplification here

tmp_dir = tempfile.mkdtemp()  # create tmp folder (works on all platforms)
tmp_file_name = '_' + file_name  # determine the file name in temp folder

s_umask = os.umask(0o077)  # 0o077 is valid octal in Python 2.6+ and 3

tmp_file_path = os.path.join(tmp_dir, tmp_file_name)
try:
    with open(file_path, "rt") as f_prep, open(
            tmp_file_path, "wt") as f_tmp:
        f_tmp.write('\n'.join(comments) + '\n')
        for line in f_prep:  # iterate lazily, one line at a time
            f_tmp.write(line)
except IOError as e:
    print(e)  # or what you want to tell about it, instead of aborting
else:
    # shutil.move also works when the temp folder is on another file system,
    # where a plain os.rename would fail
    shutil.move(tmp_file_path, file_path)
finally:
    try:  # so we have an empty folder in - nearly - any case
        os.remove(tmp_file_path)
    except OSError:
        pass
    os.umask(s_umask)
    os.rmdir(tmp_dir)

Nothing fancy, and the per-line iteration may not be the fastest option; one should measure whether it is sufficient performance-wise. In the scenarios where I had to write to the "top" of a file, this mostly worked well enough; otherwise one can use the shell:

cat comments_only test.txt > foo && mv foo test.txt

PS: To speed up reading and writing in the "append" phase, one should use matching block-wise reads and writes, with block sizes tuned to the underlying system calls. Since this is a one-to-one copy, there is no need for line-wise iteration.
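The block-wise copy mentioned in the PS could look roughly like the sketch below. It is not part of the original answer: the function name `prepend_blockwise` and the 1 MiB default block size are assumptions chosen for illustration, and `os.replace` requires Python 3.3 or later. `shutil.copyfileobj` from the standard library performs the same kind of chunked copy; the loop is spelled out here only to show the idea.

```python
import os
import tempfile


def prepend_blockwise(header, data_path, block_size=1024 * 1024):
    """Rewrite data_path so that it starts with `header` (a bytes object).

    The data is copied in fixed-size binary blocks, so memory use is
    bounded by block_size however large the data file is.
    """
    target_dir = os.path.dirname(os.path.abspath(data_path))
    # Create the temp file next to the target so the final rename stays
    # on one file system.
    fd, tmp_path = tempfile.mkstemp(dir=target_dir, suffix='.tmp')
    try:
        with os.fdopen(fd, 'wb') as f_tmp, open(data_path, 'rb') as f_data:
            f_tmp.write(header)
            while True:
                block = f_data.read(block_size)
                if not block:  # an empty bytes object signals end of file
                    break
                f_tmp.write(block)
        os.replace(tmp_path, data_path)  # overwrite the original file
    finally:
        # Only reached with the temp file still present if something failed.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
```

A call such as `prepend_blockwise(b'# column 1 is the angle\n', 'test.txt')` would then prepend that one line. Note that `tempfile.mkstemp` creates a file accessible only by its owner, so the rewritten file does not keep the permissions of the original; whether a larger or smaller block size helps should be measured on the target system.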

Dilettant

You already have a great answer using a temporary directory, but it is also common to just create a temporary file in the same directory as the target file. On systems where tmp is a separate mount point, you avoid an additional copy of the data when renaming the temporary file. Notice that there is no intermediate list of comments, which matters if the comment list is large.

import os
import shutil

infile = 'data_with_comments.txt'
filename = 'test.txt'

tmpfile = filename + '.tmp'

try:
    # write wanted data to tempfile
    with open(tmpfile, 'w') as out_fp:
        # prepend comments from infile
        with open(infile) as in_fp:
            out_fp.writelines(filter(lambda l: l.startswith('#'), in_fp))
        # then append the contents of filename
        with open(filename) as in2_fp:
            shutil.copyfileobj(in2_fp, out_fp)
    # get rid of original data
    os.remove(filename)
    # replace with new data
    os.rename(tmpfile, filename)
finally:
    # cleanup on error
    if os.path.exists(tmpfile):
        os.remove(tmpfile)
tdelaney
  • ... and now there are two :-) ... upvoted and I like that this answer's intro notes the "mount point" dilemma for atomic moves. Thanks. – Dilettant Jun 18 '16 at 17:32

If your files contain comments only at the start, you could read the file lazily and process just the first lines until a non-comment line is found. Once a line that does not start with a '#' character appears, you can break out of the loop and let Python's with statement handle closing the file.
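A minimal sketch of that lazy approach, assuming Python 3 and assuming (as stated above) that all comments sit at the top of the file. The helper name `leading_comments` is made up for illustration and does not come from the answer.

```python
from itertools import takewhile


def leading_comments(path, marker='#'):
    """Return the comment lines found at the top of the file at `path`.

    A file object yields its lines lazily, and takewhile stops at the
    first line that does not start with the marker, so iteration ends
    there instead of walking through the whole data file.
    """
    with open(path) as f_in:
        return list(takewhile(lambda line: line.startswith(marker), f_in))
```

Calling `leading_comments('data_with_comments.txt')` would collect the header lines and stop at the first data row. Comment lines that appear further down in the file are deliberately ignored by this sketch.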

Gábor Fekete
  • How can I do this with lazy opening? Any hints? – BhishanPoudel Jun 18 '16 at 16:32
  • I think the question is not about detecting **if** a file needs these "comment" lines prepended, but about how to do that even for large files ... cf. my answer for an IMO "classic" way of doing that, by using a temporary file as in the shell: `cat comments_only test.txt > /tmp/foo && mv /tmp/foo test.txt` – Dilettant Jun 18 '16 at 16:35
  • Oh sorry I misread the question, your data files are big and you want to prepend the comments to it, right? There is no method for that without reading the whole data file... – Gábor Fekete Jun 18 '16 at 16:41
  • [This](http://stackoverflow.com/a/8010133/6464041) answers your question about lazy reading. – Gábor Fekete Jun 18 '16 at 16:43
  • Also, your method re-reads the original file once per comment line. Instead of the loop ` for i in range(len(comments)): with file(filename, 'r') as original: data = original.read() with file(filename, 'w') as modified: modified.write(comments[i] + data) `, read the data once and write all comments in a single pass, keeping `comments` in its original (non-reversed) order: ` with open(filename, 'r') as original: data = original.read() ` followed by ` with open(filename, 'w') as modified: modified.write(''.join(comments) + data) ` – Gábor Fekete Jun 18 '16 at 16:56

Following Dilettant's idea:

For multiple text files and only one comment file, we can do this with a shell script:

# in the directory I have one file called    : comments
# and many other files with the extension   : .txt

for file in *.txt; do cat comments "$file" > foo && mv foo "$file"; done

This prepends the same comments to all of the .txt files in the directory.
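For comparison, the same loop could be sketched in Python. This is an illustration rather than part of the original answer: the function name `prepend_to_all`, the `.tmp` suffix, and the default `*.txt` pattern are assumptions, and `os.replace` needs Python 3.3 or later.

```python
import glob
import os
import shutil


def prepend_to_all(comment_path, directory, pattern='*.txt'):
    """Prepend the contents of comment_path to every matching file."""
    comment_abs = os.path.abspath(comment_path)
    for path in sorted(glob.glob(os.path.join(directory, pattern))):
        if os.path.abspath(path) == comment_abs:
            continue  # never prepend the comment file to itself
        tmp_path = path + '.tmp'
        try:
            with open(tmp_path, 'wb') as f_out:
                # First the comments, then the original data, both copied
                # in chunks by shutil.copyfileobj.
                for source in (comment_path, path):
                    with open(source, 'rb') as f_in:
                        shutil.copyfileobj(f_in, f_out)
            os.replace(tmp_path, path)  # swap the new file into place
        finally:
            # Clean up the temp file if the replace did not happen.
            if os.path.exists(tmp_path):
                os.remove(tmp_path)
```

With a file named `comments` in the current directory, `prepend_to_all('comments', '.')` would mirror the shell one-liner. As with the shell version, running it twice prepends the comments twice.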

BhishanPoudel