change multiple lines in file python using regex

Question

I have the following text file with repeated blocks of data. I want to change only the values A, B, C in each block and write the updated blocks to a new file. How can I do this in Python, after loading the whole file into a string?

***   DATA
     1    253.31     78.20     490.0         0 0 1 0 0
   101         0         0         0         0         0         0          
     1         2         3         4         5         6
     2    123.31   -122.20     -20.0         0 0 1 0 0
   101         0         0         0         0         0         0          
     7         8         9        10        11        12
     3     53.21      10.2      90.0         0 0 1 0 0
   101         0         0         0         0         0         0          
    13        14        15        11        10        10
     .
     .
     .
    10         A         B         C         0 0 1 0 0
   110         0         0         0         0         0         0          
    20        21        22        23        24        25

Just spaces between the data! Consistent? You mean formatted? Yes, it's formatted... if you want to know about this. — TeXFun, Jan 14 '14 at 23:08
Those are five values... but I don't care about them. I will copy them to the new line as they are. I only care about changing A, B, C. — TeXFun, Jan 14 '14 at 23:11
Are A, B, C literally those characters, or are there normally numbers there? — Peter Gibson, Jan 14 '14 at 23:12
To me this looks space delimited, in which case you're not going to be able to accomplish it with a regular expression. You'd be better off splitting on [character position](http://stackoverflow.com/questions/7111068/split-string-by-count-of-characters). — brandonscript, Jan 14 '14 at 23:13
I rewrote the example with 3 blocks + 1 block to make it clearer. — TeXFun, Jan 14 '14 at 23:48
@SteinarLima Suppose that the values for A, B and C come from 3 lists, taken from the corresponding columns of a csv file. — TeXFun, Jan 14 '14 at 23:50
Do you want to extract the block, do the replacement, and write the modified block to another file? Or would you like to change the block in place in the same file? — eyquem, Jan 14 '14 at 23:59
Is the file big or not? Can it be read entirely into RAM? — eyquem, Jan 15 '14 at 00:01
@eyquem I want to create a new file with the same structure! Only these 3 values in the first line of each block will be different in new file. — TeXFun, Jan 15 '14 at 00:02
Are the values ``253.31 78.20 490.0`` and ``123.31 -122.20 -20.0`` and ``53.21 10.2 90.0`` values A,B,C ? — eyquem, Jan 15 '14 at 00:04

Steinar Lima · Accepted Answer · 2014-01-15T00:04:03.880

I think this code might do what you want.

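import csv

# Python 3: the csv module expects text-mode files opened with newline=''
with open('my_data.csv', newline='') as data_file, \
     open('values.csv', newline='') as value_file, \
     open('my_new_data.csv', 'w', newline='') as out_file:

    data_reader = csv.reader(data_file, delimiter=' ', skipinitialspace=True)
    value_reader = csv.reader(value_file, delimiter=',')
    writer = csv.writer(out_file, delimiter=' ')
    while True:
        try:
            # First line of a 3-line block: overwrite columns 1..3 (A, B, C)
            row = next(data_reader)
            row[1:4] = next(value_reader)
            # The other two lines of the block are copied unchanged
            writer.writerows([row, next(data_reader), next(data_reader)])
        except StopIteration:
            # Either the data or the values ran out
            break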

Given these input files:

my_data.csv

 1    253.31     78.20     490.0         0 0 1 0 0
   101         0         0         0         0         0         0          
     1         2         3         4         5         6
     2    123.31   -122.20     -20.0         0 0 1 0 0
   101         0         0         0         0         0         0          
     7         8         9        10        11        12
     3     53.21      10.2      90.0         0 0 1 0 0
   101         0         0         0         0         0         0          
    13        14        15        11        10        10

values.csv

1.0,2.5,3.2
4.1,5.2,6.2
7.6,8.0,9.3

Output

1 1.0 2.5 3.2 0 0 1 0 0
101 0 0 0 0 0 0 
1 2 3 4 5 6
2 4.1 5.2 6.2 0 0 1 0 0
101 0 0 0 0 0 0 
7 8 9 10 11 12
3 7.6 8.0 9.3 0 0 1 0 0
101 0 0 0 0 0 0 
13 14 15 11 10 10

Note that the leading spaces and the column alignment are gone: every run of spaces is collapsed to a single space.
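If the alignment matters, one option is to overwrite fixed character positions instead of splitting and re-joining the row. The following is only a sketch: the function name `replace_abc` and its parameters are made up for illustration, and it assumes that the first column is 6 characters wide and that A, B and C each occupy a 10-character column, which is what the sample data in the question looks like.

```python
def replace_abc(lines, values, start=6, width=10, block=3):
    """Overwrite three fixed-width columns on the first line of each block.

    Assumption: columns A, B, C start at offset `start` and are `width`
    characters wide each; every block is exactly `block` lines long.
    """
    values = iter(values)
    out = []
    for i, line in enumerate(lines):
        if i % block == 0:
            a, b, c = next(values)
            # Right-justify each new value in its column
            fields = ''.join(str(v).rjust(width) for v in (a, b, c))
            line = line[:start] + fields + line[start + 3 * width:]
        out.append(line)
    return out


sample = [
    "     1    253.31     78.20     490.0         0 0 1 0 0",
    "   101         0         0         0         0         0         0",
    "     1         2         3         4         5         6",
]
result = replace_abc(sample, [("1.0", "2.5", "3.2")])
print("\n".join(result))
```

A value longer than `width` characters would push the rest of the line to the right, so real data may need a check for that.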

edited Jan 15 '14 at 00:04

answered Jan 14 '14 at 23:18

Steinar Lima


Thanks Steinar! This is closer to the answer that I need, but you've misunderstood one thing. This data block is repeated "N" times and I want to change the values of A, B and C in every block, not only in one block! Of course the other values of the block will remain the same. – TeXFun Jan 14 '14 at 23:24
@TeXFun How do you know that a new block is starting/the current one has ended? – Steinar Lima Jan 14 '14 at 23:27
I don't know this info! Only that the block is 3 lines! So, the whole data are (3 lines of data) x (N times). In every 3-lined data block, I want to change only the values in positions of A, B, C respectively. – TeXFun Jan 14 '14 at 23:34
Yes! That's the answer to this problem! Thank you so much! Good job Steinar! – TeXFun Jan 15 '14 at 00:05
@TeXFun It seems to me that this solution doesn't keep the formatting of the lines. Is this OK for you ? – eyquem Jan 15 '14 at 00:19
I would prefer to keep the format (for readability reasons) but it really doesn't matter at all. @eyquem Do you have an answer that keeps the same format of the lines? – TeXFun Jan 15 '14 at 00:21
To make it more readable, you may want to use tab separated values. Just change `writer = csv.writer(out_file, delimiter=' ')` to `writer = csv.writer(out_file, delimiter='\t')` – Steinar Lima Jan 15 '14 at 00:24
Not difficult to do one. But I don't understand how the new values for the positions A, B, C in a line are chosen. What is the rule for replacing a given value? – eyquem Jan 15 '14 at 00:28

score 0 · Answer 2 · answered Jan 14 '14 at 23:19

You can do this with str.replace

with open('data.txt', 'r') as f:
    data = f.read()

A = str(30.4)
B = str(60000)
C = str(9)

# Replace each placeholder together with the padding to its left, so the
# new value ends in the same column, eg replace '   A' with '30.4'
data = data.replace('A'.rjust(len(A)), A)
data = data.replace('B'.rjust(len(B)), B)
data = data.replace('C'.rjust(len(C)), C)

# The with block closes the file on exit
with open('out.txt', 'w') as f:
    f.write(data)

answered Jan 14 '14 at 23:19

Peter Gibson


A big disadvantage with this solution, is that you have to read the entire file into memory. This is unnecessary and may cause problems if the file is big enough. – Steinar Lima Jan 14 '14 at 23:29
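The memory concern can be avoided by walking through the file in 3-line blocks instead of calling `read()`. This is a minimal sketch of the grouping only: `blocks` is a hypothetical helper, `io.StringIO` stands in for real file objects, and the `upper()` call is a placeholder for whatever edit the first line of each block needs.

```python
import io
from itertools import islice


def blocks(lines, size=3):
    """Yield lists of `size` consecutive lines, reading lazily."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


# io.StringIO stands in for open(...) here; a file object iterates the same way
source = io.StringIO("a1\na2\na3\nb1\nb2\nb3\n")
sink = io.StringIO()
for block in blocks(source):
    block[0] = block[0].upper()   # placeholder for the real edit
    sink.writelines(block)

print(sink.getvalue())
```

Only one block is held in memory at a time. A trailing block shorter than `size` is yielded as is, so the caller decides how to treat incomplete data.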

score 0 · Answer 3 · answered Jan 14 '14 at 23:36

Is this what you're trying to do?

import re

data = """***   DATA
     1    253.31     78.20     490.0         0 0 1 0 0
   101         0         0         0         0         0         0          
     1         2         3         4         5         6
     .
     .
     .
     .
    10         A         B         C         0 0 1 0 0
   110         0         0         0         0         0         0          
    20        21        22        23        24        25"""
mldata = data.split('\n')

# Matches any single letter standing alone as a word, such as A, B or C
regex = re.compile(r'\b([A-Za-z])\b')
replacement = "test"

for line in mldata:
    newline = regex.sub(replacement, line)
    print(newline)


In this snippet only A, B, C are replaced with the string test. I also want 253.31, 78.20, 490.0 to be replaced! — TeXFun, Jan 14 '14 at 23:53
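To replace the second, third and fourth fields whatever they contain, the regex can match by position instead of by content. The sketch below is an illustration, not the answerer's code: `LINE` and `replace_fields` are hypothetical names, lines are assumed to have no trailing newline, and each new value is right-justified to the width of the field it replaces.

```python
import re

# Group 1: first field; groups 2-4: the next three fields, each with its
# leading whitespace; group 5: the rest of the line.
LINE = re.compile(r'^(\s*\S+)(\s+\S+)(\s+\S+)(\s+\S+)(.*)$')


def replace_fields(line, new_values):
    m = LINE.match(line)
    if m is None:            # fewer than four fields: leave the line alone
        return line
    padded = ''.join(str(new).rjust(len(old))
                     for new, old in zip(new_values, m.group(2, 3, 4)))
    return m.group(1) + padded + m.group(5)


for line in ["    10         A         B         C         0 0 1 0 0",
             "     1    253.31     78.20     490.0         0 0 1 0 0",
             "***   DATA"]:
    print(replace_fields(line, ("1.0", "2.5", "3.2")))
```

This still has to be applied only to the first line of each block, and a value as wide as its field would touch the preceding column.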

eyquem · Answer 4 · 2014-01-18T00:29:58.173

If I understood correctly what you want, here's code that keeps the format of the lines:

text = """DATA gfghsg hsghghsfghsfghshsdhf
     1    253.31     78.20     490.0         0 0 1 0 0
   101        .0         0         0         0         0         0          
     1         2         3         4         5         6
     2    123.31   122.20     -20.0         0  0  1  0  0
   201         0         0         0         0         0         0          
     7         8         9        10        11        12
     6         6         .        66       666      4 8 7 4 5 7
     3     53.21      10.2      90.0e+15         0 0 1 0 0
   301         0         0         0         0         0         0          
    13        14        15        11        10        10
kjqbskjqskdkqsdbkjqsbd
   547      AFFO       457       6545   1 0 2 5 4
    10        44       138          -.017         0 0 1 0 0
   410         0         0         0         0         0         0          
    20        21        22        23        24        25
  8888      9999
   500       87E-458      12  .4
   1.2     4.E-56     
    12   45  """

import re
import csv

# Groups 1-4: the first four numbers of a block's first line, each with its
# leading whitespace. Group 5: the rest of that line plus the two following
# lines, which must contain only numbers.
pat = (r'^([ \t]*[-+]?(?:\d+\.?|\.?\d)[\deE+-]*)'
       r'([ \t]+[-+]?(?:\d+\.?|\.?\d)[\deE+-]*)'
       r'([ \t]+[-+]?(?:\d+\.?|\.?\d)[\deE+-]*)'
       r'([ \t]+[-+]?(?:\d+\.?|\.?\d)[\deE+-]*)'
       r'([ \t]*(?:[-+]?(?:\d+\.?|\.?\d)[\deE+-]*[ \t]*)*\n'
       r'^[ \t]*(?:[-+]?(?:\d+\.?|\.?\d)[\deE+-]*[ \t]*)+\n'
       r'^[ \t]*(?:[-+]?(?:\d+\.?|\.?\d)[\deE+-]*[ \t]*)+)$')
r = re.compile(pat, re.MULTILINE)

def modify(text, filepath, r=r):
    # Read the replacement triplets; the values file is tab-delimited here
    with open(filepath, newline='') as vava:
        VALUES = [tuple(row) for row in
                  csv.reader(vava, delimiter='\t', skipinitialspace=True)]

    dic = {}  # maps a tuple of field widths to its format string
    def ripl(m, VALUES=VALUES, dic=dic):
        lens = tuple(len(x) for x in m.group(2,3,4))
        pat = dic.setdefault(lens,'%%%ds%%%ds%%%ds' % lens)
        return m.group(1) + pat % VALUES.pop(0) + m.group(5)

    return r.sub(ripl, text)

print(modify(text, 'values.csv'))

result

DATA gfghsg hsghghsfghsfghshsdhf
     1    100000      0.01    101.01         0 0 1 0 0
   101        .0         0         0         0         0         0          
     1         2         3         4         5         6
     2         2     0.02     20022         0  0  1  0  0
   201         0         0         0         0         0         0          
     7         8         9        10        11        12
     6         6         .        66       666      4 8 7 4 5 7
     3      3303     0.033       3.03333         0 0 1 0 0
   301         0         0         0         0         0         0          
    13        14        15        11        10        10
kjqbskjqskdkqsdbkjqsbd
   547      AFFO       457       6545   1 0 2 5 4
    10       4.4      0.44            4.4         0 0 1 0 0
   410         0         0         0         0         0         0          
    20        21        22        23        24        25
  8888      9999
   500          5555 0.5555555e+55
   1.2     4.E-56     
    12   45

The part

lens = tuple(len(x) for x in m.group(2,3,4))
pat = dic.setdefault(lens,'%%%ds%%%ds%%%ds' % lens)

is a refinement that allows for the format not being the same in all the modified lines. It looks at the lengths of the second, third and fourth fields of the first line of a block, each counted with its leading whitespace: if this tuple of lengths is already known, the corresponding format string is taken from the dictionary dic, and if not, a new format string is created and stored in the dictionary.
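The doubled percent signs can be hard to read, so here is a small worked example of how the format string is built. The widths are the ones of the block starting with 10 in the text above (10, 10 and 15 characters, leading whitespace included); the variable names are only for illustration.

```python
lens = (10, 10, 15)                 # widths of the three matched fields

# '%%' becomes a literal '%', each '%d' is filled with one width
fmt = '%%%ds%%%ds%%%ds' % lens
print(fmt)                          # %10s%10s%15s

# The format string then right-justifies each new value in its field
line_part = fmt % ('4.4', '0.44', '4.4')
print(repr(line_part))
```

Because every value is padded to the width of the field it replaces, the columns to the right stay where they were.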

This is also a working solution to my problem. But how can we change the regular expression in order to include also the "*** DATA" line in the string? I tried to add this line, but the result was something different than before! — TeXFun, Jan 15 '14 at 19:06
Yes, it's working. But how can I change the desired values in each line to the values taken from a csv file, as mentioned in the previous answers? — TeXFun, Jan 16 '14 at 23:57
Awesome! Thanks again eyquem! But the problem now, as you mentioned, is that there is arbitrary text, coupled also with numbers, before and after the data blocks we want to change. I can't find the suitable regex pattern to match only these data blocks. — TeXFun, Jan 17 '14 at 10:46
I worked on an improvement of the code, but I can't finish it for the moment. I will post it later. - However, I wonder what you mean by _"there is arbitrary text, coupled also with numbers, before **and after** the data blocks"_. Do you mean that there are not only header lines before the groups of three lines containing only numbers, but also some erratic lines between the groups of 3? — eyquem, Jan 17 '14 at 15:01
Exactly! There are lines mixing strings and numbers that I don't want to change. — TeXFun, Jan 17 '14 at 22:27
I also want to keep these kinds of lines from matching the regex pattern. — TeXFun, Jan 17 '14 at 22:33
I've also tried some modifications to your snippet, but it seems difficult to avoid modifying other data blocks that match the regex, apart from the "interesting" data blocks. Waiting for your answer. Thank you in advance! Really appreciate it! — TeXFun, Jan 18 '14 at 00:19
I tried to develop code that would warn when there are erratic lines between the groups of 3 lines of numbers. But it seems these erratic lines don't bother you, so I edited my answer to post simple code that just detects groups of 3 and does the desired replacement — eyquem, Jan 18 '14 at 00:32
Tried to run the modified version of your snippet but python interpreter tells me that `TypeError: not enough arguments for format string`. It fails on line `return m.group(1) + pat % VALUES.pop(0) + m.group(5)`. — TeXFun, Jan 18 '14 at 12:19
So the problem is in ``pat % VALUES.pop(0)``, isn't it? Didn't you think of any way to discover the source of the problem? Are you the type of developer who stands idle in front of buggy code? Didn't you have the slightest idea to put some ``print`` statements here and there to follow what happens during execution? — eyquem, Jan 18 '14 at 14:23
Begin by printing the value of ``VALUES``. I bet that you'll see that its elements aren't triplets. Why? Work backwards: see which arguments ``csv.reader()`` receives and examine the nature of the file from which the values in ``VALUES`` are read. I bet there's a difference between your file's structure and mine. Act as a developer, not as a stackoverflow.com user. — eyquem, Jan 18 '14 at 14:23
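The check suggested above can be reproduced in a few lines. The thread does not confirm what the asker's values file looked like, so this sketch assumes it was comma-separated, as in the accepted answer, while `modify()` reads it with a tab delimiter. `io.StringIO` stands in for the real file.

```python
import csv
import io

raw = "1.0,2.5,3.2\n4.1,5.2,6.2\n"   # assumed: comma-separated values file

# Read with the tab delimiter used in modify(): each row is ONE field
wrong = [tuple(r) for r in csv.reader(io.StringIO(raw), delimiter='\t')]
print(wrong)

try:
    '%10s%10s%10s' % wrong[0]
except TypeError as exc:
    print('TypeError:', exc)

# Read with the delimiter the data really uses: each row is a triplet
right = [tuple(r) for r in csv.reader(io.StringIO(raw), delimiter=',')]
print('%10s%10s%10s' % right[0])
```

If that is the cause, either the values file or the `delimiter` argument has to change so that the two agree.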
