I have the following text file with repeated blocks of data. I want to change only the values in positions A, B, and C in each block and write the updated blocks to a file. How can I do this in Python, after loading the whole file into a string?

***   DATA
     1    253.31     78.20     490.0         0 0 1 0 0
   101         0         0         0         0         0         0          
     1         2         3         4         5         6
     2    123.31   -122.20     -20.0         0 0 1 0 0
   101         0         0         0         0         0         0          
     7         8         9        10        11        12
     3     53.21      10.2      90.0         0 0 1 0 0
   101         0         0         0         0         0         0          
    13        14        15        11        10        10
     .
     .
     .
    10         A         B         C         0 0 1 0 0
   110         0         0         0         0         0         0          
    20        21        22        23        24        25
TeXFun

4 Answers

I think this code might do what you want.

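import csv

# newline='' lets the csv module control line endings (Python 3).
with open('my_data.csv', newline='') as data_file, \
     open('values.csv', newline='') as value_file, \
     open('my_new_data.csv', 'w', newline='') as out_file:

    data_reader = csv.reader(data_file, delimiter=' ', skipinitialspace=True)
    value_reader = csv.reader(value_file, delimiter=',')
    writer = csv.writer(out_file, delimiter=' ')
    while True:
        try:
            # First line of each 3-line block: replace fields 2-4.
            row = next(data_reader)
            row[1:4] = next(value_reader)
            # Copy the other two lines of the block unchanged.
            writer.writerows([row, next(data_reader), next(data_reader)])
        except StopIteration:
            # Either the data or the replacement values ran out.
            break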

Given these input files:

my_data.csv

     1    253.31     78.20     490.0         0 0 1 0 0
   101         0         0         0         0         0         0          
     1         2         3         4         5         6
     2    123.31   -122.20     -20.0         0 0 1 0 0
   101         0         0         0         0         0         0          
     7         8         9        10        11        12
     3     53.21      10.2      90.0         0 0 1 0 0
   101         0         0         0         0         0         0          
    13        14        15        11        10        10

values.csv

1.0,2.5,3.2
4.1,5.2,6.2
7.6,8.0,9.3

Output

1 1.0 2.5 3.2 0 0 1 0 0
101 0 0 0 0 0 0 
1 2 3 4 5 6
2 4.1 5.2 6.2 0 0 1 0 0
101 0 0 0 0 0 0 
7 8 9 10 11 12
3 7.6 8.0 9.3 0 0 1 0 0
101 0 0 0 0 0 0 
13 14 15 11 10 10

Note that the column alignment is lost: leading spaces are dropped and each run of spaces collapses to a single space.

Steinar Lima
  • Thanks Steinar! This is closer to the answer that I need, but you've misunderstood one thing. This data block is repeated "N" times and I want to change the values of A, B and C in every block, not only in one block! Of course, the other values of the block will remain the same. – TeXFun Jan 14 '14 at 23:24
  • @TeXFun How do you know that a new block is starting/the current one has ended? – Steinar Lima Jan 14 '14 at 23:27
  • I don't have that information! I only know that each block is 3 lines, so the whole data set is (3 lines of data) x (N times). In every 3-line data block, I want to change only the values in the positions of A, B, C respectively. – TeXFun Jan 14 '14 at 23:34
  • Yes! That's the answer to this problem! Thank you so much! Good job Steinar! – TeXFun Jan 15 '14 at 00:05
  • @TeXFun It seems to me that this solution doesn't keep the formatting of the lines. Is this OK for you? – eyquem Jan 15 '14 at 00:19
  • I would prefer to keep the format (for readability reasons), but it really doesn't matter at all. @eyquem Do you have an answer that keeps the same format of the lines? – TeXFun Jan 15 '14 at 00:21
  • To make it more readable, you may want to use tab-separated values. Just change `writer = csv.writer(out_file, delimiter=' ')` to `writer = csv.writer(out_file, delimiter='\t')` – Steinar Lima Jan 15 '14 at 00:24
  • It's not difficult to write one. But I don't understand the rule by which the values at positions A, B, C in a line get replaced with some values and not others. What is the rule for replacing a given value? – eyquem Jan 15 '14 at 00:28
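The formatting concern raised in these comments could be handled by skipping the csv module and rewriting the three fields in place. The following is only a sketch, assuming blocks of exactly three lines with no header line, at least four fields on the first line of each block, and new values short enough to fit their columns. The helper names `replace_fields` and `update_blocks` are hypothetical.

```python
import re

# A field is its leading padding plus one run of non-space characters.
FIELD = re.compile(r'\s*\S+')

def replace_fields(line, new_values):
    """Replace fields 2-4 of `line`, right-justified to the old widths."""
    fields = FIELD.findall(line)
    tail = line[sum(len(f) for f in fields):]   # trailing padding and newline
    for i, value in enumerate(new_values, start=1):
        fields[i] = str(value).rjust(len(fields[i]))
    return ''.join(fields) + tail

def update_blocks(lines, values):
    """Yield every line, rewriting the first line of each 3-line block.

    Blocks beyond the supplied values are passed through unchanged.
    """
    values = iter(values)
    for n, line in enumerate(lines):
        if n % 3 == 0:
            new = next(values, None)
            if new is not None:
                line = replace_fields(line, new)
        yield line

# Possible usage (file names are placeholders):
# with open('my_data.txt') as src, open('my_new_data.txt', 'w') as dst:
#     dst.writelines(update_blocks(src, [('1.0', '2.5', '3.2')]))
```

Because the function works on any iterable of lines, it can read from an open file without loading it all into memory. A value wider than its column would run into the neighbouring field, so widths should be checked if that can happen.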

You can do this with str.replace

with open('data.txt', 'r') as f:
    data = f.read()

A = str(30.4)
B = str(60000)
C = str(9)

# Replace each placeholder together with enough of its left padding to keep
# the column width, e.g. '   A' becomes '30.4'.
data = data.replace('A'.rjust(len(A)), A)
data = data.replace('B'.rjust(len(B)), B)
data = data.replace('C'.rjust(len(C)), C)

with open('out.txt', 'w') as f:
    f.write(data)  # the with block closes the file; no explicit close() needed

Peter Gibson
  • A big disadvantage of this solution is that you have to read the entire file into memory. This is unnecessary and may cause problems if the file is big enough. – Steinar Lima Jan 14 '14 at 23:29
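The memory concern can be avoided by applying the same replacement one line at a time. This is a sketch, assuming the placeholders are the literal letters A, B and C as in the answer above. The function name `replace_placeholders` and the `values` mapping are hypothetical.

```python
def replace_placeholders(lines, values):
    """Yield each line with padded placeholders swapped for their values.

    `values` maps a placeholder such as 'A' to its replacement string.
    """
    for line in lines:
        for name, value in values.items():
            # Consume len(value) - 1 spaces of padding so columns stay aligned.
            line = line.replace(name.rjust(len(value)), value)
        yield line

# Possible usage (file names as in the answer above):
# with open('data.txt') as src, open('out.txt', 'w') as dst:
#     dst.writelines(replace_placeholders(
#         src, {'A': '30.4', 'B': '60000', 'C': '9'}))
```

As with the original, a one-character value replaces every occurrence of its placeholder letter, so this only suits files where those letters appear nowhere else.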

Is this what you're trying to do?

import re

data = """***   DATA
     1    253.31     78.20     490.0         0 0 1 0 0
   101         0         0         0         0         0         0          
     1         2         3         4         5         6
     .
     .
     .
     .
    10         A         B         C         0 0 1 0 0
   110         0         0         0         0         0         0          
    20        21        22        23        24        25"""
mldata = data.split('\n')

# Matches any single letter standing alone, such as the placeholders A, B, C.
regex = re.compile(r'\b([A-Za-z])\b')
replacement = "test"

for line in mldata:
    newline = regex.sub(replacement, line)
    print(newline)

Vasili Syrakis
  • In this snippet only A, B, C will be substituted with the string test, not 253.31, 78.20, 490.0, which I also want replaced! – TeXFun Jan 14 '14 at 23:53
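To cover numeric values as well as letter placeholders, the pattern could target the second to fourth whitespace-separated tokens of a line instead of single letters. This is a sketch under that assumption. The names `FIRST_FOUR` and `substitute` are hypothetical, and the caller still has to decide which lines to pass in, for example the first line of each block.

```python
import re

# First token, then three more whitespace-separated tokens of any kind.
FIRST_FOUR = re.compile(r'^(\s*\S+)(\s+\S+)(\s+\S+)(\s+\S+)')

def substitute(line, replacement):
    """Replace tokens 2-4 of `line` with `replacement`, keeping their padding."""
    def repl(m):
        out = m.group(1)
        for field in m.group(2, 3, 4):
            padding = field[:len(field) - len(field.lstrip())]
            out += padding + replacement
        return out
    return FIRST_FOUR.sub(repl, line)

print(substitute("     1    253.31     78.20     490.0         0 0 1 0 0", "test"))
print(substitute("    10         A         B         C         0 0 1 0 0", "test"))
```

Lines with fewer than four tokens, such as the `***   DATA` header, do not match and are returned unchanged.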

If I understood correctly what you want, here is code that keeps the format of the lines:

text = """DATA gfghsg hsghghsfghsfghshsdhf
     1    253.31     78.20     490.0         0 0 1 0 0
   101        .0         0         0         0         0         0          
     1         2         3         4         5         6
     2    123.31   122.20     -20.0         0  0  1  0  0
   201         0         0         0         0         0         0          
     7         8         9        10        11        12
     6         6         .        66       666      4 8 7 4 5 7
     3     53.21      10.2      90.0e+15         0 0 1 0 0
   301         0         0         0         0         0         0          
    13        14        15        11        10        10
kjqbskjqskdkqsdbkjqsbd
   547      AFFO       457       6545   1 0 2 5 4
    10        44       138          -.017         0 0 1 0 0
   410         0         0         0         0         0         0          
    20        21        22        23        24        25
  8888      9999
   500       87E-458      12  .4
   1.2     4.E-56     
    12   45  """

import re, csv

# One number: optional sign, digits with an optional dot, optional exponent.
num = r'[-+]?(?:\d+\.?|\.?\d)[\deE+-]*'

# Groups 1-4 are the first four values of the first line of a block; group 5
# is the rest of that line plus the two following all-numeric lines.
pat = (r'^([ \t]*{n})'
       r'([ \t]+{n})'
       r'([ \t]+{n})'
       r'([ \t]+{n})'
       r'([ \t]*(?:{n}[ \t]*)*\n'
       r'^[ \t]*(?:{n}[ \t]*)+\n'
       r'^[ \t]*(?:{n}[ \t]*)+)$').format(n=num)
r = re.compile(pat, re.MULTILINE)

def modify(text, filepath, r=r):
    # Each row of the tab-separated file holds the three replacement values.
    with open(filepath, newline='') as vava:
        VALUES = [tuple(row) for row in
                  csv.reader(vava, delimiter='\t', skipinitialspace=True)]

    dic = {}  # cache: tuple of field widths -> format string
    def ripl(m, VALUES=VALUES, dic=dic):
        lens = tuple(len(x) for x in m.group(2, 3, 4))
        pat = dic.setdefault(lens, '%%%ds%%%ds%%%ds' % lens)
        return m.group(1) + pat % VALUES.pop(0) + m.group(5)

    return r.sub(ripl, text)

print(modify(text, 'values.csv'))

result

DATA gfghsg hsghghsfghsfghshsdhf
     1    100000      0.01    101.01         0 0 1 0 0
   101        .0         0         0         0         0         0          
     1         2         3         4         5         6
     2         2     0.02     20022         0  0  1  0  0
   201         0         0         0         0         0         0          
     7         8         9        10        11        12
     6         6         .        66       666      4 8 7 4 5 7
     3      3303     0.033       3.03333         0 0 1 0 0
   301         0         0         0         0         0         0          
    13        14        15        11        10        10
kjqbskjqskdkqsdbkjqsbd
   547      AFFO       457       6545   1 0 2 5 4
    10       4.4      0.44            4.4         0 0 1 0 0
   410         0         0         0         0         0         0          
    20        21        22        23        24        25
  8888      9999
   500          5555 0.5555555e+55
   1.2     4.E-56     
    12   45

The part

lens = tuple(len(x) for x in m.group(2,3,4))
pat = dic.setdefault(lens,'%%%ds%%%ds%%%ds' % lens)

is a refinement that allows for the format not being the same for all the modified lines. It measures the lengths of the three matched fields being replaced (groups 2 to 4, each including its leading whitespace): if that combination of lengths has been seen before, the corresponding format string is fetched from the dictionary dic; if not, a new format string is created and stored in the dictionary.
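A small worked example may make the format-string construction easier to follow. The widths and values below are made up for illustration.

```python
# Widths of the three matched fields, each including its leading spaces.
lens = (10, 10, 10)

# '%%' is a literal '%', and each '%d' receives one width.
pat = '%%%ds%%%ds%%%ds' % lens           # '%10s%10s%10s'

# Each value is right-justified within its original field width.
row = pat % ('1.0', '2.5', '3.2')

# setdefault stores the format string on first use and returns the stored
# one afterwards, so each combination of widths maps to a single string.
dic = {}
first = dic.setdefault(lens, pat)
again = dic.setdefault(lens, 'unused default')
print(repr(row), first is again)
```

Note that `setdefault` evaluates its default argument on every call, so in the answer's code the format string is rebuilt each time even when a cached one exists. The dictionary changes nothing about the result.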

eyquem
  • This is also a working solution to my problem. But how can we change the regular expression so that it also includes the "*** DATA" line in the string? I tried to add this line, but the result was different from before! – TeXFun Jan 15 '14 at 19:06
  • Yes, it's working. But how can I replace the desired values in each line with values taken from a csv file, as mentioned in the previous answers? – TeXFun Jan 16 '14 at 23:57
  • Awesome! Thanks again eyquem! But the problem now, as you mentioned, is that there is arbitrary text, coupled also with numbers, before and after the data blocks we want to change. I can't find a suitable regex pattern to match only these data blocks. – TeXFun Jan 17 '14 at 10:46
  • I worked on an improvement of the code, but I can't finish it for the moment. I will post it later. However, I wonder what you mean by _"there is arbitrary text, coupled also with numbers, before **and after** the data blocks"_. Do you mean that there are not only header lines before the groups of three lines containing only numbers, but also some erratic lines between the groups of 3? – eyquem Jan 17 '14 at 15:01
  • Exactly! There are lines mixing strings and numbers that I don't want to change. – TeXFun Jan 17 '14 at 22:27
  • I also want to prevent these kinds of lines from matching the regex pattern. – TeXFun Jan 17 '14 at 22:33
  • I've also tried some modifications to your snippet, but it seems difficult to avoid modifying other data blocks that match the regex besides the "interesting" data block. Waiting for your answer. Thank you in advance! Really appreciate it! – TeXFun Jan 18 '14 at 00:19
  • I tried to develop code that would warn when there are erratic lines between the groups of 3 lines of numbers. But it seems these erratic lines don't bother you, so I edited the answer to post simple code that just detects groups of 3 and does the desired replacement – eyquem Jan 18 '14 at 00:32
  • I tried to run the modified version of your snippet, but the Python interpreter tells me `TypeError: not enough arguments for format string`. It fails on the line `return m.group(1) + pat % VALUES.pop(0) + m.group(5)`. – TeXFun Jan 18 '14 at 12:19
  • So the problem is in ``pat % VALUES.pop(0)``, isn't it? Didn't you think of any way to find the source of the problem? Are you the type of developer who stands idle in front of buggy code? Didn't you have the slightest idea of putting some ``print`` instructions here and there to follow what happens during execution? – eyquem Jan 18 '14 at 14:23
  • 1
    Begin to print the value of ``VALUES``. I bet that you'll see that ``VALUES`` isn't a triplet. Why ? Go uphill, see what the ``csv.reader()`` receives as default arguments and examine the nature of the file in which are kept the values read into ``VALUES``. I bet there's a difference between your file's structure and mine. Act as a developper, not as a stackoverflow.com's user. – eyquem Jan 18 '14 at 14:23
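The error from the last comments can be reproduced in isolation. This is a minimal sketch, assuming the values file was comma-separated while the reader was told to split on tabs. The file contents and the helper name `read_values` are hypothetical.

```python
import csv
import io

def read_values(text, delimiter):
    """Parse `text` as delimited rows and return them as tuples."""
    return [tuple(row) for row in csv.reader(io.StringIO(text), delimiter=delimiter)]

comma_file = "1.0,2.5,3.2\n"             # hypothetical file contents

wrong = read_values(comma_file, '\t')    # no tabs found: one field per row
right = read_values(comma_file, ',')     # three fields per row

try:
    '%10s%10s%10s' % wrong[0]            # three slots, one value
except TypeError as exc:
    message = str(exc)

print(wrong, right, message)
```

Printing `VALUES` right after it is built, as the comment suggests, would show the one-element tuples immediately.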