
I have an HTML file that contains a series of * (asterisks), and I would like to replace them with numbers, starting from 0 and counting up until every * has been replaced with a number.

I am unsure whether this is possible in Python or whether another method would be better.

Edit 2

Here is a short snippet from the TXT file that I am working on:

<td nowrap>4/29/2011 14.42</td>
<td align="center">*</td></tr>

I made a file just containing those lines to test out the code.

And here is the code that I am attempting to use to change the asterisks:

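number = 0
with open('index.txt', 'r') as inf:
    text = inf.read()
while "*" in text:
    print("I am in the loop")
    # Replace only the first remaining '*' on each pass.
    text = text.replace("*", str(number), 1)
    number += 1
# Write the result back; without this step the file on disk never changes.
with open('index.txt', 'w') as outf:
    outf.write(text)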

I think that is as much detail as I can give. Please let me know whether I should add this as a comment instead or keep it as an edit. And thanks for all the quick responses so far!

Nicky
    This is possible in Python. For more information, please show what you have tried and the problems you have. – bernie Oct 05 '15 at 19:42
  • Please don't keep claiming that you used code from answers; this is confusing matters as it makes the answers look out of date, nor do you state why the specific solution you tried doesn't work. – Martijn Pieters Oct 05 '15 at 21:18
  • Sorry for all the trouble; next time I will make sure to add comments and additional info properly. I also finally got it to work, thanks for all the help. – Nicky Oct 06 '15 at 16:07

5 Answers


Use the re.sub() function; it lets you produce a new value for each replacement by passing a function as the repl argument:

import re
from itertools import count

with open('index.txt', 'r') as inf:
    text = inf.read()

text = re.sub(r'\*', lambda m, c=count(): str(next(c)), text)

with open('index.txt', 'w') as outf:
    outf.write(text)

The count is taken care of by itertools.count(); each time you call next() on such an object the next value in the series is produced:

>>> import re
>>> from itertools import count
>>> sample = '''\
... foo*bar
... bar**foo
... *hello*world
... '''
>>> print(re.sub(r'\*', lambda m, c=count(): str(next(c)), sample))
foo0bar
bar12foo
3hello4world
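The `c=count()` default argument is what keeps the numbering going: Python evaluates a default value once, when the `lambda` is created, so every call to that lambda shares the same counter. A minimal sketch of the same idea with a named function (the name `number_asterisks` and its `start` parameter are hypothetical, not part of the answer above):

```python
import re
from itertools import count


def number_asterisks(text, start=0):
    """Replace each '*' in text with consecutive integers, beginning at start."""
    counter = count(start)  # one counter shared by every replacement below

    def next_number(match):
        # re.sub calls this once per match; the match object itself is unused.
        return str(next(counter))

    return re.sub(r'\*', next_number, text)


print(number_asterisks('foo*bar\nbar**foo\n*hello*world'))
```

Because the counter is created inside the function, each call starts numbering from `start` again, whereas the `lambda` in the one-liner gets a fresh counter only when the `re.sub(...)` line itself is executed again.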

The approach suggested by huapito would work too, albeit slowly, provided you initialize the counter, limit the number of replacements, and actually store the result of each replacement:

number = 0
with open('index.txt', 'r') as inf:
    text = inf.read()
while "*" in text:
    text = text.replace("*", str(number), 1)
    number += 1

Note the third argument to str.replace(); that tells the method to only replace the first instance of the character.
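To see what that third argument (called `count` in the Python documentation) changes, compare a limited and an unlimited replacement. The loop is wrapped here in a hypothetical helper, `number_with_replace`, so its result can be checked against the `re.sub()` version:

```python
import re
from itertools import count

print("a*b*c".replace("*", "0", 1))  # prints a0b*c: only the first '*' is replaced
print("a*b*c".replace("*", "0"))     # prints a0b0c: every '*' is replaced


def number_with_replace(text):
    """Number asterisks with the while/str.replace loop from the answer."""
    number = 0
    while "*" in text:
        # Replace just the leftmost remaining '*', then move on to the next number.
        text = text.replace("*", str(number), 1)
        number += 1
    return text


sample = "foo*bar\nbar**foo\n*hello*world\n"
assert number_with_replace(sample) == re.sub(r'\*', lambda m, c=count(): str(next(c)), sample)
```

Each pass builds a brand-new string, so the loop does work proportional to the length of the text for every asterisk it replaces. That is why it is slower than a single `re.sub()` pass on large files.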

Martijn Pieters
  • That is not the OP's approach, it is the first answer edited into the question – Padraic Cunningham Oct 05 '15 at 20:21
  • @PadraicCunningham: ah, indeed; that was huapito's suggestion. – Martijn Pieters Oct 05 '15 at 20:25
  • Yep, the infinite loop approach! what is faster, reading the whole file and re.sub or iterating, including the replacing of the original data? – Padraic Cunningham Oct 05 '15 at 20:25
  • Using @huapito's approach, would I need to close the file afterwards? I am just curious to know in case I need it for a similar problem – Nicky Oct 06 '15 at 16:23
  • @Nicky: no, the `with` statement closes the file automatically. Note that my sample code doesn't write out the changed text again; I've left that out for brevity. – Martijn Pieters Oct 06 '15 at 17:01
  • @Martijn Pieters thanks for the clarification; I am very new to Python and hadn't seen that method before, so I was curious. Also, your first block of code works after adding **"import re"** after the **"from itertools import count"** line. It even worked on the original HTML file; all I needed to do was change ".txt" to ".html". Once again, thanks for all the help, you are a lifesaver! – Nicky Oct 07 '15 at 20:39
html = 'some string containing html'
new_html = list(html)

count = 0
for char in range(0, len(new_html)):
    if new_html[char] == '*':
        new_html[char] = str(count)  # join() below needs strings, not ints
        count += 1

new_html = ''.join(new_html)

This replaces the asterisks, in order, with the numbers 0 up to one less than the number of asterisks.
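The same character-by-character idea reads more naturally with `enumerate`, and converting the counter with `str()` is what lets `''.join()` succeed, since joining a list that contains an `int` raises `TypeError`. A small sketch; the function name `number_chars` is hypothetical:

```python
def number_chars(html):
    """Return html with each '*' replaced by 0, 1, 2, ... in order."""
    chars = list(html)
    count = 0
    for i, ch in enumerate(chars):
        if ch == '*':
            chars[i] = str(count)  # must be a str, or ''.join() below fails
            count += 1
    return ''.join(chars)


print(number_chars('<td align="center">*</td><td>*</td>'))
```

Assigning to `chars[i]` never changes the list's length, so updating it while enumerating is safe here.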

RJGordon
  • You'll need to include a `new_html = ''.join(new_html)` at the end to re-assemble the string. This is a *slow* approach, manually testing *each and every character*. – Martijn Pieters Oct 05 '15 at 20:10

You need to iterate over each character. You can write to a temporary file and then replace the original with shutil.move, using itertools.count to assign an incrementally increasing number each time you find an asterisk:

from tempfile import NamedTemporaryFile
from shutil import move
from itertools import count
cn = count()

with open("in.html") as f, NamedTemporaryFile("w+",dir="",delete=False) as out:
    out.writelines((ch if ch != "*" else str(next(cn)) 
                    for line in f for ch in line ))

move(out.name,"in.html")

Using a test file containing:

foo*bar
bar**foo
*hello*world

the code will output:

foo0bar
bar12foo
3hello4world
Padraic Cunningham
  • @MartijnPieters, I don't use Windows so I have no idea; I imagine that is the exception more than the rule? Also, I thought move does work on Windows http://stackoverflow.com/a/8107391/2141635 where rename may not – Padraic Cunningham Oct 05 '15 at 20:23
  • @MartijnPieters, yep I looked at the source before, so what is the deal with that answer, I used it as a reference but I could not find a definitive answer anywhere. Is a try/except with rename removing the file the only option then? If so that seems pretty unpythonic – Padraic Cunningham Oct 05 '15 at 20:28
  • Ah, I see now what happens; in case of moving a file, the file is *copied* instead; so opening the target file, copy the contents, then delete the original. – Martijn Pieters Oct 05 '15 at 20:31
  • @MartijnPieters, there does not seem to be a pretty way to do it on Windows. The tempfile approach is an atomic operation on Unix, so there is definitely an advantage – Padraic Cunningham Oct 05 '15 at 20:36
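On Python 3.3 and later, `os.replace()` is one possible alternative to `shutil.move()` for the last step of the temporary-file approach above: it overwrites an existing destination on Windows as well as on POSIX systems, and the documentation describes a successful rename as atomic on POSIX (no such guarantee is stated for Windows). It only works when the temporary file is on the same filesystem as the target, which creating it in the target's directory ensures. A sketch of that answer's approach with this swap; the function name `number_file` is hypothetical:

```python
import os
from itertools import count
from tempfile import NamedTemporaryFile


def number_file(path):
    """Replace every '*' in the file at path with 0, 1, 2, ... in place."""
    cn = count()
    # Create the temp file next to the target so os.replace() stays on one filesystem.
    target_dir = os.path.dirname(os.path.abspath(path))
    with open(path) as f, NamedTemporaryFile("w", dir=target_dir, delete=False) as out:
        out.writelines(ch if ch != "*" else str(next(cn))
                       for line in f for ch in line)
    # Both files are closed at this point, which Windows requires before renaming.
    os.replace(out.name, path)
```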

It is possible. Have a look at the docs for str.replace(). You could use a while loop together with replace, for example:

number = 0  # the first number; assumes `text` already holds the file contents
while "*" in text:  # repeat until no asterisks remain
    text = text.replace("*", str(number), 1)  # replace only the first '*' with 'number'
    number += 1  # increase number
huapito

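Use fileinput with inplace=True, which redirects print() output into the file being edited:

import fileinput

fileToSearch = 'index.txt'  # the file to edit in place

number = 0
with fileinput.FileInput(fileToSearch, inplace=True) as file:
    for line in file:
        # Give every '*' on the line its own number, not one number per line.
        while "*" in line:
            line = line.replace("*", str(number), 1)
            number += 1
        # stdout goes into the file; end='' avoids doubling each newline.
        print(line, end='')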
bkaf