Replace "*" (asterisks) in an HTML file with increasing numbers using Python

Question

I have an HTML file that contains a series of * (asterisks), and I would like to replace each one with a number, starting from 0 and counting up until every * has been replaced.

I am unsure whether this is possible in Python or whether another method would be better.

Edit 2

Here is a short snippet from the TXT file that I am working on

<td nowrap>4/29/2011 14.42</td>
<td align="center">*</td></tr>

I made a file just containing those lines to test out the code.

And here is the code that I am attempting to use to change the asterisks:

number = 0
with open('index.txt', 'r+') as inf:
    text = inf.read()
while "*" in text:
    print "I am in the loop"
    text = text.replace("*", str(number), 1)
    number += 1

I think that is as much detail as I can go into. Please let me know if I should just add this edit as another comment or keep it as an edit. And thanks for all the quick responses so far~!

This is possible in Python. For more information, please show what you have tried and the problems you have. — bernie, Oct 05 '15 at 19:42
Please don't keep claiming that you used code from answers; this is confusing matters as it makes the answers look out of date, nor do you state why the specific solution you tried doesn't work. — Martijn Pieters, Oct 05 '15 at 21:18
Sorry for all the trouble; next time I will make sure to add comments and additional info properly. I also finally got it to work, thanks for all the help. — Nicky, Oct 06 '15 at 16:07

Martijn Pieters · Accepted Answer · 2015-10-06T17:02:07.443

1

Use the re.sub() function; it lets you produce a new value for each replacement by passing a function as the repl argument:

import re
from itertools import count

with open('index.txt', 'r') as inf:
    text = inf.read()

text = re.sub(r'\*', lambda m, c=count(): str(next(c)), text)

with open('index.txt', 'w') as outf:
    outf.write(text)

The count is taken care of by itertools.count(); each time you call next() on such an object the next value in the series is produced:

>>> import re
>>> from itertools import count
>>> sample = '''\
... foo*bar
... bar**foo
... *hello*world
... '''
>>> print(re.sub(r'\*', lambda m, c=count(): str(next(c)), sample))
foo0bar
bar12foo
3hello4world
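
The same idea can be wrapped in a small helper. This is only a minimal sketch: the function name number_asterisks and its start parameter are made up for illustration and are not part of the answer above.

```python
import re
from itertools import count


def number_asterisks(text, start=0):
    """Replace each '*' in text with the next integer, beginning at start."""
    counter = count(start)  # a fresh counter per call, so numbering restarts each time
    # re.sub calls the function once per match, working left to right
    return re.sub(r'\*', lambda match: str(next(counter)), text)


print(number_asterisks('<td align="center">*</td><td>*</td>'))
# <td align="center">0</td><td>1</td>
```

Creating the counter inside the function means that every call starts numbering from start again, which may or may not be what you want if you process several files.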

Huapito's approach would work too, albeit slowly, provided you limit the number of replacements and actually store the result of the replacement:

number = 0
with open('index.txt', 'r') as inf:
    text = inf.read()
while "*" in text:
    text = text.replace("*", str(number), 1)
    number += 1

Note the third argument to str.replace(); that tells the method to only replace the first instance of the character.
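
A quick illustration of that third argument, using a made-up sample string:

```python
text = "a*b*c"

# Without a count, every occurrence is replaced at once.
print(text.replace("*", "0"))     # a0b0c

# With a count of 1, only the first occurrence is replaced.
print(text.replace("*", "0", 1))  # a0b*c
```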

edited Oct 06 '15 at 17:02

answered Oct 05 '15 at 20:03

Martijn Pieters

That is not the OP's approach, it is the first answer edited into the question – Padraic Cunningham Oct 05 '15 at 20:21
@PadraicCunningham: ah, indeed; that was huapito's suggestion. – Martijn Pieters Oct 05 '15 at 20:25
Yep, the infinite loop approach! what is faster, reading the whole file and re.sub or iterating, including the replacing of the original data? – Padraic Cunningham Oct 05 '15 at 20:25
Using @huapito's approach, would I need to close the file afterwards? I am just curious, in case I need it for a similar problem – Nicky Oct 06 '15 at 16:23
@Nicky: no, the `with` statement closes the file automatically. Note that my sample code doesn't write out the changed text again; I've left that out for brevity. – Martijn Pieters Oct 06 '15 at 17:01
@Martijn Pieters thanks for the clarification. I am very new to Python and haven't seen that method before, so I was curious. Also, your first block of code works after adding **"import re"** after the **"from itertools import count"** line. It even worked on the original HTML file; all I needed to do was change the ".txt" to ".html". Once again, thanks for all the help, you are a lifesaver! – Nicky Oct 07 '15 at 20:39

RJGordon · Answer 2 · 2015-10-05T20:17:45.273

0

html = 'some string containing html'
new_html = list(html)

count = 0
for char in range(0, len(new_html)):
   if new_html[char] == '*':
       new_html[char] = str(count)  # join() needs strings, not ints
       count += 1

new_html = ''.join(new_html)

This would replace each asterisk with the numbers 0 to one less than the number of asterisks, in order.

edited Oct 05 '15 at 20:17

answered Oct 05 '15 at 19:42

RJGordon

You'll need to include a `new_html = ''.join(new_html)` at the end to re-assemble the string. This is a *slow* approach, manually testing *each and every character*. – Martijn Pieters Oct 05 '15 at 20:10

Padraic Cunningham · Answer 3 · 2015-10-05T20:23:03.437

0

You need to iterate over each character. You can write to a temporary file and then replace the original with shutil.move, using itertools.count to assign an incrementing number each time you find an asterisk:

from tempfile import NamedTemporaryFile
from shutil import move
from itertools import count

cn = count()  # yields 0, 1, 2, ...

# Write the modified text to a temporary file in the current directory
with open("in.html") as f, NamedTemporaryFile("w+", dir=".", delete=False) as out:
    out.writelines(ch if ch != "*" else str(next(cn))
                   for line in f for ch in line)

# Replace the original file with the modified copy
move(out.name, "in.html")

Using a test file containing:

foo*bar
bar**foo
*hello*world

the file will then contain:

foo0bar
bar12foo
3hello4world
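
A line-by-line variant of the temporary-file idea could look like the sketch below. It swaps in re.sub for the per-character generator and os.replace for shutil.move; os.replace needs Python 3.3 or later, and the function name number_asterisks_in_file is hypothetical.

```python
import os
import re
from itertools import count
from tempfile import NamedTemporaryFile


def number_asterisks_in_file(path):
    """Rewrite the file at path with each '*' replaced by 0, 1, 2, ... in order."""
    counter = count()
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the target so the final rename
    # stays on one filesystem.
    with open(path) as src, NamedTemporaryFile(
            "w", dir=directory, delete=False) as tmp:
        for line in src:
            tmp.write(re.sub(r"\*", lambda m: str(next(counter)), line))
    # Both files are closed here; os.replace overwrites an existing destination.
    os.replace(tmp.name, path)
```

Because the counter lives outside the loop over lines, the numbering carries on from one line to the next.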

edited Oct 05 '15 at 20:23

answered Oct 05 '15 at 19:49

Padraic Cunningham

@MartijnPieters, don't use windows so no idea, I imagine that is the exception more than the rule? Also I thought move does work on windows http://stackoverflow.com/a/8107391/2141635 where rename may not – Padraic Cunningham Oct 05 '15 at 20:23
@MartijnPieters, yep I looked at the source before, so what is the deal with that answer, I used it as a reference but I could not find a definitive answer anywhere. Is a try/except with rename removing the file the only option then? If so that seems pretty unpythonic – Padraic Cunningham Oct 05 '15 at 20:28
Ah, I see now what happens; in case of moving a file, the file is *copied* instead; so opening the target file, copy the contents, then delete the original. – Martijn Pieters Oct 05 '15 at 20:31
@MartijnPieters, there does not seem to be a pretty way to do it on Windows. The tempfile approach is an atomic operation on Unix, so there is definitely an advantage – Padraic Cunningham Oct 05 '15 at 20:36

huapito · Answer 4 · 2015-10-05T21:09:31.530

-1

It is possible. Have a look at the docs. You should use something like a 'while' loop and 'replace'. Example:

number=0 # the first number
while "*" in text: #repeats the following code until this is false
    text = text.replace("*", str(number), 1) # replace only the first '*' with 'number'
    number+=1 #increase number

edited Oct 05 '15 at 21:09

answered Oct 05 '15 at 19:20

huapito

Should I get Python to open and then close the file as HTML, or save it as a TXT file and then attempt to replace the asterisks? – Nicky Oct 05 '15 at 19:31
`with open('my_html_file.html') as f: text = f.read()` – Flavio Ferrara Oct 05 '15 at 19:38
`"%d"%number` is a very verbose way of spelling `str(number)`. – Martijn Pieters Oct 05 '15 at 20:11
I agree with you @MartijnPieters, str(number) is better. Downvoter(s), would you please tell me what's wrong with my answer? :) Thanks – huapito Oct 05 '15 at 21:08
@huapito: I presume that the downvotes were for your first revision, that until Flavio Ferrara edited had.. issues. – Martijn Pieters Oct 05 '15 at 21:10

bkaf · Answer 5 · answered Oct 05 '15 at 19:30

-2

Use fileinput

import fileinput

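# fileToSearch is assumed to hold the path of the file to modify
with fileinput.FileInput(fileToSearch, inplace=True) as file:
    number = 0
    for line in file:
        # inplace=True redirects print() output back into the file
        print(line.replace("*", str(number)), end='')
        number += 1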

answered Oct 05 '15 at 19:30

bkaf

This will give all `*` characters on the same line the same number. – Martijn Pieters Oct 05 '15 at 20:09
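
One way to avoid that with fileinput is to number each match individually with a counter shared across lines. The following is only a sketch; the function name number_asterisks_inplace is made up for illustration, and it uses re.sub rather than str.replace.

```python
import fileinput
import re
from itertools import count


def number_asterisks_inplace(path):
    """Replace each '*' in the file at path with 0, 1, 2, ... in order."""
    counter = count()  # shared across all lines of the file
    with fileinput.input(files=[path], inplace=True) as lines:
        for line in lines:
            # With inplace=True, standard output is redirected into the file.
            # end="" avoids doubling the newline that line already carries.
            print(re.sub(r"\*", lambda m: str(next(counter)), line), end="")
```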
