In the following input, I am trying to replace the numbers and \n with '' and ' ' respectively.

THE SONNETS\n\n                    1\n\nFrom fairest creatures we desire increase,\nThat thereby beauty’s rose might never die,\nBut as the riper should by time decease,\nHis

she hies,             1189\nAnd yokes her silver doves; by whose swift aid\nTheir mistress mounted through the empty skies,\nIn her light chariot quickly is convey’d;           1192\n  Holding their course to Paphos, where their queen\n  Means to immure herself and not be seen.\n'

The input_var is read from a file that has above content.

file_name = 'sample.txt'
file = open(folder+file_name, mode='r', encoding='utf8')
input_var = file.read()
file.close()

The data in file is

THE SONNETS

                    1

From fairest creatures we desire increase,
That thereby beauty’s rose might never die,
But as the riper should by time decease,
His

she hies,             1189
And yokes her silver doves; by whose swift aid
Their mistress mounted through the empty skies,
In her light chariot quickly is convey’d;           1192
  Holding their course to Paphos, where their queen
  Means to immure herself and not be seen.

To identify the numbers I have used the regex [\s]{3,}\d{1,}\\n (there have to be at least 3 spaces before the number; see this link for testing of the regex).

To replace both the regular expression and \n, I am using the following code, which I got from a few answers on Stack Overflow.

Code 1 -

# Remove the numbers in sonnets and at the end of lines
pattern = {r'[\s]{3,}\d{1,}\\n' : '',
           r'\\n' : ' '
          }

regex = re.compile('|'.join(map(re.escape, pattern.keys(  ))))
output_var = regex.sub(lambda match: pattern[match.group(0)], input_var)

Code 2 -

rep = dict((re.escape(k), v) for k, v in pattern.items())
pattern_test = re.compile("|".join(rep.keys()))
output_var = pattern_test.sub(lambda m: rep[re.escape(m.group(0))], input_var)

Code 3 -

for i, j in pattern.items():
        output_var = input_var.replace(i, j)

where input_var has the above mentioned text. All three do not replace anything.

I have also tried

pattern = {r'[\s]{3,}\d{1,}\n' : '',
           r'\n' : ' '
          }

but it does not replace anything.

pattern = {'[\s]{3,}\d{1,}\n' : '',
           '\n' : ' '
          }

replaces only \n and the output is like

THE SONNETS                      1  From fairest creatures we desire increase, That thereby beauty’s rose might never die, But as the riper should by time decease, His

The regular expression in the dictionary is not recognized as such; I think it is being taken as a literal string rather than a regular expression. How can I specify a regular expression in the dictionary? The answers I have found on Stack Overflow use strings rather than regular expressions, like the answer provided for this question.
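
A small check that appears to support this suspicion, as a sketch on a shortened sample line (the names `text` and `escaped` are illustrative only): `re.escape`, which Code 1 and Code 2 apply to every key, turns the pattern into a literal, and `str.replace` in Code 3 never interprets regex syntax at all.

```python
import re

pattern = r'\s{3,}\d+\n'
text = "she hies,             1189\nAnd yokes"

# re.escape makes every metacharacter literal, so the compiled pattern
# only matches the pattern's own source text, not spaces and digits.
escaped = re.compile(re.escape(pattern))
print(escaped.search(text))       # None
print(escaped.search(pattern))    # a match object: it finds the pattern text

# str.replace is always literal, so nothing changes here either.
print(text.replace(pattern, '') == text)   # True

# Compiled without escaping, the same key behaves as a regex.
print(re.sub(pattern, '', text))  # she hies,And yokes
```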

The expected outcome is

THE SONNETS                       From fairest creatures we desire increase, That thereby beauty’s rose might never die, But as the riper should by time decease, His

    she hies,And yokes her silver doves; by whose swift aid  Their mistress mounted through the empty skies, In her light chariot quickly is convey’d;  Holding their course to Paphos, where their queen   Means to immure herself and not be seen. ' 
Main
  • Looks like (judging by your regex101 link at the top) you have been testing against string literals instead of literal strings. Please add the `input_var` declaration in the question. – Wiktor Stribiżew Mar 26 '20 at 15:30
  • What's your expected result? Can you please share it so we get a clear picture? – Vin.AI Mar 26 '20 at 15:41
  • *All three do not replace anything* - because you most probably have not tested your regexps against the strings you have, but against string literals. – Wiktor Stribiżew Mar 26 '20 at 15:45
  • Your expected result and the `pattern` you tried do not match. It's confusing! – Vin.AI Mar 26 '20 at 16:12
  • Ok, try 1) `re.sub(r'(?m)\s{3,}\d+$|(\n)', lambda x: ' ' if x.group(1) else '', input_var)` or 2) `re.sub(r'\s{3,}\d+\n|(\n)', lambda x: ' ' if x.group(1) else '', input_var)` or 3) `re.sub(r'\s{3,}\d+\n|(\n+)', lambda x: ' ' if x.group(1) else '', input_var)`. Does any of these work the way you want? – Wiktor Stribiżew Mar 26 '20 at 16:17
  • @WiktorStribiżew What I am looking for - in [this answer](https://stackoverflow.com/a/15175239/5752535) is there a way to have regex like " **[Ll]** arry Wall" : "Guido van Rossum" in `dict`. Your answer works if I have to replace multiple regex with the same thing. In case of different replacements, I have to write multiple `re.sub`. – Main Mar 27 '20 at 06:14
  • I see, but there is no way to do it like you want. You need to run `re.sub`s in a loop: `for reg, repl in pattern.items(): output_var = re.sub(reg, repl, output_var)`, see https://ideone.com/NO2ciG – Wiktor Stribiżew Mar 27 '20 at 12:30
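
The distinction drawn in the comments between a string and its literal (escaped) form can be checked directly. The following is only a sketch on a made-up fragment; `text` and `shown` are illustrative names, and `shown` stands in for the escaped form that was apparently pasted into the regex tester.

```python
import re

text = "1189\nAnd"     # contains one real newline character
shown = repr(text)      # the escaped form, with a backslash followed by 'n'

# r'\\n' looks for a backslash followed by 'n', which the real text lacks.
print(re.findall(r'\\n', text))    # []
# r'\n' matches the actual newline character.
print(re.findall(r'\n', text))     # ['\n']
# The escaped form does contain backslash + 'n', so r'\\n' matches there.
print(re.findall(r'\\n', shown))   # ['\\n']
```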

2 Answers

Here's a working example that you could run (if you have bs4, requests, etc.). I see you're getting help on the numbering and the regex, but this may help in understanding the line breaks (I'm not exactly sure what the goal is). I couldn't find a source on the web with line numbering similar to yours, so unfortunately it's not like-for-like. Maybe food for thought if nothing else.

from bs4 import BeautifulSoup
import re
import requests


url = 'http://www.gutenberg.org/cache/epub/1041/pg1041.txt'

page = requests.get(url)
# print(page.status_code)
soup = BeautifulSoup(page.text, 'html.parser')

sonnet = page.text

print(sonnet[780:1500])
print()
print('------')
print()
sonnet = re.sub('\r','',sonnet)
sonnet = re.sub('\n','',sonnet)
print(sonnet[698:1500])

url2 = 'http://shakespeare.mit.edu/Poetry/VenusAndAdonis.html'

page = requests.get(url2)
# print(page.status_code)
soup = BeautifulSoup(page.text, 'html.parser')
print()
print('------')
print('------')
print()
VenusAndAdonis = soup.text
print(type(VenusAndAdonis))
print(VenusAndAdonis[800:1500])
print()
print('------')
print()
VenusAndAdonis = re.sub('\r','',VenusAndAdonis)
VenusAndAdonis = re.sub('\n',' ',VenusAndAdonis)
print(VenusAndAdonis[800:1500])

Outputs:

I

  From fairest creatures we desire increase,
  That thereby beauty's rose might never die,
  But as the riper should by time decease,
  His tender heir might bear his memory:
  But thou, contracted to thine own bright eyes,
  Feed'st thy light's flame with self-substantial fuel,
  Making a famine where abundance lies,
  Thy self thy foe, to thy sweet self too cruel:
  Thou that art now the world's fresh ornament,
  And only herald to the gaudy spring,
  Within thine own bud buriest thy content,
  And tender churl mak'st waste in niggarding:
    Pity the world, or else this glutton be,
    To eat the world's due, by the grave and thee.

  II

  When forty winters shall besiege thy brow,

------

I  From fairest creatures we desire increase,  That thereby beauty's rose might never die,  But as the riper should by time decease,  His tender heir might bear his memory:  But thou, contracted to thine own bright eyes,  Feed'st thy light's flame with self-substantial fuel,  Making a famine where abundance lies,  Thy self thy foe, to thy sweet self too cruel:  Thou that art now the world's fresh ornament,  And only herald to the gaudy spring,  Within thine own bud buriest thy content,  And tender churl mak'st waste in niggarding:    Pity the world, or else this glutton be,    To eat the world's due, by the grave and thee.  II  When forty winters shall besiege thy brow,  And dig deep trenches in thy beauty's field,  Thy youth's proud livery so gazed on now,  Will be a tatter'd weed of small 

------
------

<class 'str'>
 honour to your heart's content; which I
wish may always answer your own wish and the world's hopeful
expectation.
Your honour's in all duty,
WILLIAM SHAKESPEARE.

EVEN as the sun with purple-colour'd face
Had ta'en his last leave of the weeping morn,
Rose-cheek'd Adonis hied him to the chase;
Hunting he loved, but love he laugh'd to scorn;
Sick-thoughted Venus makes amain unto him,
And like a bold-faced suitor 'gins to woo him.


'Thrice-fairer than myself,' thus she began,
'The field's chief flower, sweet above compare,
Stain to all nymphs, more lovely than a man,
More white and red than doves or roses are;
Nature that made thee, with herself at strife,
Saith that the world hath ending wit

------

 honour to your heart's content; which I wish may always answer your own wish and the world's hopeful expectation. Your honour's in all duty, WILLIAM SHAKESPEARE.  EVEN as the sun with purple-colour'd face Had ta'en his last leave of the weeping morn, Rose-cheek'd Adonis hied him to the chase; Hunting he loved, but love he laugh'd to scorn; Sick-thoughted Venus makes amain unto him, And like a bold-faced suitor 'gins to woo him.   'Thrice-fairer than myself,' thus she began, 'The field's chief flower, sweet above compare, Stain to all nymphs, more lovely than a man, More white and red than doves or roses are; Nature that made thee, with herself at strife, Saith that the world hath ending wit
MDR
  • Instead of writing multiple `re.sub`, I was looking for a solution where I can put all the pattern:replacement in a, say, dictionary and then run the `re.sub` once only. – Main Mar 27 '20 at 06:20

You need to run the re.sub calls in a loop, but make sure output_var is initialized with the input_var value:

output_var = input_var
for reg, repl in pattern.items():
  output_var = re.sub(reg, repl, output_var)

See the Python demo online:

import re

input_var = """THE SONNETS

                    1

From fairest creatures we desire increase,
That thereby beauty’s rose might never die,
But as the riper should by time decease,
His

she hies,             1189
And yokes her silver doves; by whose swift aid
Their mistress mounted through the empty skies,
In her light chariot quickly is convey’d;           1192
  Holding their course to Paphos, where their queen
  Means to immure herself and not be seen."""

pattern = {r'\s{3,}\d+\n' : '',
           r'\n' : ' '}
output_var = input_var
for reg, repl in pattern.items():
  output_var = re.sub(reg, repl, output_var)

print(output_var)

Output:

THE SONNETS From fairest creatures we desire increase, That thereby beauty’s rose might never die, But as the riper should by time decease, His  she hies,And yokes her silver doves; by whose swift aid Their mistress mounted through the empty skies, In her light chariot quickly is convey’d;  Holding their course to Paphos, where their queen   Means to immure herself and not be seen.
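
If a single `re.sub` call is still preferred, one possible sketch (not part of the demo above) is to join the dictionary keys into one alternation, wrap each key in its own capturing group, and pick the replacement from `match.lastindex`. The helper name `multi_sub` is hypothetical, and the sketch assumes that the patterns contain no capturing groups of their own, that the replacements are plain strings without backreferences, and that the dictionary keeps insertion order (Python 3.7+).

```python
import re

def multi_sub(rules, text):
    """Apply several regex -> replacement rules in a single pass."""
    patterns = list(rules)
    # One capturing group per rule; when two rules could match at the
    # same position, the one listed first wins.
    combined = re.compile('|'.join('({})'.format(p) for p in patterns))

    def replace(match):
        # lastindex is the 1-based number of the group that matched.
        return rules[patterns[match.lastindex - 1]]

    return combined.sub(replace, text)

rules = {r'\s{3,}\d+\n': '', r'\n': ' '}
sample = "she hies,             1189\nAnd yokes\nTheir mistress"
print(multi_sub(rules, sample))   # she hies,And yokes Their mistress
```

Because every position is examined only once, the result can differ from the loop when one rule's output would feed another rule; for the two rules used here both approaches give the same text on this sample.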
Wiktor Stribiżew