I haven't found an example that fixes my problem, and I badly need help. I'd like to remove every .aplus-v2 rule together with everything inside its curly braces; in short, remove all the CSS and HTML tags. I tried the regex re.sub('(.aplus.*{.*})', '', tag.get_text()), but not all of it was removed.

Example string:

.aplus-v2 {\n      display:table;\n      margin-left:auto;\n      margin-right:auto;\n      word-wrap: break-word;\n      overflow-wrap: break-word;\n      word-break: break-word;\n    }\n    /* Undo this for tech-specs because it breaks table layout */\n    .aplus-v2 .aplus-tech-spec-table { word-break: initial; }\n    .aplus-v2 .aplus-module-wrapper {text-align:left; display:inline-block;}\n    .aplus-v2 .aplus-module-wrapper {text-align:inherit; display:inline-block;}\n    .aplus-v2 .aplus-module.module-1,\n    .aplus-v2 .aplus-module.module-2,\n    .aplus-v2 .aplus-module.module-3,\n    .aplus-v2 .aplus-module.module-4,\n    .aplus-v2 .aplus-module.module-6,\n    .aplus-v2 .aplus-module.module-7,\n    .aplus-v2 .aplus-module.module-8,\n    .aplus-v2 .aplus-module.module-9,\n    .aplus-v2 .aplus-module.module-10,\n    .aplus-v2 .aplus-module.module-11,\n    .aplus-v2 .aplus-module.module-12{padding-bottom:12px; margin-bottom:12px;}\n    .aplus-v2 .aplus-module:last-child{border-bottom:none}\n    .aplus-v2 .aplus-module {min-width:979px;}\n\n    /* aplus css needed to override aui on the detail page */\n    .aplus-v2 .aplus-module table.aplus-chart.a-bordered.a-vertical-stripes {border:none;}\n    .aplus-v2 .aplus-module table.aplus-chart.a-bordered.a-vertical-stripes td {background:none;}\n    .aplus-v2 .aplus-module table.aplus-chart.a-bordered.a-vertical-stripes td.selected {background-color:#ffd;}\n    .aplus-v2 .aplus-module table.aplus-chart.a-bordered.a-vertical-stripes td:first-child {background:#f7f7f7; font-weight:bold;}\n    .aplus-v2 .aplus-module table.aplus-chart.a-bordered.a-vertical-stripes tr th {background:none; border-right:none;}\n    .aplus-v2 .aplus-module table.aplus-chart.a-bordered tr td,.aplus-v2 .aplus-module table.aplus-chart.a-bordered tr th {border-bottom:1px dotted #999;}\n\n    /* A+ Template - General Module CSS */\n    .aplus-v2 .apm-top {height:100%; vertical-align:top;}\n    html[dir=\'rtl\'] .aplus-v2 .apm-top {padding-left: 
0px; padding-right: 3px}\n    .aplus-v2 .apm-center {height:100%; vertical-align:middle; text-align:center;}\n    .aplus-v2 .apm-row {width:100%; display:inline-block;}\n    .aplus-v2 .apm-wrap {width:100%;}\n    .aplus-v2 .apm-fixed-width {width:969px;}\n    .aplus-v2 .apm-spacing {float:left; zoom:1;}\n    html[dir=\'rtl\'] .aplus-v2 .apm-spacing {float: right;}\n    .aplus-v2 .apm-floatleft {float:left;}\n    html[dir=\'rtl\'] .aplus-v2 .apm-floatleft {float:right;}\n    .aplus-v2 .apm-floatright {float:right;}\n    html[dir=\'rtl\'] .aplus-v2 .apm-floatright {float:left;}\n    .aplus-v2 .apm-floatnone {float:none;}\n    .aplus-v2 .apm-spacing img {border:none;}\n    .aplus-v2 .apm-leftimage {float:left; display:block; margin-right:20px; margin-bottom:10px;width: 300px;}\n    html[dir=\'rtl\'] .aplus-v2 .apm-leftimage {float: right; margin-right: 0px; margin-left:20px;}\n    .aplus-v2 .apm-centerimage {text-align: center; width:300px; display:block; margin-bottom:10px;}\n    .aplus-v2 .apm-centerthirdcol {min-width:359px; display:block}\n    .aplus-v2 .apm-centerthirdcol ul,\n    .aplus-v2 .apm-centerthirdcol ol {margin-left: 334px;}\n    html[dir=\'rtl\'] .aplus-v2 .apm-centerthirdcol ul,\n    html[dir=\'rtl\'] .aplus-v2 .apm-centerthirdcol ol {margin-left: 0px; margin-right: 334px;}\n    .aplus-v2 .apm-rightthirdcol {float:right; width:230px; padding-left:30px; margin-left:30px; border-left:1px solid #dddddd;}\n    html[dir=\'rtl\'] .aplus-v2 .apm-rightthirdcol {float: left; padding-left: 0px; padding-right:30px; margin-left:0px; margin-right:30px; border-left:0px; border-right:1px solid #dddddd;}\n    .aplus-v2 .apm-lefttwothirdswrap {width:709px; display:block;}\n    .aplus-v2 .apm-lefthalfcol {width:480px; padding-right:30px; display:block; float:left;}\n    html[dir=\'rtl\'] .aplus-v2 .apm-lefthalfcol {padding-left:30px; float:right;}\n    .aplus-v2 .apm-righthalfcol {width:480px; display:block; float:left;}\n    html[dir=\'rtl\'] .aplus-v2 
.apm-righthalfcol {float:right;}\n    .aplus-v2 .apm-eventhirdcol {width:300px; display:block;}\n    .aplus-v2 .apm-eventhirdcol-table {border-spacing: 0px 0px; border-collapse: collapse;}\n    .aplus-v2 .apm-eventhirdcol-table tr td {vertical-align: top;}\n    .aplus-v2 .apm-fourthcol {width:220px; float:left;}\n    html[dir=\'rtl\'] .aplus-v2 .apm-fourthcol {float:right;}\n    .aplus-v2 .apm-fourthcol .apm-fourthcol-image {position:relative;}\n    .aplus-v2 .apm-fourthcol img {display:block; margin:0 auto;}\n    .aplus-v2 .apm-fourthcol-table {border-spacing: 0px 0px; border-collapse: collapse;}\n    .aplus-v2 .apm-fourthcol-table tr td {vertical-align: top;}\n    .aplus-v2 .apm-listbox {width:100%;}\n    .aplus-v2 .apm-iconheader {float:left; padding-left:10px;}\n    html[dir=\'rtl\'] .aplus-v2 .apm-iconheader {float:right; padding-left:0px; padding-right: 10px}\n    .aplus-v2 .apm-spacing ul:last-child,.aplus-v2 ol:last-child {margin-bottom:0 !important;}\n    .aplus-v2 .apm-spacing ul,.aplus-v2 ol {padding:0 !important;}\n    .aplus-v2 .apm-spacing ul {margin:0 0 18px 18px !important; color:#aaaaaa;}\n    html[dir=\'rtl\'] .aplus-v2 .apm-spacing ul {margin:0 18px 18px 0 !important;}\n\n    .aplus-v2 .apm-spacing ul li,.aplus-v2 ol li {word-wrap:break-word; margin:0 !important;}\n    .aplus-v2 .apm-spacing ul li {margin:0 !important;}\n    .aplus-v2 .apm-spacing ul .a-list-item,.aplus-v2 ol .a-list-item {color:#333333;}\n\n    /* A+ Template - Module 1 Sepcific CSS */\n    .aplus-v2 .amp-centerthirdcol-listbox {display:inline-block; width:359px;}\n\n    /* A+ Template - Module 2/3 Specific CSS */\n    .aplus-v2 .apm-sidemodule {text-align:left; margin:0 auto; width:970px; padding:0; background-color:#ffffff; position:relative;}\n    .aplus-v2 .apm-sidemodule {text-align:inherit;}\n    .aplus-v2 .apm-sidemodule-textright {width:470px; position:relative; display:table-cell; vertical-align:middle; padding-left:40px; height:300px; max-height:300px;}\n    
html[dir=\'rtl\'] .aplus-v2 .apm-sidemodule-textright {padding-left:0px; padding-right: 40px;}\n    .aplus-v2 .apm-sidemodule-textleft {width:630px; position:relative; display:table-cell; vertical-align:middle; padding-left:200px; height:300px; max-height:300px;}\n    html[dir=\'rtl\'] .aplus-v2 .apm-sidemodule-textleft {padding-left:0px; padding-right:200px;}\n    .aplus-v2 .apm-sidemodule-imageleft {position:relative; float:left; display:block;}\n    html[dir=\'rtl\'] .aplus-v2 .apm-sidemodule-imageleft {float:right;}\n    .aplus-v2 .apm-sidemodule-imageright {position:relative; float:right; display:block;}\n    html[dir=\'rtl\'] In order to truly get on top of a flea problem, you MUST treat your  house with a suitable flea spray or powder,\n        \n\n\n\n\n\n\n        Vacuum floors and soft furnishings \n    \n\n            It is always advisable to vacuum your carpets and floors regularly. Flea pupae cannot be killed by insecticides, so vaccuming helps stimulate and encourage them to hatch into adult fleas, which are then  killed by the insecticide found in household flea treatments.\n

Hoping for help or a suggestion.

1 Answer

Good afternoon,

First, regex patterns should be written as raw strings, because they often contain backslashes, which Python would otherwise interpret as string escape sequences. You can find more information about why regexes should be placed in raw strings here.

The syntax for raw strings is:

r'foo bar'

or

r"foo bar"
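
As a small illustration of the difference (a sketch added for clarity, not part of the original answer): in a normal string literal Python converts an escape such as \n or \b before the regex engine ever sees it, whereas a raw string passes the backslash through unchanged.

```python
import re

normal = '\n'    # one character: a newline
raw = r'\n'      # two characters: a backslash followed by 'n'
print(len(normal), len(raw))

# \b is the classic pitfall: in a normal string it is a backspace character,
# in a raw string it reaches the regex engine as a word boundary.
no_raw = re.findall('\bcat\b', 'cat concat')
with_raw = re.findall(r'\bcat\b', 'cat concat')
print(no_raw, with_raw)
```

Only the raw-string pattern finds the standalone word, because only there does \b mean "word boundary".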

Second, your regex contains characters that have a special meaning in a pattern and should be escaped to match literally, such as ., { and }. You can find the full list here. To escape a character in a regex, put a backslash \ before it.
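
To make the effect of escaping concrete, here is a minimal sketch; the sample string is made up for illustration. An unescaped . matches any character, while \. matches only a literal dot. The standard-library function re.escape can build the escaped form of a literal string for you.

```python
import re

# Hypothetical sample: the second selector starts with 'x', not a dot.
sample = '.aplus-v2 {a:b} xaplus-v2 {c:d}'

# Unescaped '.' matches any character, so both occurrences are found.
unescaped = re.findall(r'.aplus-v2', sample)

# Escaped '\.' matches only a literal dot.
escaped = re.findall(r'\.aplus-v2', sample)

# re.escape adds the backslashes needed to match a string literally.
auto = re.escape('.aplus-v2 {')

print(unescaped)
print(escaped)
print(auto)
```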

This is the corrected regex, based on what you said:

remove all .aplus-v2 and all inside curly braces

text_without_css = re.sub(r'\.aplus-v2 \{.*\}', '', text)

And here is the output using your example string:

\n html[dir=\'rtl\'] In order to truly get on top of a flea problem, you MUST treat your house with a suitable flea spray or powder,\n \n\n\n\n\n\n\n Vacuum floors and soft furnishings \n \n\n It is always advisable to vacuum your carpets and floors regularly. Flea pupae cannot be killed by insecticides, so vaccuming helps stimulate and encourage them to hatch into adult fleas, which are then killed by the insecticide found in household flea treatments.\n

As you can see, you still need to do some work to get a fully readable string, but all the CSS has been removed, as you asked.
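
One caveat about the one-liner above: . does not match a newline by default, so .* only spans the whole stylesheet when the text contains no real line breaks. If tag.get_text() returns real newlines, a per-rule approach may be more predictable. The following is only a sketch under stated assumptions: strip_css is a hypothetical helper name, the sample is a shortened version of the string in the question, and the pattern assumes the CSS comes before the prose and has no nested braces (so it would not handle @media blocks).

```python
import re

def strip_css(text):
    """Remove CSS comments and rule blocks, then tidy whitespace."""
    # Drop /* ... */ comments; DOTALL lets '.' cross line breaks.
    text = re.sub(r'/\*.*?\*/', '', text, flags=re.DOTALL)
    # Drop each "selector { declarations }" block. [^{}] also matches
    # newlines, so multi-line selectors and bodies are covered.
    # Caution: any prose sitting directly before a '{' is removed too.
    text = re.sub(r'[^{}]*\{[^{}]*\}', '', text)
    # Collapse the runs of whitespace left behind.
    return re.sub(r'\s+', ' ', text).strip()

# Shortened sample with real newlines (assumption: get_text() returns these).
sample = (
    ".aplus-v2 {\n  display:table;\n  word-wrap: break-word;\n}\n"
    "/* Undo this for tech-specs */\n"
    ".aplus-v2 .aplus-module.module-1,\n"
    ".aplus-v2 .aplus-module.module-2{padding-bottom:12px;}\n"
    "\n  Vacuum floors and soft furnishings \n"
)

print(strip_css(sample))  # Vacuum floors and soft furnishings
```

A truncated rule with no braces, like the html[dir='rtl'] fragment in the output above, would still be left in place by this sketch.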

Anonyme2000