I get plain-text strings (no HTML) from a piece of software, and they contain CSS sequences like this:

Some plain text... 

    a {
        text-decoration: none;
    }

        a:hover {
            text-decoration: underline;
        }

    .afooter {
        color: #fff !important;
    }

Some other plain text... 

How can I remove these sequences? It looks like CSS to me, but it is not wrapped in a `<style>` tag. The result I'm expecting is:

Some plain text... 


Some other plain text... 

I tried to build an HTML-like string by wrapping the text as `<html><body>MY PLAIN TEXT</body></html>` and then removing the CSS with HtmlAgilityPack. However, the sequences are not parsed as CSS nodes, apparently because they are part of the plain text and are not surrounded by any tags.
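For context on why no CSS node shows up, here is a minimal sketch that feeds the wrapped string to Python's standard-library `html.parser`. It is only a stand-in: HtmlAgilityPack is a different .NET library, so this illustrates the general behavior rather than its exact API. Text placed directly inside `<body>` is character data, so the CSS arrives as an ordinary text chunk and no `<style>` element is ever reported. `NodeLogger`, `parse_events` and the shortened `PLAIN_TEXT` are hypothetical names for this illustration.

```python
from html.parser import HTMLParser

# Shortened, illustrative version of the text from the question.
PLAIN_TEXT = """Some plain text...

a {
    text-decoration: none;
}

Some other plain text..."""


class NodeLogger(HTMLParser):
    """Hypothetical helper: records each tag and non-blank text chunk reported."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        # Ignore whitespace-only chunks so the log stays readable.
        if data.strip():
            self.events.append(("data", data.strip()))


def parse_events(html):
    logger = NodeLogger()
    logger.feed(html)
    logger.close()
    return logger.events


events = parse_events("<html><body>" + PLAIN_TEXT + "</body></html>")
for event in events:
    print(event)
# The CSS shows up only inside a ("data", ...) tuple; there is no ("start", "style").
```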

Michael
  • Have you even tried *anything*? We're not here to do your work. – MakePeaceGreatAgain Dec 12 '17 at 14:27
  • and when you try anything, don't [try regex](https://stackoverflow.com/a/1732454/1132334). – Cee McSharpface Dec 12 '17 at 14:28
  • I tried to create an HTML-like string by wrapping the text as `<html><body>MY PLAIN TEXT</body></html>` and checked if I can remove it using HtmlAgilityPack. However, the sequences are not parsed as CSS nodes. – Michael Dec 12 '17 at 14:30
  • [`[^{\n]*{\s*[^}]*}`](https://regex101.com/r/LQ6IAs/2) Does this work? – Gurmanjot Singh Dec 12 '17 at 14:31
  • @dlatikay Any reason why I should not use a regex? Can you recommend any frameworks/libraries I could use to solve it? – Michael Dec 12 '17 at 14:33
  • The solution depends on the rest of your plain text. Are there `{` or `}` symbols in your plain text outside the region of interest? – Alexander Dec 12 '17 at 14:34
  • @Michael just joking. That approach with the Agility Pack could be promising, although it may never be 100% exact. What if you feed it the – Cee McSharpface Dec 12 '17 at 14:34
  • Not sure how I could feed it the tags. I'm receiving the plain text from some other black-box software, so I'm not able to change it :-/ – Michael Dec 12 '17 at 14:39
  • Instead of extracting those parts of your input that are invalid, why not concentrate on the *actual* content? Try to parse everything line by line and remove everything that doesn't fit your expected style. Is there any common format you can rely on for the other text? – MakePeaceGreatAgain Dec 12 '17 at 14:40
  • wrap everything in , then have a sufficiently tolerant parser try to interpret that as CSS and everything it can't, is your plain text residue. – Cee McSharpface Dec 12 '17 at 14:40
  • @dlatikay I can try this by tomorrow. – Michael Dec 12 '17 at 14:45
  • @Gurman your regex seems to work for now. I still need to test it with more data and will get back to you with feedback. Thank you! Maybe you could post it as an answer instead of a comment. – Michael Dec 12 '17 at 14:45
  • @HimBromBeere I've updated my question, since you probably downvoted it right away. To answer your question: there is no common format I can rely on; the texts are extracted from live email data by the black-box software I'm using. – Michael Dec 12 '17 at 15:00
  • @Michael Sure. Let me know if you come across the case where it doesn't work. – Gurmanjot Singh Dec 12 '17 at 15:00
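A minimal Python sketch of Gurmanjot Singh's regex from the comments. The thread is about .NET, so treat this as an illustration of the pattern rather than drop-in code. The braces are escaped, which does not change what the pattern matches in Python, and a second substitution collapses the blank lines left behind. `strip_css_blocks`, `CSS_BLOCK` and `SAMPLE` are hypothetical names; `SAMPLE` is the question's example input without trailing spaces.

```python
import re

# Gurmanjot Singh's pattern with the braces escaped: selector text on one
# line, "{", optional whitespace, then everything up to the first "}".
CSS_BLOCK = re.compile(r"[^{\n]*\{\s*[^}]*\}")
# Three or more consecutive newlines: the holes left by removed blocks.
EXTRA_BLANK_LINES = re.compile(r"\n{3,}")

SAMPLE = """Some plain text...

    a {
        text-decoration: none;
    }

        a:hover {
            text-decoration: underline;
        }

    .afooter {
        color: #fff !important;
    }

Some other plain text..."""


def strip_css_blocks(text: str) -> str:
    """Remove `selector { ... }` blocks, then collapse leftover blank lines."""
    without_css = CSS_BLOCK.sub("", text)
    return EXTRA_BLANK_LINES.sub("\n\n", without_css).strip()


print(strip_css_blocks(SAMPLE))
```

As Alexander points out, this also removes any ordinary sentence containing a `{...}` pair (for example, `strip_css_blocks("Price {see below}")` returns an empty string). Because it stops at the first `}`, a nested rule such as an `@media` block is only partly removed, leaving a stray `}` behind.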
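MakePeaceGreatAgain suggested working line by line, and Alexander asked about braces in the ordinary text. One hypothetical heuristic that combines both ideas (it does not come from the thread) treats a line ending in `{` as the start of a CSS block. It then drops everything up to the matching line that holds only `}`, and counts depth so that nested at-rules such as `@media` are removed whole. `drop_css_lines`, `SAMPLE` and `NESTED` are illustrative names, and the block-start rule is an assumption about how the incoming text is laid out.

```python
import re

# Assumption: a CSS block starts on a line that ends with "{" (a selector or
# an at-rule such as "@media screen {") and closes on a line holding only "}".
BLOCK_START = re.compile(r"^\s*[^{}\s][^{}]*\{\s*$")
BLOCK_END = re.compile(r"^\s*\}\s*$")

SAMPLE = """Some plain text...

    a {
        text-decoration: none;
    }

    .afooter {
        color: #fff !important;
    }

Some other plain text..."""

NESTED = """Hello

@media screen {
    a {
        color: red;
    }
}

Bye {not css} here"""


def drop_css_lines(text: str) -> str:
    """Hypothetical helper: keep only lines outside brace-delimited blocks."""
    kept = []
    depth = 0
    for line in text.splitlines():
        if BLOCK_START.match(line):
            depth += 1  # opening line of a (possibly nested) block
        elif depth > 0:
            if BLOCK_END.match(line):
                depth -= 1  # closing brace of the innermost open block
            # any other line inside a block is a declaration and is dropped
        else:
            kept.append(line)  # outside every block: ordinary text
    result = "\n".join(kept)
    return re.sub(r"\n{3,}", "\n\n", result).strip()


print(drop_css_lines(SAMPLE))
print(drop_css_lines(NESTED))
```

Unlike the regex, this leaves prose with an inline `{...}` pair alone. However, it keeps one-line rules such as `a { color: red; }`, and a prose line that happens to end in `{` would swallow text up to the next lone `}`.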

0 Answers