My goal is to mark several substrings within a string so that they can be removed without also deleting the text between them. I'm open to better ideas.

What I have now is:

@outbound_text = " XTEST this is hidden XTEST hey there, what's up! XTEST this is also hidden XTEST but then I just keep writing here "

I tried the following, but realized it was also deleting "hey there, what's up!", the visible text between the two hidden sections:

if ENV['ENVIRONMENT'] == 'test'
  # this will allow the XTEST string to come through
else
  # for production and development, remove the XTEST
  unless @outbound_text.gsub!(/XTEST(.*)XTEST/, '').nil?
    @outbound_text.strip!
  end

  logger.debug "remove XTEST: #{@outbound_text}"
end

I'm open to using different strings to bookend what I need to remove. The number of hidden substrings will vary, though, so the markers can only indicate a beginning and an end.

I think I'm also open to tag-style markers. I already have a number of tags that get parsed, so I'm open to using Nokogiri to remove the hidden tags. I would need to spend some time trying that, but I wanted to know whether a simple gsub would do before going down that path.

Satchel
  • I'd like to point out to the mod who closed this question that the referenced question is not the same as the question asked here, and the answers to this specific version of the question will likely use different techniques than that answer. – Derrell Durrett Mar 10 '16 at 20:55
  • Nokogiri isn't an option if your data isn't XML or HTML. If the data IS one of those, then you should show that content and your code that extracts the string you want to from it. Then we could help you fine tune your code. As it stands your question sounds like an XY Problem, where you have the string and now need to clean it up, when perhaps it would have been easier to do prior to extracting it, while it's still XML or HTML. – the Tin Man Mar 10 '16 at 21:47
  • @DerrellDurrett I agree with you. I'm hoping the non-greedy match could work. That said, I found a post from theTinMan on how to use partial documents with XML to remove them, and that looks promising. – Satchel Mar 13 '16 at 06:15
  • I was going to offer my own answer elsewhere, but I'll suggest that the easy way is to split the string on the relevant key. With the right trimming of whitespace, you can avoid having to redo that. It doesn't look kewl like a regex, but it's generally at least as efficient. – Derrell Durrett Mar 13 '16 at 19:19
  • @DerrellDurrett can you clarify what this means and how to do it? Is it the same as the non-greedy answer below? – Satchel Apr 01 '16 at 00:03
  • See here: http://pastebin.com/pxYcgTuU (Has several examples for how one chooses the string you split on/how you rebuild it). – Derrell Durrett Apr 01 '16 at 14:57
  • @DerrellDurrett thanks useful! – Satchel Apr 05 '16 at 15:50
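
For readers who can't open the pastebin, here is a rough sketch of the split idea from the comments above. It does not reproduce the linked examples: the method name `visible_text` and its `marker` parameter are made up for illustration. It assumes the markers always come in pairs, so after splitting, every second piece (indexes 0, 2, 4, ...) is visible text.

```ruby
# Hypothetical helper: keep only the text outside paired markers.
# Assumes every opening marker has a matching closing marker.
def visible_text(text, marker = 'XTEST')
  text.split(marker, -1)            # -1 keeps trailing empty pieces
      .each_slice(2).map(&:first)   # pieces 0, 2, 4, ... are outside the markers
      .map(&:strip)                 # trim whitespace left around each piece
      .reject(&:empty?)             # drop pieces that were only whitespace
      .join(' ')
end

sample = " XTEST this is hidden XTEST hey there, what's up! XTEST this is also hidden XTEST but then I just keep writing here "
puts visible_text(sample)
```

One thing to watch: if a marker is left unclosed, this version hides everything after it, whereas a paired regex would leave that text in place.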

1 Answer

Just make the repetition non-greedy:

@outbound_text.gsub(/XTEST(.*?)XTEST/, '').strip
  # => "hey there, what's up!  but then I just keep writing here"
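
To see why the original attempt ate the visible text, it may help to run the greedy and non-greedy patterns side by side. This is only a sketch: `strip_hidden` is a hypothetical helper name, and the `squeeze(' ')` step is an extra that collapses the double space left behind (it also collapses any other runs of spaces, so skip it if those matter).

```ruby
text = " XTEST this is hidden XTEST hey there, what's up! XTEST this is also hidden XTEST but then I just keep writing here "

# Greedy: .* runs to the LAST XTEST, so the visible middle is removed too.
greedy = text.gsub(/XTEST(.*)XTEST/, '').strip
# => "but then I just keep writing here"

# Non-greedy: .*? stops at the NEXT XTEST, so each pair is removed separately.
lazy = text.gsub(/XTEST.*?XTEST/, '').strip
# => "hey there, what's up!  but then I just keep writing here"

# Hypothetical helper: same idea, with the marker escaped and spaces tidied.
def strip_hidden(text, marker = 'XTEST')
  m = Regexp.escape(marker)
  text.gsub(/#{m}.*?#{m}/, '').squeeze(' ').strip
end

puts strip_hidden(text)
```

Note that `.` does not match newlines by default in Ruby, so a hidden section that spans several lines would need the `m` flag on the regex.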
ndnenkov