-2

I have a very long string containing notes on interactions with clients. I'm trying to clean it up for some text mining. Our employees put dates into their notes in this format:

mm/d

and it looks like this in the string:

10/8 spoke with client about marketing campaign,
8/20 client requested some changes to 100 campaigns

How would I go about removing all of these dates from the string without removing all numbers? Ideally, my output would look like this:

spoke with client about marketing campaign, 
client requested some changes to 100 campaigns
Håken Lid
ILikeWhiskey
  • Are the dates always at the start of each line? – Daniel Roseman Oct 16 '18 at 16:19
  • It looks like you want us to write some code for you. While many users are willing to produce code for a coder in distress, they usually only help when the poster has already tried to solve the problem individually. A good way to show this effort is to include a [Minimal, complete, verifiable example](http://stackoverflow.com/help/mcve). Check the [intro tour](https://stackoverflow.com/tour) you finished before posting, especially [How to Ask](http://stackoverflow.com/help/how-to-ask). – Prune Oct 16 '18 at 16:21
  • You can use [regular expressions](https://stackoverflow.com/questions/22937618/reference-what-does-this-regex-mean)- try searching for the following pattern in your text `"\d{1,2}/\d{1,2}"` – pault Oct 16 '18 at 16:22
  • You can use `sed 's#^[0-9]*/[0-9]* ##'`, for example. – Håken Lid Oct 16 '18 at 16:23

2 Answers

2

You can use Python's re module and work with regular expressions. Its re.sub() function looks for a regular expression in a string and replaces all occurrences of it with another string you provide.

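import re
s = '10/8 spoke with client about marketing campaign'
# re.sub() returns a new string; s itself is not modified, so keep the result
cleaned = re.sub(r'\d{1,2}/\d{1,2}', '', s)
print(cleaned)  # the space that followed the date is still there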

That code replaces every occurrence of one or two digits, a forward slash and, again, one or two digits with the empty string.
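As a rough sketch of how this behaves on the two-line sample from the question: the pattern leaves "100" alone because there is no slash next to it. The trailing `[ \t]*` is an addition of mine, not part of the pattern above; it consumes the space after each date so the lines do not start with a blank.

```python
import re

# Two-line sample taken from the question
notes = (
    "10/8 spoke with client about marketing campaign,\n"
    "8/20 client requested some changes to 100 campaigns"
)

# Same date pattern as above, plus [ \t]* (my assumption) to eat the
# spaces or tabs that follow the date without touching newlines
cleaned = re.sub(r'\d{1,2}/\d{1,2}[ \t]*', '', notes)
print(cleaned)
```

Keep in mind that this still removes any digits/slash/digits run anywhere in the text, not only the ones that are dates.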

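You could also use a stricter pattern:

re.sub(r'\b(0?[1-9]|1[0-2])/(0?[1-9]|[12]\d|3[01])\b', '', s)

That one would not match something like '05/35' or '20/05', which don't make sense as month/day. The \b word boundaries matter here: without them, the pattern can still match part of such a string (for example the '05/3' in '05/35').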
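If the dates only ever appear at the start of a line (which the question's sample suggests, but I am assuming), another option is to anchor the pattern with `^` and `re.MULTILINE` so that anything like "24/7" in the middle of a note survives. A minimal sketch; `strip_leading_dates` is just a name made up for illustration, and the "24/7" in the sample text is invented to show the difference:

```python
import re

# ^ with re.MULTILINE matches at the start of every line.
# [ \t]* is used instead of \s* so the match never swallows a newline.
LEADING_DATE = re.compile(r'^[ \t]*\d{1,2}/\d{1,2}[ \t]*', re.MULTILINE)

def strip_leading_dates(text):
    """Remove an m/d style date only when it opens a line."""
    return LEADING_DATE.sub('', text)

notes = (
    "10/8 spoke with client, available 24/7\n"
    "8/20 client requested some changes to 100 campaigns"
)
print(strip_leading_dates(notes))
```

The trade-off is that a date written in the middle of a line would be left in place.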

Pablo Paglilla
  • I'd add a `.strip()` at the end, but otherwise [this is how I'd do it](https://stackoverflow.com/questions/52839865/how-do-i-remove-a-date-pattern-from-a-string#comment92595842_52839865) as well. – pault Oct 16 '18 at 17:08
0

If every string starts with the date followed by a space, you can use str.partition():

s = '10/8 spoke with client about marketing campaign,'
s = s.partition(' ')[-1]
print(s)
# spoke with client about marketing campaign,
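For the long multi-line string from the question, the same idea could be applied line by line. The sketch below adds a small check so that a line which does not start with a date keeps its first word; `looks_like_date` and `drop_leading_date` are hypothetical helper names, and the third sample line is invented to show that case.

```python
def looks_like_date(token):
    """True for tokens shaped like digits/digits, e.g. '10/8'."""
    month, slash, day = token.partition('/')
    return bool(slash) and month.isdigit() and day.isdigit()

def drop_leading_date(line):
    """Drop the first space-separated token only if it looks like a date."""
    head, sep, rest = line.partition(' ')
    return rest if sep and looks_like_date(head) else line

notes = (
    "10/8 spoke with client about marketing campaign,\n"
    "8/20 client requested some changes to 100 campaigns\n"
    "no date on this line"
)
cleaned = "\n".join(drop_leading_date(line) for line in notes.splitlines())
print(cleaned)
```

Without the check, a plain `partition(' ')[-1]` would also cut the first word off lines that have no date.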
vash_the_stampede