I have a DataFrame with character strings of upper and lower case values and I need to extract only the lower case values between strings of 3 upper case values.

I'm using Python and pandas to do this but have been unsuccessful. This is what the data looks like:

afklajrwouoivWERvalueineedREWkfjdsl
ALollz
sullymon54
  • I think you forgot to include the code you wrote that doesn't produce the correct output. – dfundako Aug 19 '19 at 18:57
  • Have a look at [How to make good pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) and provide a [mcve] including sample input and output, and code for what you've tried so far – G. Anderson Aug 19 '19 at 19:02

2 Answers

Let's try this:

import pandas as pd

df = pd.DataFrame({'text': ['afklajrwouoivWERvalueineedREWkfjdsl']}, index=[0])

# Capture whatever lies between two runs of three uppercase letters
df['text'].str.extract(r'[A-Z]{3}(.+?)[A-Z]{3}')

Output:

valueineed

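Note that this captures every character between two runs of three uppercase letters, not only lowercase ones. To accept lowercase letters only, replace (.+?) with ([a-z]+).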
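As a minimal sketch of the lowercase-only variant, assuming the same text column as above: the second sample row and the lower_only column name are made up for illustration and are not from the question.

```python
import pandas as pd

# Second row is a hypothetical counter-example: the text between the
# uppercase triples contains uppercase letters and digits.
df = pd.DataFrame({'text': ['afklajrwouoivWERvalueineedREWkfjdsl',
                            'xxABCmixED12XYZyy']})

# [a-z]+ accepts only lowercase letters between the two uppercase triples.
# expand=False returns a Series rather than a one-column DataFrame.
df['lower_only'] = df['text'].str.extract(
    r'[A-Z]{3}([a-z]+)[A-Z]{3}', expand=False
)
```

Rows with no lowercase-only run between two uppercase triples come back as NaN, so they can be dropped or inspected afterwards with dropna() or isna().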

Scott Boston

You can also use the re module with the same regex:

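import re

s = 'afklajrwouoivWERvalueineedREWkfjdsl'

# group(1) returns just the captured text between the uppercase triples
re.search(r'[A-Z]{3}(.+?)[A-Z]{3}', s).group(1)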

Output:

valueineed

If there are several occurrences in the string, use finditer instead:

matches = re.finditer(r'[A-Z]{3}(.+?)[A-Z]{3}', s)
results = [match.group(1) for match in matches]
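For several occurrences per row in pandas, a possible sketch uses Series.str.findall, which mirrors re.findall. The sample strings and variable names here are invented for illustration, and the pattern uses [a-z]+ on the assumption that only lowercase text is wanted. One caveat worth checking against your data: matches do not overlap, so when one uppercase triple both closes one value and opens the next, a lookahead keeps it available for the following match.

```python
import re
import pandas as pd

# Hypothetical sample with two delimited values in one string.
s = 'aaWERfirstREWbbQWEsecondEWQcc'
pattern = r'[A-Z]{3}([a-z]+)[A-Z]{3}'

# With a single capture group, findall returns the captured text directly.
results = re.findall(pattern, s)

# The same pattern applied row by row; rows without a match give an empty list.
per_row = pd.Series([s, 'nomatch']).str.findall(pattern)

# If a triple is shared between neighbours (DEF below), the plain pattern
# consumes it and misses the second value. The lookahead (?=...) checks for
# the closing triple without consuming it.
shared = re.findall(r'[A-Z]{3}([a-z]+)(?=[A-Z]{3})', 'ABCxxDEFyyGHI')
```

Whether the shared-delimiter behaviour is desirable depends on how the real data is laid out, so treat the lookahead version as an option rather than a default.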
vlemaistre