1

If I have a Python string, how can I use re.findall to extract only the items that start with a capital letter?

My string looks like this:

my_string = ['a17b', 'Cupcake', '8ikl3', 'Dinosaur']

I want my extracted string to look like this:

new_string = ['Cupcake', 'Dinosaur']

Here's my code so far (not correct):

import re
new_string = re.findall(r'[^A-Z]', my_string)

Where am I going wrong? Thank you.

  • Problem #1: `re.findall` takes a single `str`, you're passing it a `list` of `str`. – ShadowRanger Jun 21 '22 at 23:39
  • "If I have a Python string, how can I use re.findall to extract only the items that start with a capital letter?" You already know how to do this: by using the code that you tried to use. The problem is that **you don't** "have a Python string". You have a **list of** Python strings. This question is not really about regular expressions at all; it is about how to process a list. – Karl Knechtel Jun 30 '22 at 22:02

4 Answers

2

You don't need re: just use str.isupper

[i for i in my_string if i[:1].isupper()]

Output:

['Cupcake', 'Dinosaur']
Chris
  • `str.istitle` requires *both* that it start with a capital letter *and* that all letters in a word following the capital letter be lowercase. That's the case for the OP's example, but it's slightly different from what they're describing. They may just want `if i[:1].isupper()` (the slice to `[:1]` means it silently discards the empty string instead of dying with an exception). – ShadowRanger Jun 21 '22 at 23:40
  • @ShadowRanger Good point. Let me make edit – Chris Jun 21 '22 at 23:42
  • Wait, `istitle` is a thing? COOL – Yaakov Bressler Jun 21 '22 at 23:46
  • @YaakovBressler: Yep. There's also `.capitalize` and `.title` to convert strings to that casing style (`capitalize` operates on the string as a whole, `title` on each word in the string). – ShadowRanger Jun 22 '22 at 00:29
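To make the distinction raised in the comments concrete, here is a small sketch comparing the two checks. The sample list `items` is made up for illustration and is not from the question; it includes an empty string to show why the `[:1]` slice is used.

```python
# Hypothetical sample data, chosen to show where the two checks disagree.
items = ['Cupcake', 'McDonald', 'USA', 'dinosaur', '']

# Only the first character has to be uppercase; ''[:1] is '' so no IndexError.
starts_upper = [s for s in items if s[:1].isupper()]

# istitle() also requires every later letter in the word to be lowercase.
title_case = [s for s in items if s.istitle()]

print(starts_upper)  # ['Cupcake', 'McDonald', 'USA']
print(title_case)    # ['Cupcake']
```

For the OP's data both give the same result, so the choice only matters if mixed-case or all-caps items can show up.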
1

That's not a string. That's a list of strings. You'll need something like:

newlst = [k for k in my_string if 'A' <= k[0] <= 'Z']
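One thing to be aware of with the range comparison: it only accepts the ASCII letters A-Z, whereas `str.isupper` also accepts other uppercase letters. A quick sketch, using a made-up list `words` (not from the question) and assuming no item is empty, since `k[0]` would raise `IndexError` on `''`:

```python
# Hypothetical sample data with one non-ASCII capital letter.
words = ['Cupcake', 'Éclair', 'apple']

# ASCII-only test, as in the answer above.
ascii_only = [k for k in words if 'A' <= k[0] <= 'Z']

# Unicode-aware test.
any_upper = [k for k in words if k[0].isupper()]

# ascii_only is ['Cupcake']
# any_upper  is ['Cupcake', 'Éclair']
```

Which one is right depends on whether "capital letter" in your data means strictly A-Z.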
Tim Roberts
0
import re

my_string = ['a17b', 'Cupcake', '8ikl3', 'Dinosaur']
# match() only tests at the start of each string, so this keeps items whose first character is A-Z
pattern = re.compile(r'^[A-Z]')
new_string = [i for i in my_string if pattern.match(i)]
print(new_string)

Output:

['Cupcake', 'Dinosaur']
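A side note on the approach above: `match` already anchors at the beginning of the string, so the `^` is only strictly needed if you switch to `search`. A minimal sketch, where the test string `'a17B'` and the variable names are made up for illustration:

```python
import re

anchored = re.compile(r'^[A-Z]')
unanchored = re.compile(r'[A-Z]')

sample = 'a17B'  # hypothetical item with a capital letter that is not first

match_unanchored = bool(unanchored.match(sample))    # match() checks position 0 only
search_unanchored = bool(unanchored.search(sample))  # search() scans the whole string
search_anchored = bool(anchored.search(sample))      # ^ restores the start-only check

print(match_unanchored)   # False
print(search_unanchored)  # True
print(search_anchored)    # False
```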

Wang Liang
0

Why it doesn't work:

  • findall() searches a string, but you passed in a list of strings
  • The expression [^A-Z] matches any single character that is not in the range A-Z (inside the brackets, ^ negates the set)
  • You want the entire word, but the pattern only matches one character at a time
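The points above can be reproduced with a short sketch. It uses the list from the question; the variable names `raised` and `non_capitals` are just for illustration.

```python
import re

my_string = ['a17b', 'Cupcake', '8ikl3', 'Dinosaur']

# Passing the list itself fails, because findall() expects a str (or bytes).
raised = False
try:
    re.findall(r'[^A-Z]', my_string)
except TypeError as exc:
    raised = True
    print('TypeError:', exc)

# On a single string, [^A-Z] returns every character that is NOT a capital.
non_capitals = re.findall(r'[^A-Z]', 'Cupcake')
print(non_capitals)  # ['u', 'p', 'c', 'a', 'k', 'e']
```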

How to fix:

import re

# Join the list into one space-separated string so that findall() can search it
my_string = ['a17b', 'Cupcake', '8ikl3', 'Dinosaur']
my_string = ' '.join(my_string)

# Match a capital letter followed by any run of word characters
expression = r'[A-Z]\w*'
new_string = re.findall(expression, my_string)

new_string  # ['Cupcake', 'Dinosaur']


Decoding the expression ('[A-Z]\w*')

  • [A-Z] matches one capital letter
  • \w matches one word character (a letter, digit, or underscore)
  • * tells re to match 0 or more of the preceding term
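One caveat with the join-then-findall approach: '[A-Z]\w*' can also start matching in the middle of an item. A sketch of the difference, assuming a slightly altered list (the first item is changed to 'a17B' purely for illustration) and adding a `\b` word-boundary anchor, which is my suggestion rather than part of the expression decoded above:

```python
import re

# Hypothetical variation of the question's data: 'a17B' has a capital mid-item.
joined = ' '.join(['a17B', 'Cupcake', '8ikl3', 'Dinosaur'])

# Without an anchor, the match may begin at any capital letter.
loose = re.findall(r'[A-Z]\w*', joined)

# \b requires the capital to sit at the start of a word.
strict = re.findall(r'\b[A-Z]\w*', joined)

print(loose)   # ['B', 'Cupcake', 'Dinosaur']
print(strict)  # ['Cupcake', 'Dinosaur']
```

Note that this still treats punctuation inside an item as a word break, so filtering the list item by item is the safer route if the items can contain anything other than word characters.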

One resource worth checking out is https://regex101.com

DonCarleone