Here's a flexible version that passes all your test cases, using regular expressions. First, I'll define and compile* the regular expressions:
import re
# This pattern checks for the numeric version of your input: one or more
# digits, followed by a period, and then one or more digits.
numeric_pattern = re.compile(r'\d+\.\d+$')
# This one looks for two optional groups: each is one or more digits
# followed by 'y' or 'years', or 'm' or 'months'. The capturing groups
# are named, so we can tell which is which even if we only find one.
word_pattern = re.compile(
    # Written on two lines for clarity; Python automatically
    # concatenates adjacent string literals:
    r'(?:(?P<years>\d+)y(?:ears)?)?'
    r'(?:(?P<months>\d+)m(?:onths)?)?'
)
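To see what the named groups actually capture, here's a quick check against a few inputs that are already normalized (lowercased, spaces removed). The sample strings are just taken from your test cases; a group that didn't take part in the match comes back as None.

```python
import re

word_pattern = re.compile(
    r'(?:(?P<years>\d+)y(?:ears)?)?'
    r'(?:(?P<months>\d+)m(?:onths)?)?'
)

# groupdict() maps each named group to its captured text, or None
# if that optional group did not take part in the match.
for text in ('4years3months5days', '3months2days', '4y3m'):
    print(text, word_pattern.match(text).groupdict())
```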
Then I define a function to check these patterns against a supplied string:
def get_year_month(string):
    # Rather than deal with spaces and capitalization in our regexes,
    # we can normalize the input string first.
    string = string.lower().replace(' ', '')
    # Check for the simpler case first. If it's a match, return as-is.
    if numeric_pattern.match(string):
        return string
    # Otherwise, check for words. (This pattern will ALWAYS match,
    # because each half is an optional group.)
    match = word_pattern.match(string)
    # Whatever it doesn't find is set to 0.
    years = match.group('years') or 0
    months = match.group('months') or 0
    return f'{years}.{months}'
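One consequence of the word pattern always matching is that input with neither a year nor a month part (say, '5 days') comes back as '0.0'. If you'd rather reject that kind of input, here's a minimal sketch of a stricter variant. The name get_year_month_strict and the choice of ValueError are my own assumptions, not something your test cases require.

```python
import re

numeric_pattern = re.compile(r'\d+\.\d+$')
word_pattern = re.compile(
    r'(?:(?P<years>\d+)y(?:ears)?)?'
    r'(?:(?P<months>\d+)m(?:onths)?)?'
)

def get_year_month_strict(string):
    # Hypothetical variant: same normalization and patterns as above.
    string = string.lower().replace(' ', '')
    if numeric_pattern.match(string):
        return string
    match = word_pattern.match(string)
    years, months = match.group('years'), match.group('months')
    # Both groups are None when the input has no year or month part.
    if years is None and months is None:
        raise ValueError(f'no years or months found in {string!r}')
    return f'{years or 0}.{months or 0}'
```

It behaves the same as get_year_month on all of your test inputs; only unrecognized strings are treated differently.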
Looping over a list of your inputs and expected outputs is a simple way to verify that it works. The loop finishes without raising an AssertionError, so we know all the tests pass.
tests = [
    ('4years3months5days', '4.3'),
    ('3 months 2 days', '0.3'),
    ('4 years 2 months', '4.2'),
    ('4 Years 3 Months', '4.3'),
    ('4.6', '4.6'),
    ('4Y3M', '4.3'),
]
for string, result in tests:
    assert get_year_month(string) == result
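*Even though Python caches compiled regexes, I've found that precompiling can still be dramatically faster when a regex is reused many times, even when the number of regexes is nowhere near the cache limit. The likely reason is that module-level functions such as re.match still have to look the pattern up in the cache on every call, while a compiled pattern object skips that step.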
Regardless of performance, defining your regexes all in one place and giving them clear names can often make your code clearer and more readable.
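If you want to measure the difference on your own machine, here's a rough sketch using timeit. The helper name time_both and the sample string are just for illustration, and the numbers will vary with Python version and hardware, so treat it as a quick comparison and not a proper benchmark.

```python
import re
import timeit

PATTERN = r'\d+\.\d+$'
compiled = re.compile(PATTERN)

def time_both(number=100_000):
    # Module-level function: re looks the pattern up in its cache on each call.
    uncompiled = timeit.timeit(lambda: re.match(PATTERN, '4.6'), number=number)
    # Precompiled object: calls the pattern's match method directly.
    precompiled = timeit.timeit(lambda: compiled.match('4.6'), number=number)
    return uncompiled, precompiled

uncompiled, precompiled = time_both()
print(f're.match: {uncompiled:.3f}s, compiled.match: {precompiled:.3f}s')
```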