3

I have a list of strings containing chords written with Roman numerals, like this:

['ii7', 'vi7', 'V', 'IVadd9', 'Iadd9', 'V', 'IVmaj7', 'ii7', 'vi7', 'V', 'IVadd9', 'Iadd9', 'V', 'IVmaj7']

and I want to split each string into three parts, collected in three lists like this:

numerals = ['ii', 'vi', 'V', 'IV', 'I', 'V', 'IV', 'ii', 'vi', 'V', 'IV', 'I', 'V', 'IV']
chord_type = ['min', 'min', 'maj', 'maj', 'maj', 'maj', 'maj', 'min', 'min', 'maj', 'maj', 'maj', 'maj', 'maj']
extensions = ['7', '7', '', 'add9', 'add9', '', '7', '7', '7', '', 'add9', 'add9', '', '7']

(As you can see, upper-case Roman numerals correspond to 'maj' in chord_type and lower-case ones to 'min'.)

All the possible roman numerals in my case:

i, ii, iii, iv, v, vi, vii, I, II, III, IV, V, VI, VII

Note that we don't need M, C, L, or X.

I know I can extract or split numbers from letters in a string in Python, as described here, but how can I extract Roman numerals?

I thought about using a regex match, but I'm not sure how to define those 7 Roman numerals, since the same characters might occur again later in the string.

Littlish
ZookKep
  • 1
    You have added the `python` tag, but you have added no code to your question. – quamrana Jun 21 '21 at 13:45
  • Have you tried to use a regular expression that matches roman numerals instead of `\d+`? (note that you do not need to match all possible roman numerals, only those from 1 to 7) – mkrieger1 Jun 21 '21 at 13:45
  • One thing you should start with is to get a list of all the `numerals`, perhaps just in lower case. You seem to be missing some. You need `i` .. `vii` for the seven notes of a scale. – quamrana Jun 21 '21 at 13:46
  • what makes the "i" in `'min'` different than the "i" in `'vi7'`? In other words, is what you are trying to do *mathematically* feasible? – Ma0 Jun 21 '21 at 13:47
  • 1
    Can you enumerate all of the possible a) numerals, b) chord_types and c) extensions? – Scott Hunter Jun 21 '21 at 13:49
  • @Ma0: All the chords will start with `i` .. `vii` and `min` is optional. (Or so I thought). – quamrana Jun 21 '21 at 13:50
  • What about the ending? Is it always there and is it always a single digit? – tevemadar Jun 21 '21 at 13:52
  • 2
    Share your code, what you did so far? – Nur Jun 21 '21 at 13:53
  • Aren't lower case major, or uppercase minor/diminished invalid? – Peter Wood Jun 21 '21 at 14:04

4 Answers

5

If the Roman numeral always comes first, you could do:

import re
chords = ['ii7', 'vi7', 'V', 'IVadd9', 'Iadd9', 'V', 'IVmaj7', 'ii7', 'vi7', 'V', 'IVadd9', 'Iadd9', 'V', 'IVmaj7']
numerals = [re.match('[IiVv]+', i).group(0) for i in chords]
print(numerals)

output

['ii', 'vi', 'V', 'IV', 'I', 'V', 'IV', 'ii', 'vi', 'V', 'IV', 'I', 'V', 'IV']

Note that I used re.match because it tries to apply the pattern at the start of the string, and that I limited the character class to the numeral letters that occur in your example (rather than all known ones, i.e. IiVvXxLlCcDdMm).
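The same match object can also give the other two lists from the question. This is a minimal sketch, assuming (as the question states) that an upper-case numeral means 'maj' and a lower-case one means 'min'; the loop and the ValueError are illustrative additions, not part of the answer above.

```python
import re

chords = ['ii7', 'vi7', 'V', 'IVadd9', 'Iadd9', 'V', 'IVmaj7']

numerals, chord_type, extensions = [], [], []
for chord in chords:
    # re.match only tries the pattern at the start of the string, so only
    # the leading run of I/V characters is captured.
    m = re.match(r'[IiVv]+', chord)
    if m is None:
        raise ValueError(f'no leading Roman numeral in {chord!r}')
    numeral = m.group(0)
    numerals.append(numeral)
    # Assumption from the question: upper case is major, lower case is minor.
    chord_type.append('maj' if numeral.isupper() else 'min')
    # Everything after the numeral is kept verbatim as the extension.
    extensions.append(chord[m.end():])

print(numerals)
print(chord_type)
print(extensions)
```

Note that this keeps 'maj7' as the extension of 'IVmaj7'; stripping a leading 'maj' or 'min' would need one more step.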

Daweo
2

You can quite simply do this with the .startswith string method like so:

nummerals = ['i', 'ii', 'iii', 'iv', 'v', 'vi', 'vii', 'I', 'II', 'III', 'IV', 'V', 'VI', 'VII']
lst = ['ii7', 'vi7', 'V', 'IVadd9', 'Iadd9', 'V', 'IVmaj7', 'ii7', 'vi7', 'V', 'IVadd9', 'Iadd9', 'V', 'IVmaj7']


nummerals.sort(key=len, reverse=True)  # see note 1

res = [next(n for n in nummerals if y.startswith(n)) for y in lst]  # see note 2
print(res)  # -> ['ii', 'vi', 'V', 'IV', 'I', 'V', 'IV', 'ii', 'vi', 'V', 'IV', 'I', 'V', 'IV']

Notes

  1. The original nummerals list has to be sorted by length (in descending order) to make sure you match the longest possible numeral ('ii7' starts with both 'i' and 'ii', but you want the second one).
  2. The above code raises StopIteration if a string starts with none of the numerals. To prevent that, pass a fallback value as the second argument to next.
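Note 2 can be sketched as follows. The 'N.C.' entry is a made-up input with no leading numeral, and None is just one possible fallback value.

```python
nummerals = ['i', 'ii', 'iii', 'iv', 'v', 'vi', 'vii',
             'I', 'II', 'III', 'IV', 'V', 'VI', 'VII']
# Longest first, so 'ii' is tried before 'i'.
nummerals.sort(key=len, reverse=True)

lst = ['ii7', 'V', 'IVadd9', 'N.C.']  # 'N.C.' is a hypothetical entry with no numeral

# The second argument to next() is returned when the generator is exhausted,
# so StopIteration is not raised for 'N.C.'.
res = [next((n for n in nummerals if y.startswith(n)), None) for y in lst]
print(res)  # -> ['ii', 'V', 'IV', None]
```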
Ma0
1

My solution uses a somewhat complex regular expression, which offers two advantages:

  1. If the extension starts with characters that could be mistaken for part of the numeral, for example IV followed by I, a naive approach would read the invalid numeral IVI, whereas this approach reads IV as the numeral and I as the extension.
  2. If you need to extend your application to larger numbers, the pattern already covers them: with M{0,4} it matches numerals up to 4999.

Edit: Obviously, bigger numbers are probably useless for chords, but who knows? Maybe you will update the way music works.

The regular expression I use comes from here. I have modified it slightly to make it work for this case.

import re

l = ['ii7', 'vi7', 'V', 'IVadd9', 'Iadd9', 'V', 'IVmaj7', 'ii7', 'vi7', 'V', 'IVadd9', 'Iadd9', 'V', 'IVmaj7']

numerals = []
chord_type = []
extensions = []

roman_regex = r'^M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})'

for e in l:
    # Search the upper-cased string so that lower-case numerals match too
    roman_search = re.search(roman_regex, e.upper())
    start = roman_search.start()
    end = roman_search.end()
    roman = e[start:end]

    numerals.append(roman)
    chord_type.append('maj' if roman[0].upper() == roman[0] else 'min')
    extensions.append(e[end:])
print(numerals)
print(chord_type)
print(extensions)

['ii', 'vi', 'V', 'IV', 'I', 'V', 'IV', 'ii', 'vi', 'V', 'IV', 'I', 'V', 'IV']
['min', 'min', 'maj', 'maj', 'maj', 'maj', 'maj', 'min', 'min', 'maj', 'maj', 'maj', 'maj', 'maj']
['7', '7', '', 'add9', 'add9', '', 'maj7', '7', '7', '', 'add9', 'add9', '', 'maj7']
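The first advantage listed above can be checked with a short sketch. The input 'IVI' is made up for illustration, and the "naive" pattern here is assumed to be a plain character class such as [IiVv]+.

```python
import re

roman_regex = r'^M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})'

s = 'IVI'  # hypothetical input: numeral IV followed by an extension starting with I

# A plain character class greedily takes every I/V character it can.
naive = re.match(r'[IiVv]+', s).group(0)

# The structured pattern stops as soon as a valid numeral is complete.
strict = re.search(roman_regex, s.upper())
numeral = s[:strict.end()]
extension = s[strict.end():]

print(naive)      # IVI
print(numeral)    # IV
print(extension)  # I
```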
frogger
0

You can try this:

import re

matcher = re.compile(r'([IiVv]+)(min|maj|)(.*)')

def parse_string(s):
    gs = matcher.findall(s)[0]
    if gs[1] == '':
        gs = (gs[0], 'maj' if gs[0].isupper() else 'min', gs[2])
    return gs
    
def parse_array(A):
    return [parse_string(chord) for chord in A]
    
parsed = parse_array(['ii7', 'vi7', 'V', 'IVadd9', 'Iadd9', 'V', 'IVmaj7', 'ii7', 'vi7', 'V', 'IVadd9', 'Iadd9', 'V', 'IVmaj7'])

numerals, chord_type, extensions = zip(*parsed)

print(list(numerals))
print(list(chord_type))
print(list(extensions))

This uses the standard re module for the regex parsing, which should not be a problem.
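The character class [IiVv]+ would also take runs such as 'VV' or 'iV' as a single numeral. A stricter variant, sketched here as an assumption rather than as part of the answer above, lists the 14 numerals from the question explicitly; STRICT and parse_chord are hypothetical names.

```python
import re

# Longer numerals come before their prefixes, so 'vii' wins over 'vi' and 'v'.
STRICT = re.compile(r'(VII|III|II|IV|VI|I|V|vii|iii|ii|iv|vi|i|v)(.*)')

def parse_chord(chord):
    """Return (numeral, chord_type, extension), or None if no listed numeral leads."""
    m = STRICT.match(chord)
    if m is None:
        return None
    numeral, extension = m.groups()
    # Assumption from the question: upper case is major, lower case is minor.
    chord_type = 'maj' if numeral.isupper() else 'min'
    return numeral, chord_type, extension

print(parse_chord('vii7'))    # ('vii', 'min', '7')
print(parse_chord('IVadd9'))  # ('IV', 'maj', 'add9')
print(parse_chord('X7'))      # None
```

Unlike the answer's pattern, this sketch does not strip an explicit 'min' or 'maj' from the extension.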

Captain Trojan