Given a library that allows this import:
from thislibrary import FooBar
Is there a way to figure out the casing of the characters in FooBar?
Motivation: users of thislibrary often get the casing of the object wrong and write from thislibrary import Foobar, from thislibrary import foobar, or even from thislibrary import fooBar.
I've tried generating all possible casings of the name, following https://stackoverflow.com/a/11144539/610569:
from itertools import product
s = 'foobar'
list(map("".join, product(*zip(s.upper(), s.lower()))))
[out]:
['FOOBAR',
'FOOBAr',
'FOOBaR',
'FOOBar',
'FOObAR',
'FOObAr',
...
]
Then I tried to find which of those strings is actually a name defined in the library:
import importlib
from itertools import product
import thislibrary

def find_variable_case(s, max_tries=1000):
    # Every upper/lower combination of the characters in s (2**len(s) strings).
    var_permutations = list(map("".join, product(*zip(s.upper(), s.lower()))))
    # Intuitively, any camel casing should minimize the no. of upper chars.
    # From https://stackoverflow.com/a/58789587/610569
    var_permutations.sort(key=lambda ss: (sum(map(str.isupper, ss)), len(ss)))
    for i, v in enumerate(var_permutations):
        if i > max_tries:
            return
        try:
            dir(thislibrary).index(v)
            return v
        except ValueError:
            # v is not a name defined in thislibrary; try the next candidate.
            continue

find_variable_case('foobar')
[out]:
'FooBar'
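As an aside, the same lookup can probably be done without enumerating all 2**len(s) casings, by comparing case-folded names against dir() directly. This is only a minimal sketch: find_name_ignoring_case is a hypothetical helper, and thislibrary is replaced here by a throwaway stand-in module so the snippet runs on its own.

```python
import types


def find_name_ignoring_case(module, name):
    """Return every attribute name of `module` equal to `name` ignoring case."""
    target = name.casefold()
    return [attr for attr in dir(module) if attr.casefold() == target]


# Stand-in for `thislibrary` (an assumption, so the sketch is self-contained).
thislibrary = types.ModuleType("thislibrary")
thislibrary.FooBar = type("FooBar", (), {})

print(find_name_ignoring_case(thislibrary, "foobar"))
```

Unlike the permutation approach, this returns every matching name, so a library that defined both Foobar and FooBar would yield both and the caller would have to pick one.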
But importing the object is still kind of painful, since the user has to manually type the following after running the find_variable_case() function:
from thislibrary import FooBar
Is there a way to write a function that checks an import against the objects available inside thislibrary?
So that when the user runs this:
from thislibrary import foobar
it raises a meaningful error:
ModuleNotFoundError: Perhaps you are referring to this import?
>>> from thislibrary import FooBar
For context, the names of machine-learning models are often abbreviated with inconsistent character casing, e.g.
- The name of the model on paper is LLaMA (https://ai.facebook.com/blog/large-language-model-llama-meta-ai/), and in code the developer sometimes names the object Llama
- Sometimes the name of the model on paper is BERT and the developer names the object Bert
There seems to be a common convention of using title-cased names with a single capital letter per word, but in any case users should get a more meaningful error message (whenever possible).
For now, I've tried:
import transformers
from itertools import product
import importlib

def find_variable_case(s, max_tries=1000):
    # Every upper/lower combination of the characters in s (2**len(s) strings).
    var_permutations = list(map("".join, product(*zip(s.upper(), s.lower()))))
    # Intuitively, any camel casing should minimize the no. of upper chars.
    # From https://stackoverflow.com/a/58789587/610569
    var_permutations.sort(key=lambda ss: (sum(map(str.isupper, ss)), len(ss)))
    for i, v in enumerate(var_permutations):
        if i > max_tries:
            return
        try:
            dir(transformers).index(v)
            return v
        except ValueError:
            # v is not a name exposed by transformers; try the next candidate.
            continue

v = find_variable_case('LLaMatokenizer')
exec(f"from transformers import {v}")
vars()[v]
Which outputs:
transformers.utils.dummy_sentencepiece_objects.LlamaTokenizer
This lets the user know that the right casing for the variable is LlamaTokenizer.
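The exec call is probably not needed to get hold of the object once its name is known as a string: importlib.import_module plus getattr should do the same job. A minimal sketch, using the standard-library collections module and the string "OrderedDict" as stand-ins (assumptions, so that it runs without transformers installed) for the library and for the name returned by find_variable_case:

```python
import importlib

# Stand-ins (assumptions): "collections" plays the role of the library and
# "OrderedDict" the role of the correctly cased name found earlier.
module_name = "collections"
v = "OrderedDict"

module = importlib.import_module(module_name)  # same effect as `import collections`
obj = getattr(module, v)                       # look the object up by its string name

print(obj)
```

This avoids building source code from a string, which matters if the name ever comes from user input.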
To repeat the question, given all the context above:
Is there a way to write a function that checks an import against the objects available inside thislibrary?
Such that when a user does:
from transformers import LLaMatokenizer
the error would show:
ModuleNotFoundError: Perhaps you are referring to this import?
>>> from transformers import LlamaTokenizer
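One direction that might produce this behaviour, if the library's own __init__.py can be changed, is a module-level __getattr__ (PEP 562, Python 3.7+), which Python calls when a name is not found on the module. The sketch below is an assumption-laden illustration rather than a tested solution for transformers: install_case_hint is a hypothetical helper, and thislibrary is a stand-in module built on the fly and registered in sys.modules so that the import statement can be exercised.

```python
import sys
import types


def install_case_hint(module):
    """Give `module` a PEP 562 __getattr__ that suggests the correct casing."""

    def __getattr__(name):
        # Reached only when normal attribute lookup on the module has failed.
        matches = [attr for attr in vars(module)
                   if attr.casefold() == name.casefold()]
        if matches:
            # ModuleNotFoundError is a subclass of ImportError; it is used here
            # only to mirror the message asked for above.
            raise ModuleNotFoundError(
                "Perhaps you are referring to this import?\n"
                f">>> from {module.__name__} import {matches[0]}"
            )
        raise AttributeError(
            f"module {module.__name__!r} has no attribute {name!r}"
        )

    module.__getattr__ = __getattr__


# Stand-in for `thislibrary` (an assumption, so the sketch is self-contained).
thislibrary = types.ModuleType("thislibrary")
thislibrary.FooBar = type("FooBar", (), {})
install_case_hint(thislibrary)
sys.modules["thislibrary"] = thislibrary

try:
    from thislibrary import foobar
except ImportError as err:
    message = str(err)
    print(message)
```

The hint is raised as an ImportError subclass rather than AttributeError because, as far as I understand, the from ... import statement replaces an AttributeError with its own generic "cannot import name" message. The trade-off is that hasattr(thislibrary, "foobar") would then raise instead of returning False, since hasattr only swallows AttributeError.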