
Given a library that allows this import:

from thislibrary import FooBar

Is there a way to figure out the casing of the characters in FooBar?

Motivation: users of thislibrary often get the casing of the object wrong and write

  • from thislibrary import Foobar,
  • from thislibrary import foobar or even
  • from thislibrary import fooBar.

I've tried generating all possible casings of the name, following https://stackoverflow.com/a/11144539/610569:

from itertools import product

s = 'foobar'

list(map("".join, product(*zip(s.upper(), s.lower()))))

[out]:

['FOOBAR',
 'FOOBAr',
 'FOOBaR',
 'FOOBar',
 'FOObAR',
 'FOObAr',
 ...
]
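Before relying on this one-liner, note two of its properties. The list always has 2**len(s) entries, so it grows exponentially with the length of the name. Characters that have no case (digits, underscores) also produce duplicate strings. A quick check, where all_casings is a hypothetical helper name:

```python
from itertools import product

def all_casings(s):
    # Pair each character's upper- and lower-case form, then take the
    # Cartesian product: one string per choice of case at each position.
    return list(map("".join, product(*zip(s.upper(), s.lower()))))

casings = all_casings("foo_bar")
print(len(casings))       # 128: two choices for each of the 7 characters
print(len(set(casings)))  # 64: '_' has no case, so half are duplicates
```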

Then I tried to find which of those casings actually exists in the library:

from itertools import product

import thislibrary

def find_variable_case(s, max_tries=1000):
  var_permutations = list(map("".join, product(*zip(s.upper(), s.lower()))))
  # Intuitively, any camel casing should minimize the no. of upper chars.
  # From https://stackoverflow.com/a/58789587/610569
  var_permutations.sort(key=lambda ss: (sum(map(str.isupper, ss)), len(ss)))
  names = set(dir(thislibrary))  # set lookup instead of list.index() + bare except
  for i, v in enumerate(var_permutations):
    if i > max_tries:
      return None
    if v in names:
      return v
  return None

find_variable_case('foobar')

[out]:

'FooBar'
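Since the goal is only to find names that differ in case, a hedged alternative is to compare casefolded strings directly against dir(module). That is linear in the number of module attributes rather than exponential in len(s), and it needs no max_tries cutoff. The sketch below uses the standard-library collections module as a stand-in for thislibrary; find_name_case is a hypothetical name.

```python
import collections  # stand-in for thislibrary in this sketch

def find_name_case(module, s):
    """Return every attribute name of `module` equal to `s` ignoring case."""
    target = s.casefold()
    return [name for name in dir(module) if name.casefold() == target]

print(find_name_case(collections, "chainmap"))     # ['ChainMap']
print(find_name_case(collections, "ORDEREDDICT"))  # ['OrderedDict']
```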

But importing it is still a bit painful, since the user has to manually type the following after calling find_variable_case():

from thislibrary import FooBar

Is there a way to write a function that checks imports of objects from thislibrary?

Such that when the user runs this:

from thislibrary import foobar

it raises a more meaningful error:

ImportError: Perhaps you are referring to this import?
   >>> from thislibrary import FooBar

For context, machine-learning model names are often abbreviated with inconsistent character casing, e.g.

There seems to be a common convention of capitalizing only the first letter of each word in class names (e.g. LlamaTokenizer), but in any case users should get a more meaningful error message whenever possible.

For now, I've tried:

from itertools import product

import transformers

def find_variable_case(s, max_tries=1000):
  var_permutations = list(map("".join, product(*zip(s.upper(), s.lower()))))
  # Intuitively, any camel casing should minimize the no. of upper chars.
  # From https://stackoverflow.com/a/58789587/610569
  var_permutations.sort(key=lambda ss: (sum(map(str.isupper, ss)), len(ss)))
  names = set(dir(transformers))
  for i, v in enumerate(var_permutations):
    if i > max_tries:
      return None
    if v in names:
      return v
  return None


v = find_variable_case('LLaMatokenizer')
# Same object as `from transformers import LlamaTokenizer`, without exec()
getattr(transformers, v)

Which outputs:

transformers.utils.dummy_sentencepiece_objects.LlamaTokenizer

This tells the user that the correct casing is LlamaTokenizer.
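As a hedged sketch of the function-call version of what the question asks for: wrap importlib.import_module and getattr, and on failure raise an ImportError whose message suggests the differently-cased name. import_name is a hypothetical helper, and the standard-library collections module stands in for thislibrary/transformers in the usage example.

```python
import importlib

def import_name(module_name, name):
    """Like `from module_name import name`, but suggest a differently-cased name on failure."""
    module = importlib.import_module(module_name)
    try:
        return getattr(module, name)
    except AttributeError:
        pass
    # Look for attribute names that differ from `name` only in case.
    matches = [n for n in dir(module) if n.casefold() == name.casefold()]
    msg = f"cannot import name {name!r} from {module_name!r}"
    if matches:
        msg += (". Perhaps you are referring to this import?\n"
                f"   >>> from {module_name} import {matches[0]}")
    raise ImportError(msg, name=module_name)

try:
    import_name("collections", "chainmap")
except ImportError as e:
    print(e)
```

This only helps when users call the helper instead of writing a plain import statement. Note that Python 3.12 and later can already append a "Did you mean: ...?" hint to this kind of ImportError on their own.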

To restate the question with all the context above:

Is there a way to write a function that checks imports of objects from thislibrary?

Such that when a user does:

from transformers import LLaMatokenizer

the error would show:

ImportError: Perhaps you are referring to this import?
   >>> from transformers import LlamaTokenizer
alvas
    Of note, Python 3.10 added similar "fuzzy search" functionality to suggest likely options on a `NameError` or `AttributeError`. It would be cool if it were eventually added as standard to `ImportError` as well, but I've not seen anything about it mentioned. – CrazyChucky Apr 02 '23 at 03:16

2 Answers


You can set up your own importer; here are a few helpful links to get you started:

Pankaj Saini

There are better ways, sure, but to do what you're describing I would use importlib and re:

import re
from importlib import import_module

mod_name = "numpy"  # suppose you get the module name as a string

s = "Ones_Like"

module = import_module(mod_name)  # dynamically load a module by string name

contents = dir(module)  # list module contents

template = re.compile(re.escape(s), re.IGNORECASE)  # case-insensitive, literal (not regex) match


for name in contents:
    # look for a regex match
    if template.fullmatch(name):
        print(f"{s} not found in module {mod_name}, did you mean '{name}'?")
        break
else:
    print(f"{s} not found in module {mod_name}")

You can further improve this with the inspect module to actually filter down module contents based on what they are (this method would find any objects, not just functions and classes).
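A minimal sketch of that filtering, using inspect.getmembers with the built-in callable as the predicate. This keeps classes and functions (including C-implemented ones that inspect.isfunction would reject) and drops constants and submodules. The standard-library collections module is a stand-in, and suggest_callables is a hypothetical name.

```python
import collections
import inspect

def suggest_callables(module, s):
    """Case-insensitively match `s` against the module's callable members only."""
    target = s.casefold()
    # getmembers returns (name, value) pairs for which the predicate is true.
    members = inspect.getmembers(module, callable)
    return [name for name, _ in members if name.casefold() == target]

print(suggest_callables(collections, "chainmap"))  # ['ChainMap']
```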

You can broaden the search by using re.search or re.match to collect every object whose name contains (or starts with) the misspelled name. You can do even better with a string-similarity approach such as edit distance or tokenization, or with an NLP package such as NLTK.
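For the similarity route, the standard library's difflib.get_close_matches is one option. It ranks candidates by SequenceMatcher.ratio(), which is Ratcliff/Obershelp-style matching rather than strict Levenshtein edit distance. This sketch casefolds both sides so that typos and casing errors are handled together. close_names is a hypothetical helper, collections is a stand-in module, and n/cutoff are difflib's own defaults.

```python
import collections
import difflib

def close_names(module, s, n=3, cutoff=0.6):
    """Suggest attribute names of `module` that look similar to `s`, ignoring case."""
    # Map each casefolded name back to its original spelling(s).
    by_folded = {}
    for name in dir(module):
        by_folded.setdefault(name.casefold(), []).append(name)
    # Best matches come first; candidates scoring below `cutoff` are dropped.
    hits = difflib.get_close_matches(s.casefold(), list(by_folded), n=n, cutoff=cutoff)
    return [orig for hit in hits for orig in by_folded[hit]]

print(close_names(collections, "chainmp"))  # typo and wrong case at once
```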

sturgemeister