Disclaimer:

Unlike 99.9% of people out there, I didn't pick up Python until very late in the progression of languages I write in. I won't harp on some of the odd behaviors of the import model, but I do have trouble understanding why type checking (i.e. "what kind of thing are you, random object some user has given me?") is all over the place.

Really this is just checking what class of data a thing is, but in Python it has never struck me as straightforward, and in my research on the interwebz, well, let's just say there are opinions, and the only thing anyone agrees on is using the term "pythonic". My question boils down to type(x) == y vs isinstance(x, y) when the type isn't one of the more straightforward ones (list, tuple, float, int, and so on).


Current Conundrum:

I need the ability to determine whether an object being passed (either directly, or dynamically within a recursive routine) is not just an iterable, but specifically an object created by scandir. Please don't get lost in the singular issue; I'll show that I have many ways to get to this. The bigger question is:
A) Is the method I'm using to coerce the output of type() going to bite me in the backside in a case I am not thinking of?
B) Am I missing a simpler, Python-specific way of accessing the class/type of an object?
C) TBD

I'll start by showing where the root of my disconnect may come from, and have a little fun with the people I know will take the time to answer this question properly, with a first example in R.

I'm going to set my own class attribute just to show what I'm talking about:

> a <- 1:3
> class(a)
[1] "integer"
> attr(a, "class")
[1] "integer"

OK, so, like in Python, we can ask if this is an int(eger) etc. Now I can re-class as I see fit, which gets to the point of where I'm going with the Python issue:

> class(a) <- "i.can.reclass.how.i.want"
> class(a)
[1] "i.can.reclass.how.i.want"
> attr(a, "class")
[1] "i.can.reclass.how.i.want"

So now in python, let's say I have a data.frame, or as you all put it DataFrame:

>>> import pandas as pd
>>> df = pd.DataFrame({"a":[1,2,3]})
>>> type(df)
pandas.core.frame.DataFrame

OK, so if I want to determine whether my object is a DataFrame:

>>> df = pd.DataFrame({"a":[1,2,3]})
# Get the mro of type(df)? and remove 'object' as an item in the mro tuple
>>> isinstance(df, type(df).__mro__[:-1])
True
# hmmmm
>>> isinstance(df, (pandas.core.frame.DataFrame))
NameError: name 'pandas' is not defined
# hmmm.. aight let's try..
>>> isinstance(df, (pd.core.frame.DataFrame))
True
# Lulz... alright then, I guess i get that, but why did __mro__ pass with pandas vs pd? Not the point...
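
As a hedged aside on the `pandas` vs `pd` puzzle above: the dotted path that `type()` prints is assembled from attributes stored on the class, while `isinstance` needs an actual class object reached through a name that is bound in the current namespace. The sketch below uses the standard library's `collections` imported under the alias `col` as a stand-in for `import pandas as pd` (an assumption made only so it runs without pandas installed).

```python
import collections as col  # stand-in for `import pandas as pd`

od = col.OrderedDict(a=1)

# The class object is reachable through the alias that the import bound.
print(isinstance(od, col.OrderedDict))  # True

# The dotted path shown by type() is built from attributes on the class;
# it is not a name that gets looked up in the current namespace.
cls = type(od)
dotted_path = f"{cls.__module__}.{cls.__qualname__}"
print(dotted_path)  # collections.OrderedDict

# __mro__ holds class objects, so no module name has to be resolvable.
mro_without_object = cls.__mro__[:-1]
print(isinstance(od, mro_without_object))  # True
```

By the same reasoning, the `__mro__` version works because it never spells out a module name, and `pd.core.frame.DataFrame` works because `pd` is the name the import statement actually bound.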

For when you can't do that

# yes..i know.. 3.5+ os.scandir... focus on bigger picture of this question/issue
import scandir

>>> a = scandir.scandir("/home")

>>> type(a)
posix.ScandirIterator

>>> str(type(scandir.scandir("/home")))
"<class 'scandir.ScandirIterator'>"

>>> isinstance(scandir.scandir("/home"), (scandir,scandir.ScandirIterator))
AttributeError: module 'scandir' has no attribute 'ScandirIterator'

# Okay fair enough.. kinda thought it could work like pandas, maybe can but I can't find it?
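
One possible way around a type that a module does not export by name is to create a throwaway instance and keep its type. The sketch below assumes the standard library's `os.scandir` (Python 3.6+ for the `with` form) rather than the `scandir` backport, whose return type may differ between installs; `is_scandir_iterator` is a hypothetical helper name.

```python
import os

# Create one throwaway instance and take its type. Nothing here assumes
# the class is exported by name from `os`.
with os.scandir(".") as probe:
    ScandirIterator = type(probe)


def is_scandir_iterator(obj):
    """Return True if obj is an instance of the type os.scandir() returns."""
    return isinstance(obj, ScandirIterator)


with os.scandir(".") as entries:
    print(is_scandir_iterator(entries))  # True

print(is_scandir_iterator(iter([])))  # False
```

Because the type is captured from a real object, the check does not depend on whether the class is reported as living in `posix`, `nt`, or anywhere else.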

Question:

Does that mean my only way of knowing the instance/type of certain objects, like the scandir example, is essentially a type hack like the one below?

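import re

def isinstance_from_type(x, class_info):
    # Pull the quoted dotted name out of e.g. "<class 'scandir.ScandirIterator'>"
    _chunk = re.search(r"(?<=\s['|\"]).*?(?=['|\"])", str(type(x)), re.DOTALL)
    try:
        return _chunk.group(0) == str(class_info)
    except AttributeError:
        # re.search returned None: no quoted name was found
        return False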

>>> a = scandir.scandir("/home")
>>> type(a) == "scandir.ScandirIterator"
False

>>> isinstance_from_type(a, "scandir.ScandirIterator")
True

Okay, I get why I don't get a string back from calling type(), but please let me know if there's a better, more universal and consistent method I simply don't know about, or what dangers come with using a regex here; trust me, I get it.
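
To make question A concrete, here is a minimal sketch of two cases where comparing type names as strings may give a different answer than `isinstance`. All class names and the `qualified_name` helper are hypothetical and exist only for illustration; the helper builds the dotted name from `__module__` and `__qualname__` rather than parsing `str(type(x))`.

```python
class Base:
    pass


class Child(Base):
    pass


def qualified_name(obj):
    """Dotted name of obj's class, built without parsing str(type(obj))."""
    cls = type(obj)
    return f"{cls.__module__}.{cls.__qualname__}"


child = Child()

# Case 1: a name comparison only matches the exact class, so inheritance is lost.
name_matches_base = qualified_name(child).endswith(".Base")
instance_of_base = isinstance(child, Base)
print(name_matches_base, instance_of_base)  # False True


# Case 2: two distinct classes can share a name; the strings are equal,
# the classes are not.
def make_class():
    class Twin:
        pass
    return Twin


TwinA, TwinB = make_class(), make_class()
same_name = qualified_name(TwinA()) == qualified_name(TwinB())
same_class = isinstance(TwinA(), TwinB)
print(same_name, same_class)  # True False
```

If a string really is needed (for logging, say), `__module__` and `__qualname__` give it directly, but the comparison itself is usually safer done on class objects.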

Thanks for reading. Any and all feedback about the mechanics of this, specific to Python, is welcome.

Carl Boneri
  • It is extremely straightforward. The way is to **use `isinstance` or `type`, but you have to actually provide the type**. You did `isinstance(df, (pandas.core.frame.DataFrame))` but `pandas` **is not defined**... why did you *think* it was defined? You had *just* used `pd.DataFrame` to *create your dataframe using the constructor*, so use `isinstance(df, pd.DataFrame)` – juanpa.arrivillaga Nov 21 '21 at 19:39
  • i think you're missing the question, or... i didn't make it clear in mine – Carl Boneri Nov 21 '21 at 19:41
  • What is it that isn't clear? – juanpa.arrivillaga Nov 21 '21 at 19:44
  • Maybe a better question is: what's a straightforward and universal method of accessing and validating the "type" of an object? I feel like Python is a bit convoluted (which is why I outlined it with another language) when it comes to this, OR I am missing a very simple concept under the hood and am hoping someone points that out and it clicks. – Carl Boneri Nov 21 '21 at 19:47
  • What do you mean? What isn't straightforward about `pd.DataFrame`? Note, I have no idea what library you are using; for me, if I install scandir and use `type(a)`, I get generator, not a `posix.ScandirIterator` – juanpa.arrivillaga Nov 21 '21 at 19:50
  • Right, so going with that: what's the next _inexpensive_ step to validating that something is a generator, as you would if it were an instance of `list` or `str` etc.? – Carl Boneri Nov 21 '21 at 19:51
  • Is your question actually https://stackoverflow.com/questions/6416538/how-to-check-if-an-object-is-a-generator-object-in-python ? – Thierry Lathuille Nov 21 '21 at 19:53
  • Generally, you don't do this. The generator type can either be retrieved using something like `generator = type(i for i in range(1))`, or you can `import types` and use `types.GeneratorType` – juanpa.arrivillaga Nov 21 '21 at 19:53
  • @ThierryLathuille that helps, I can go through the source code to get a better grasp of what's going on there. juanpa.arrivillaga appreciate the input. Thanks, all. – Carl Boneri Nov 21 '21 at 19:55
  • @CarlBoneri the source code? No! If you check the `types` module, you'll find `GeneratorType = type(_g())` where `_g` is just a generator defined like `def _g(): yield 1`. Not all the types used by a library will necessarily be exposed. Sometimes, you have to dig into a library. Or, you can just create an object of that type and use `type`. – juanpa.arrivillaga Nov 21 '21 at 19:56
  • @juanpa.arrivillaga cool! – Carl Boneri Nov 21 '21 at 19:57
  • And don't ever use your `isinstance_from_type` – juanpa.arrivillaga Nov 21 '21 at 19:58
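
The generator checks mentioned in the comments can be sketched roughly as follows. `count_up` is a hypothetical generator function used only for illustration; `types.GeneratorType`, `inspect.isgenerator`, and the `collections.abc` classes are standard-library names.

```python
import inspect
import types
from collections.abc import Generator, Iterator


def count_up(n):
    """Hypothetical generator function used only for this example."""
    for i in range(n):
        yield i


gen = count_up(3)

# The concrete generator type, as suggested in the comments.
print(isinstance(gen, types.GeneratorType))  # True
print(type(i for i in range(1)) is types.GeneratorType)  # True

# inspect offers the same check as a function.
print(inspect.isgenerator(gen))  # True

# Abstract base classes test for behaviour rather than one concrete type.
print(isinstance(gen, Generator))  # True
print(isinstance(iter([1, 2]), Iterator))  # True
print(isinstance(iter([1, 2]), types.GeneratorType))  # False
```

Which of these fits probably depends on whether the code cares that the object is literally a generator or only that it can be iterated.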
