
I have a script like the one below:

import pandas as pd

from typing import Any

def info(i: Any) -> None:
    print(f'{type(i)=:}')
    print(f'{i=:}')
    print(f'{i.Index=:}')
    print(f'{i.x=:}')
    print(f'{i.y=:}')

if __name__ == "__main__":
    df = pd.DataFrame([[1,'a'], [2, 'b']], columns=['x', 'y'])
    for i in df.itertuples():
        info(i)

Its output is

type(i)=<class 'pandas.core.frame.Pandas'>
i=Pandas(Index=0, x=1, y='a')
i.Index=0
i.x=1
i.y=a
type(i)=<class 'pandas.core.frame.Pandas'>
i=Pandas(Index=1, x=2, y='b')
i.Index=1
i.x=2
i.y=b

I'd like to avoid using Any, but what's the proper way to type i in the info function? My goal is to make mypy aware that i has fields Index, x and y.

If I follow pandas' way of typing (I'm still using Python 3.8):

def info(i: Tuple[Any, ...]) -> None: 

mypy complains:

toy.py:8:11: error: "Tuple[Any, ...]" has no attribute "Index"; maybe "index"?  [attr-defined]
toy.py:9:11: error: "Tuple[Any, ...]" has no attribute "x"  [attr-defined]
toy.py:10:11: error: "Tuple[Any, ...]" has no attribute "y"  [attr-defined]
Found 3 errors in 1 file (checked 1 source file)
zyxue
  • If you know the column names in advance, you could define a namedtuple with identical fields, and [type it as that](https://stackoverflow.com/questions/34269772/type-hints-in-namedtuple). It's not strictly correct (Pandas is constructing its own namedtuple class, not using yours) but it will catch typos. – Nick ODell Jun 20 '23 at 21:24
  • What Pandas itself does is type it as a tuple of Any. https://github.com/pandas-dev/pandas/blob/v2.0.2/pandas/core/frame.py#L1407 – Nick ODell Jun 20 '23 at 21:26
  • I doubt you'll be able to statically type this - the type is created dynamically at runtime *when you call `itertuples`*. I don't think there has been any way of integrating pandas column names into type hints – juanpa.arrivillaga Jun 20 '23 at 22:05
  • Are you actually trying to get this to pass `mypy`? – juanpa.arrivillaga Jun 20 '23 at 22:06
  • The "proper" way depends **only** on what you want to do with the argument `i`. It may not have anything to do with pandas specifics at all. I assume you do more than just print that argument. So without knowing what you actually want to do with `i`, there is no way to recommend anything. Please edit your question and include the relevant code for what you want to do with `i` inside `info`. – Daniil Fajnberg Jun 21 '23 at 11:19
  • And just to make this very clear: _"How to type the output from `df.itertuples`"_ is a non-starter. You don't. The people who wrote it do (and have). It is `Iterable[tuple[Any, ...]]`. You should not concern yourself with that. You should think: _"What type of object is my `info` function supposed to work on in general?"_ – Daniil Fajnberg Jun 21 '23 at 11:24
  • I clarified my goal: "to make mypy aware that `i` has fields `Index`, `x` and `y`" in function `info`. – zyxue Jun 21 '23 at 14:04
  • Thanks @NickODell, using NamedTuple appears to work, I posted an answer https://stackoverflow.com/a/76524286/856804 – zyxue Jun 21 '23 at 14:17
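Daniil Fajnberg's point, typing info by what it needs from i, could be expressed with a typing.Protocol (available from Python 3.8). This is a sketch that is not from the thread: Row and describe are hypothetical names, describe returns a string instead of printing so the result can be checked, and a collections.namedtuple stands in for a real pandas row. A protocol only helps where the argument's static type actually has those attributes (for example a NamedTuple); a value typed only as Tuple[Any, ...] would still be rejected by mypy.

```python
from collections import namedtuple
from typing import Protocol  # typing.Protocol exists from Python 3.8


class Row(Protocol):
    """Hypothetical protocol: any object with read-only Index, x and y."""

    # Read-only properties rather than plain attributes, so that namedtuple
    # fields (which cannot be reassigned) are accepted by mypy.
    @property
    def Index(self) -> int: ...

    @property
    def x(self) -> int: ...

    @property
    def y(self) -> str: ...


def describe(i: Row) -> str:
    # mypy checks these attribute accesses against Row.
    return f"Index={i.Index}, x={i.x}, y={i.y}"


# Stand-in for a row from df.itertuples(); it mimics the row's shape only.
PandasRow = namedtuple("Pandas", ["Index", "x", "y"])
print(describe(PandasRow(0, 1, "a")))  # Index=0, x=1, y=a
```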

2 Answers


Per @Nick ODell's suggestion, using NamedTuple appears to pass the mypy check:

import pandas as pd

from typing import NamedTuple

class Foo(NamedTuple):
    Index: int
    x: int
    y: str


def info(i: Foo) -> None:
    print(f'{type(i)=:}')
    print(f'{i=:}')
    print(f'{i.Index=:}')
    print(f'{i.x=:}')
    print(f'{i.y=:}')

if __name__ == "__main__":
    df = pd.DataFrame([[1,'a'], [2, 'b']], columns=['x', 'y'])
    # name="Foo" only names the class pandas generates; rows are not Foo instances
    for i in df.itertuples(name="Foo"):
        info(i)
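Since the rows pandas yields are not instances of the Foo class above, the annotation and the runtime type disagree. If that matters, one option (an assumption, not part of this answer) is to copy each row into a real Foo with the NamedTuple _make classmethod. In this sketch, as_foo is a hypothetical helper and FooRow stands in for a pandas-generated row:

```python
from collections import namedtuple
from typing import Any, NamedTuple, Tuple


class Foo(NamedTuple):
    Index: int
    x: int
    y: str


def as_foo(row: Tuple[Any, ...]) -> Foo:
    # Copy the positional values (Index, x, y) into a real Foo instance.
    # _make raises TypeError on a length mismatch but does not check types.
    return Foo._make(row)


# Stand-in for a row produced by df.itertuples(name="Foo"): same class name
# and fields, but a different class object from Foo above.
FooRow = namedtuple("Foo", ["Index", "x", "y"])
row = FooRow(0, 1, "a")

print(isinstance(row, Foo))          # False
print(isinstance(as_foo(row), Foo))  # True
```

In the script above the loop would then read `for i in df.itertuples(): info(as_foo(i))`, at the cost of one extra tuple per row. If itertuples is called with index=False, the Index field is absent and _make raises TypeError.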
zyxue

To type the variable i in the info function without using Any, you can spell out the type of each element of the row tuple. The class name printed for each row, pandas.core.frame.Pandas, cannot be used for this: itertuples creates that namedtuple class at runtime, so there is no such class to import.

Here's the modified code with a tuple type hint for i:

import pandas as pd
from typing import Tuple

# Element types in order: Index (int), x (int), y (str)
def info(i: Tuple[int, int, str]) -> None:
    print(f'{type(i)=}')
    print(f'{i=}')

if __name__ == "__main__":
    df = pd.DataFrame([[1, 'a'], [2, 'b']], columns=['x', 'y'])
    for i in df.itertuples():
        info(i)

With this modification, the output should be:

type(i)=<class 'pandas.core.frame.Pandas'>
i=Pandas(Index=0, x=1, y='a')
type(i)=<class 'pandas.core.frame.Pandas'>
i=Pandas(Index=1, x=2, y='b')

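Note that the type hint Tuple[int, int, str] assumes that each row is an integer index followed by an integer column (x) and a string column (y). Adjust the type hint accordingly if your actual dataframe has a different structure. Also note that this hint describes positions only, so mypy will still report attribute accesses such as i.x as errors.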
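Under a Tuple[int, int, str] hint the row's values are read by position, for example by unpacking them. A minimal sketch; describe is a hypothetical name, it returns a string so the result can be checked, and a collections.namedtuple stands in for a pandas row:

```python
from collections import namedtuple
from typing import Tuple


def describe(i: Tuple[int, int, str]) -> str:
    # With a plain Tuple hint mypy knows positions, not names: unpack in order.
    # Here mypy infers index: int, x: int, y: str.
    index, x, y = i
    return f"Index={index}, x={x}, y={y}"


# Stand-in for a pandas row: a namedtuple is still a tuple, so it is accepted.
PandasRow = namedtuple("Pandas", ["Index", "x", "y"])
print(describe(PandasRow(1, 2, "b")))  # Index=1, x=2, y=b
```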