
Excuse my noob question; I haven't done much Python coding, so I'm not very familiar with "Pythonic" ways or dynamically typed languages.

In pandas, I see that the DataFrame.to_dict() method can return its result in several formats. For example, DataFrame.to_dict("records") basically returns a list.

My question is: should the return type of this be type hinted as list or dict? As far as I know, type hints have no runtime effect. And DataFrame.to_dict("records") is basically a list, except for the fact that it comes from calling DataFrame.to_dict(), so it stands to reason that it would make more sense to treat it as a list. But officially it's a dict.
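A quick stdlib-only check of the "no runtime effect" point: Python stores annotations on the function object but does not enforce them, so a function annotated `-> dict` can still return a list. The function name `records_as_dict` is made up for illustration.

```python
def records_as_dict() -> dict:
    # The annotation claims dict, but nothing stops the function returning a list
    return [{"a": 1}, {"a": 2}]


result = records_as_dict()
print(type(result).__name__)            # list
print(records_as_dict.__annotations__)  # {'return': <class 'dict'>}
```

Only a static checker such as mypy would flag the mismatch; the interpreter runs it without complaint.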

Kevin
  • "But officially it's a `dict`" - no it's not. – user2357112 Apr 28 '23 at 03:36
  • You should probably clarify you're referring to the Pandas `DataFrame.to_dict()` method. – nigh_anxiety Apr 28 '23 at 04:02
  • Wait, Pandas has a function called `to_dict` that can return a *list*? .... .... .... just why. – Silvio Mayolo Apr 28 '23 at 04:21
  • @SilvioMayolo It converts each record in the DataFrame to a dict, and returns you a list of all of those dicts. See the first link in my answer for a breakdown of the possible formats, although that post doesn't yet include the new "tight" option. – nigh_anxiety Apr 28 '23 at 12:56
  • As for why you might want a list of dicts for the DataFrame, it allows you to process and manipulate the data like JSON. Pandas has a `to_json()` method as well, but that just returns everything as a JSONified string, so you need to use `to_dict()` if you want to manipulate the data. – nigh_anxiety Apr 28 '23 at 13:05
  • @nigh_anxiety thanks, you're right, I clarified it's `DataFrame.to_dict()`, and thanks for the explanations! @SilvioMayolo I was equally confused as to why it can return a list, but the explanation below makes sense. Definitely not something I'm used to. – Kevin Apr 28 '23 at 14:31
  • @Kevin The justification for why `to_dict()` has this one case of returning a list of dicts, is that every orient option in `to_dict()` directly correlates to the same option in `to_json()`. In the Python paradigm, it would be more confusing to have a separate method specifically for the "records" option, and having a separate method for each possible orientation just to keep your return types looking simpler is equally frustrating. Perhaps `to_collection()` would be more technically correct, but all of the options include dicts at the first or second level of the collection. – nigh_anxiety Apr 28 '23 at 17:09
  • I completely agree with @SilvioMayolo. The fact that the method is called `to_dict` and it can return a list is laughable. One of the many questionable Pandas design decisions. And the claim that this somehow corresponds to _"the Python paradigm"_ is even more questionable, to say the least. Maybe the "Pandas paradigm"... There is nothing frustrating about designing simple and clear-cut interfaces that **do one thing and do it well**. What is frustrating is _bloated_ library interfaces that are so inconsistent, you basically never know what you're going to get. – Daniil Fajnberg Apr 28 '23 at 17:15
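To make the JSON-like point from these comments concrete, here is a sketch with a toy two-column frame (the data is made up): `to_dict("records")` gives editable Python objects, while `to_json(orient="records")` gives a single string that has to be parsed before you can work with it.

```python
import json

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# to_dict("records"): a list of plain dicts, one per row, that you can edit
records = df.to_dict("records")
records[0]["b"] = 30  # changes the Python copy only; df is untouched

# to_json(orient="records"): the same row layout, serialized to one string
as_text = df.to_json(orient="records")
print(type(as_text).__name__)  # str

# Parsing that string gives back the same list-of-dicts shape
print(json.loads(as_text) == df.to_dict("records"))  # True
```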


DataFrame.to_dict() returns a list[dict] when orient="records". For the other formats, it returns a dict whose values could be dict, list, list[list], or Series. The accepted answer on this post has a good breakdown of the possible outputs.
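A small sketch to see these shapes for yourself, assuming pandas 1.4 or later (needed for "tight") and a toy two-column frame. It records only the outer container type and the type of the first value inside it, so it won't show the nested list[list] stored under "data" for "split" and "tight".

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# For each orient: (outer container type, type of the first value inside it)
shapes = {}
for orient in ["dict", "list", "series", "split", "tight", "records", "index"]:
    result = df.to_dict(orient)
    first = result[0] if isinstance(result, list) else next(iter(result.values()))
    shapes[orient] = (type(result).__name__, type(first).__name__)

for orient, (outer, inner) in shapes.items():
    print(f"{orient:8} -> {outer} of {inner}")
```

Only "records" comes back as a list; every other orient is a dict at the top level.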

Pandas's documentation lists the return type as: dict, list or collections.abc.Mapping.

If you wanted to add a type hint for the return of the DataFrame.to_dict() method and be more thorough, then you could use the following as of Python 3.10:

```python
import collections.abc
from pandas import Series

def to_dict(orient: str, into: collections.abc.Mapping, index: bool) -> dict[str, dict] | dict[str, list] | dict[str, Series] | list[dict] | dict[int, dict]: ...
```

I'm not sure even that covers every possible combination, which is why using dict | list | collections.abc.Mapping would be more than sufficient, along with documentation explaining the different outputs based on the value of the orient argument.

In Python, type hints do not stop a script from running the way a type mismatch would in a statically typed language. They're meant to help point out potential issues in your code (through tools such as mypy or your IDE) and provide some assistance, but they're not a hard stop. Parameter type hints are generally more important than return type hints, as passing the wrong data type as a parameter is more likely to cause an exception to be raised.

nigh_anxiety
  • _Parameter type hints are generally more important than return type hints as passing the wrong data type as a parameter is more likely to cause an exception to be raised._ And returning `Any` makes the rest of the code using the function a raffle. "Place your bets, ladies and gentlemen! What might this function return? Maybe a list? Maybe a dictionary? Maybe something else entirely?" I mean, I too like to live dangerously. But I guess sometimes I just like to, you know, _know ahead of time_ what type I get out of a function. But that is just me. – Daniil Fajnberg Apr 28 '23 at 17:25
  • @DaniilFajnberg. Clearly, if a return type can be specified, it should be, and `Any` should almost never be used as a type hint. In its place, if necessary, you should generally be using `object`. However, that can also be overly restrictive in some cases. The return value of `to_dict()` is a dict unless you as the consumer specify `orient="records"`, so you presumably know what you're getting as output in that case. – nigh_anxiety Apr 28 '23 at 17:37