time complexity of loc with multiindex in comparison to dict

Question

Using loc

import pandas

# Data
df = pandas.DataFrame({"A": ["a", "a", "b", "b"],
                       "B": ["x", "y", "x", "y"],
                       "val": [1, 2, 3, 4]})
df = df.set_index(["A", "B"])

# Extract relevant value
df.loc[("a", "x"), "val"]
# 1

Using dict

# Data
dat = {"a": {"x": 1, "y": 2},
       "b": {"x": 3, "y": 4}}

# Extract relevant value
dat["a"]["x"]
# 1

How does method 1 (loc) compare with method 2 (dict) in terms of time complexity? An answer to another post mentions that loc works in constant time, but I'm not clear whether that applies to a MultiIndex too.
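One way to check this empirically is to time both lookups while the data grows. The following is only a rough sketch: the helper `build`, the zero-padded key names, and the chosen sizes are assumptions made for illustration, and the timings will vary with machine, pandas version, and dtypes. If the lookup is roughly constant time, the per-lookup cost should stay about flat as the row count increases.

```python
import timeit

import pandas as pd


def build(n_outer, n_inner):
    """Build the same data as a MultiIndex DataFrame and as a nested dict.

    Hypothetical helper for this sketch; keys are zero-padded so the
    MultiIndex is created in sorted order.
    """
    outer = [f"a{i:04d}" for i in range(n_outer)]
    inner = [f"x{j:04d}" for j in range(n_inner)]
    index = pd.MultiIndex.from_product([outer, inner], names=["A", "B"])
    df = pd.DataFrame({"val": range(len(index))}, index=index)
    # Same values as the DataFrame: the row position of each (A, B) pair.
    dat = {a: {b: i * n_inner + j for j, b in enumerate(inner)}
           for i, a in enumerate(outer)}
    return df, dat


for n_outer in (10, 100, 1000):
    df, dat = build(n_outer, 100)
    key = (f"a{n_outer - 1:04d}", "x0099")  # last row of the frame

    # Both structures must return the same value before timing them.
    assert df.loc[key, "val"] == dat[key[0]][key[1]]

    n = 1000
    t_loc = timeit.timeit(lambda: df.loc[key, "val"], number=n) / n
    t_dict = timeit.timeit(lambda: dat[key[0]][key[1]], number=n) / n
    print(f"rows={len(df):>6}  loc={t_loc * 1e6:8.2f} us  dict={t_dict * 1e6:8.2f} us")
```

The absolute numbers matter less than how each column changes from one row count to the next.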

You are selecting a single value, so I'd expect dict to be faster and constant time. For pandas, if the index is unique (this applies to both a single index and a MultiIndex), then the selection is constant time. Again, if you are selecting a single value, I don't see a reason to move into pandas to do it, unless you have much more data than this and will be doing some other operations — sammywemmy, Nov 03 '21 at 01:08
It depends. Repeated selection of the same value? If you are going to select different values and do an operation on those values, I feel working within pandas will give you a faster option via vectorisation. I don't know the full details of what you are working on, so I might be off the mark — sammywemmy, Nov 03 '21 at 01:14
It kinda really depends on how many block managers are present in the DataFrame too and how many different ones need toughed to produce the subset DataFrame... This is one of those questions where the pandas performance is going to change rather significantly depending on scale, dtypes, etc... While the dictionary performance is always going to be consistent. — Henry Ecker, Nov 03 '21 at 01:17
But yes. A MultiIndex lookup is a hash lookup on the tuple of all index columns, so performance with a MultiIndex is not worse than performance with a single index (assuming both are unique). (*Note, however, that the return type of `loc` is a DataFrame, which will always have overhead to copy and create, whereas this might not be the case with a dictionary.) — Henry Ecker, Nov 03 '21 at 01:19
Need to be touched* to produce the subset DataFrame. Too late to edit that one... oops — Henry Ecker, Nov 03 '21 at 01:24
Most of the time, when a unique key search involves two columns, we can pivot the data and do the .loc search df.loc['A','x'], the same as with a dict. If you would like to speed it up, use .at — BENY, Nov 03 '21 at 01:35
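The suggestions in the comments above (checking that the index is unique, pivoting, and using .at) can be sketched roughly as follows. This is an illustration on the question's data, not a benchmark; the variable names `indexed`, `wide`, `via_at`, and `via_pivot` are made up for the example. Note that after pivoting, the row labels are the values of column A, so with this data the lookup key is the lowercase "a".

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "a", "b", "b"],
                   "B": ["x", "y", "x", "y"],
                   "val": [1, 2, 3, 4]})

# MultiIndex variant from the question.
indexed = df.set_index(["A", "B"])

# The constant-time claim in the comments assumes a unique index.
assert indexed.index.is_unique

# .at is the scalar accessor: one row label (here a tuple) and one column.
via_at = indexed.at[("a", "x"), "val"]

# Pivoted variant: values of A become row labels, values of B become columns.
wide = df.pivot(index="A", columns="B", values="val")
via_pivot = wide.at["a", "x"]

print(via_at, via_pivot)
```

Both lookups return the same value as dat["a"]["x"] in the dict version of the question.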
