So I have many pandas data frames with 3 columns of categorical variables:
D F False
T F False
D F False
T F False
The first and second columns can each take one of three values. The third one is binary. So there are 18 possible rows in total (3 × 3 × 2), although not every combination may appear in each data frame.
I would like to assign a number 1-18 to each row, so that rows with the same combination of factors are assigned the same number and vice versa (no hash collisions).
What is the most efficient way to do this in pandas?
So, all_combination_df is a df with all possible combinations of the factors. I am trying to turn a df such as big_df into a Series with a unique number for each combination:
import itertools
import pandas

def expand_grid(data_dict):
    """Create a dataframe from every combination of given values."""
    rows = itertools.product(*data_dict.values())
    return pandas.DataFrame.from_records(rows, columns=data_dict.keys())

all_combination_df = expand_grid(
    {'variable_1': ['D', 'A', 'T'],
     'variable_2': ['C', 'A', 'B'],
     'variable_3': [True, False]})
big_df = pandas.concat([all_combination_df, all_combination_df, all_combination_df])
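
To make the goal concrete, here is a rough sketch of one way the numbering could work. It is only an illustration, not necessarily the most efficient approach. It assumes the full list of levels for each column is known up front; the `levels` dict and the `combination_codes` helper are hypothetical names introduced here. Each column is turned into integer positions with `pandas.Categorical`, and the positions are combined as digits of a mixed-radix number, so identical rows always get the same code in every data frame and different rows never share one.

```python
import itertools
import pandas as pd

# Assumed: the complete set of levels per column, in a fixed order.
levels = {
    'variable_1': ['D', 'A', 'T'],
    'variable_2': ['C', 'A', 'B'],
    'variable_3': [True, False],
}

def combination_codes(df, levels):
    """Hypothetical helper: number each row from 1 to the product of the level counts."""
    code = pd.Series(0, index=df.index)
    for column, values in levels.items():
        # Position of each value within its level list (-1 if the value is not listed).
        positions = pd.Categorical(df[column], categories=values).codes
        # Treat each column as one digit of a mixed-radix number.
        code = code * len(values) + positions
    return code + 1

all_combination_df = pd.DataFrame(
    list(itertools.product(*levels.values())), columns=list(levels))
big_df = pd.concat([all_combination_df] * 3, ignore_index=True)
codes = combination_codes(big_df, levels)
```

Because the code depends only on the fixed level lists, a data frame that is missing some combinations still gets the same numbers for the rows it does contain. A value that is not in `levels` gets position -1 and would produce a wrong code, so such rows would need to be checked for separately.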