I have two DataFrames:
df_components: a list of unique components (ID, DESCRIPTION)
dataset: several rows and columns from a CSV (one of these columns contains the description of a component).
I need to create a new column in dataset with the ID of the component according to df_components.
I tried to do it this way:
Creating df_components and the ID column based on the index:
components = dataset["COMPDESC"].unique()
df_components = pd.DataFrame(components, columns=['DESCRIPTION'])
df_components.sort_values(by='DESCRIPTION', ascending=True, inplace=True)
df_components.reset_index(drop=True, inplace=True)
df_components.index += 1
df_components['ID'] = df_components.index
Sample output:
DESCRIPTION ID
1 AIR BAGS 1
2 AIR BAGS:FRONTAL 2
3 AIR BAGS:FRONTAL:SENSOR/CONTROL MODULE 3
4 AIR BAGS:SIDE/WINDOW 4
Creating the COMP_ID column in dataset:
def create_component_id_column(row):
    found = df_components[df_components['DESCRIPTION'] == row['COMPDESC']]
    return found.ID if len(found.index) > 0 else None

dataset['COMP_ID'] = dataset.apply(lambda row: create_component_id_column(row), axis=1)
However, this gives me the error ValueError: Wrong number of items passed 248, placement implies 1, where 248 is the number of items in df_components.
How can I create this new column with the ID of the matching item in df_components?
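
For reference, here is a minimal sketch of one possible way to do the lookup. It is not a definitive fix. The error most likely arises because found.ID is a Series rather than a single value, so apply ends up producing many columns instead of one. The sketch sidesteps that by building a DESCRIPTION-to-ID lookup and mapping it onto the COMPDESC column. The sample rows and the name id_by_description are hypothetical stand-ins for illustration; only COMPDESC, DESCRIPTION, ID and COMP_ID come from the code above.

```python
import pandas as pd

# Hypothetical sample rows standing in for the CSV-backed dataset.
dataset = pd.DataFrame({
    "COMPDESC": [
        "AIR BAGS:FRONTAL",
        "AIR BAGS",
        "AIR BAGS:SIDE/WINDOW",
        "AIR BAGS",
    ],
})

# Build df_components as in the question: sorted unique descriptions,
# with IDs starting at 1.
components = sorted(dataset["COMPDESC"].unique())
df_components = pd.DataFrame({"DESCRIPTION": components})
df_components["ID"] = range(1, len(df_components) + 1)

# Series indexed by DESCRIPTION whose values are the IDs (hypothetical name).
id_by_description = df_components.set_index("DESCRIPTION")["ID"]

# Look up each COMPDESC value in that Series; one scalar per row.
dataset["COMP_ID"] = dataset["COMPDESC"].map(id_by_description)

print(dataset)
```

With this approach, any description that has no match in df_components would get NaN in COMP_ID instead of raising an error.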