I am fairly new to pandas so forgive me if a function already exists for what I am wanting to do or if I am going about it the wrong way.
I'm using two dataframes at the moment:
df_xmlTxt ( index = ['curid', 'title', 'text', 'cat'] )
df_xmlCategories ( index = ['curid', 'cat_1', 'cat_2', ... , 'cat_n'] )
cat here refers to categories, where each row's 'cat' value in df_xmlTxt is a list of categories.
I want to populate the df_xmlCategories dataframe to where each row will have a true/ false value for every category column.
I have gone about writing some code that I thought would do the trick but I'm getting into errors. So far, df_xmlCategories is a copy of the other dataframe's ID column and the categories appended to this where each category column has a 0 value. It is the very last line that does not work and I need to change
# Create categories dataframe for all articles and populate
df_xmlCategories = df_xmlTxt[['curid']].copy()
cat = dict.fromkeys(getAllCats(), 0)
df_xmlCategories = df_xmlCategories.assign(**cat)
# Update all boolean values for every categorys
df_xmlCategories = [[df_xmlCategories.loc[j][curId] = 1 for j in (df_xmlTxt.loc['cat'][curId].values.tolist())] for curId in df_xmlCategories['curId']]
Any advice/ different ways of going at this would be appreciated
edit: for clarity, the two tables look like: df_xmlTxt
my aim is to have a true boolean value for all cells where that row's curID has belong to that category