I think you need DataFrame.pivot_table with aggfunc = ''.join, or another function that is valid for the str type.
new_df = (df.pivot_table(index='id', columns='sem',
                         values='stu', aggfunc=''.join)
            .rename_axis(columns=None, index=None))
print(new_df)
  sem1 sem2 sem3
1    B    A  NaN
2    A  NaN    A
You could use another function to handle duplicated values for the same id and sem, for example 'first', although ''.join is the way to avoid losing information here.
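To make the difference between the two aggregations concrete, here is a minimal sketch. The sample data is hypothetical (it is not the frame from the question), chosen so that id 1 has two rows for sem1; the names joined and first are only illustrative.

```python
import pandas as pd

# Hypothetical sample data: id 1 appears twice for sem1 (values B and C).
df = pd.DataFrame({
    'id':  [1, 1, 1, 2, 2],
    'sem': ['sem2', 'sem1', 'sem1', 'sem1', 'sem3'],
    'stu': ['A', 'B', 'C', 'A', 'A'],
})

# ''.join concatenates every value that falls into the same (id, sem) cell.
joined = df.pivot_table(index='id', columns='sem',
                        values='stu', aggfunc=''.join)

# 'first' keeps only the first value of each cell and drops the rest.
first = df.pivot_table(index='id', columns='sem',
                       values='stu', aggfunc='first')

print(joined)
print(first)
```

With this data the sem1 cell for id 1 should hold both letters under ''.join but only the first one under 'first', which is why ''.join is the safer choice when duplicates matter.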
UPDATE
print(df)
   id   sem stu
0   1  sem2   A
1   1  sem1   B
2   1  sem1   A
3   2  sem1   A
4   2  sem3   A
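The snippets that follow rely on a helper column built with groupby(['id','sem']).cumcount(). As a minimal sketch on the same data, this shows that column on its own; with_count is just an illustrative name, not something from the original answer.

```python
import pandas as pd

# Same data as printed above.
df = pd.DataFrame({
    'id':  [1, 1, 1, 2, 2],
    'sem': ['sem2', 'sem1', 'sem1', 'sem1', 'sem3'],
    'stu': ['A', 'B', 'A', 'A', 'A'],
})

# cumcount numbers the rows inside each (id, sem) group: 0, 1, 2, ...
# A repeated (id, sem) pair therefore gets count 1, 2, ... on its later rows.
with_count = df.assign(count=df.groupby(['id', 'sem']).cumcount())
print(with_count)
```

Only the second (1, sem1) row gets count 1, so (id, sem, count) is unique and can be used as pivot keys without collapsing rows.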
new_df = (df.assign(count=df.groupby(['id', 'sem']).cumcount())
            .pivot_table(index='id', columns=['sem', 'count'],
                         values='stu', aggfunc=''.join)
            .rename_axis(columns=[None, None], index=None))
print(new_df)
  sem1      sem2 sem3
     0    1    0    0
1    B    A    A  NaN
2    A  NaN  NaN    A
new_df = (df.assign(count=df.groupby(['id', 'sem']).cumcount())
            .pivot_table(index=['id', 'count'], columns='sem',
                         values='stu', aggfunc=''.join)
            .rename_axis(columns=None, index=[None, None]))
print(new_df)
    sem1 sem2 sem3
1 0    B    A  NaN
  1    A  NaN  NaN
2 0    A  NaN    A
Solution without MultiIndex:
new_df = (df.assign(count=df.groupby(['id', 'sem']).cumcount())
            .pivot_table(index='id', columns=['sem', 'count'],
                         values='stu', aggfunc=''.join)
            .rename_axis(columns=[None, None], index=None))
# Alternative that leaves duplicate column names:
# new_df.columns = new_df.columns.droplevel(1)
#   sem1 sem1 sem2 sem3
# 1    B    A    A  NaN
# 2    A  NaN  NaN    A
new_df.columns = [f'{x}_{y}' for x, y in new_df.columns]
print(new_df)
  sem1_0 sem1_1 sem2_0 sem3_0
1      B      A      A    NaN
2      A    NaN    NaN      A
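For convenience, the last approach can be wrapped up as a self-contained sketch. It uses the sample data from the UPDATE; the helper name pivot_flat and its parameter names are hypothetical and only serve to show the steps in one place.

```python
import pandas as pd

def pivot_flat(df, index='id', columns='sem', values='stu'):
    """Pivot df, keeping duplicated (index, columns) pairs as extra columns.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # Number repeated (index, columns) pairs so every pivot cell is unique.
    count = df.groupby([index, columns]).cumcount()
    out = (df.assign(count=count)
             .pivot_table(index=index, columns=[columns, 'count'],
                          values=values, aggfunc=''.join)
             .rename_axis(columns=[None, None], index=None))
    # Flatten the two column levels into names such as 'sem1_0'.
    out.columns = [f'{x}_{y}' for x, y in out.columns]
    return out

df = pd.DataFrame({
    'id':  [1, 1, 1, 2, 2],
    'sem': ['sem2', 'sem1', 'sem1', 'sem1', 'sem3'],
    'stu': ['A', 'B', 'A', 'A', 'A'],
})

new_df = pivot_flat(df)
print(new_df)
```

Because the count column makes every cell unique, ''.join only ever sees one value per cell here, so 'first' would give the same result in this variant.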