2

I have a Pandas dataframe that looks like the below code. I need to add a dynamic column that concatenates every value in a sequence before a given line. A loop sounds like the logical solution but would be super inefficient over a very large dataframe (1M+ rows).

user_id=[1,1,1,1,2,2,2,3,3,3,3,3]
variable=["A","B","C","D","A","B","C","A","B","C","D","E"]
sequence=[0,1,2,3,0,1,2,0,1,2,3,4]
df=pd.DataFrame(list(zip(ID,variable,sequence)),columns =['User_ID', 'Variables','Seq'])

# Need to add a column dynamically 
df['dynamic_column']=["A","AB","ABC","ABCD","A","AB","ABC","A","AB","ABC","ABCD","ABCDE"]

I need to be able to create the dynamic column in an efficient way based on the user_id and the sequence number. I have played with the pandas shift function and that just results in having to create a loop. Looking for some easy efficient way of creating that dynamic concatenated column.

Droid
  • 83
  • 1
  • 6

2 Answers2

2

This is cumsum:

df['dynamic_column'] = df.groupby('User_ID').Variables.apply(lambda x: x.cumsum())

Output:

0         A
1        AB
2       ABC
3      ABCD
4         A
5        AB
6       ABC
7         A
8        AB
9       ABC
10     ABCD
11    ABCDE
Name: Variables, dtype: object
Quang Hoang
  • 146,074
  • 10
  • 56
  • 74
0

Your question is a little vague, but would something like this work?

df['DynamicColumn'] = df['user_id'] + df['sequencenumber']
jamzsabb
  • 1,125
  • 2
  • 18
  • 40