I'm using a pandas DataFrame to analyse some logs. I have a CSV with columns like: number_items, event_type, ..., session_id, ...

My problem is that each session contains different types of events, and only one of them has a value for number_items. Yet number_items is what interests me.

What I want to see is how each parameter of each event influences number_items.

So, what I want to do is: first, copy the number_items of the event that has it (always the last one in the session) to all the other events of the session. Then, split each event_type into its own DataFrame (to avoid the many nulls that exist only because an attribute doesn't apply to that event) and analyse each one.

I'm stuck on the first part.

I tried something like this:

currentSession = '0'
currentItems = 0
for index, row in reversed(df.iterrows()) :
    if row['session_id'] == currentSession :
        row['number_items'] = currentItems
    else : 
        currentSession = row['session_id']
        currentItems = row['number_items']

Obviously it doesn't work; I just wanted to show the idea.

I'm fairly new to Python, so I would appreciate some help.

Thanks

edit: data sample here

For security reasons, I kept only the relevant information.

1 Answer

The rows you get back from iterrows are copies, so assigning to them doesn't overwrite your original DataFrame. Use another form of iteration that writes to the original DataFrame.
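As a rough sketch of what writing to the original DataFrame could look like, the loop below walks the index labels backwards and assigns through `df.at`. The sample data is made up for illustration (the real columns and values are not shown in the question), and it assumes the event carrying number_items is the last row of its session, as the question states.

```python
import pandas as pd

# Hypothetical sample: only the last event of each session has number_items.
nan = float("nan")
df = pd.DataFrame({
    "session_id": ["a", "a", "a", "b", "b"],
    "event_type": ["view", "click", "purchase", "view", "purchase"],
    "number_items": [nan, nan, 3.0, nan, 5.0],
})

current_session = None
current_items = 0

# Walk the index labels from last to first; df.index[::-1] is a reversed copy.
for idx in df.index[::-1]:
    session = df.at[idx, "session_id"]
    if session == current_session:
        # Assign through df.at so the original DataFrame is modified,
        # not a copy of the row.
        df.at[idx, "number_items"] = current_items
    else:
        # First row seen for this session (its last event): remember its value.
        current_session = session
        current_items = df.at[idx, "number_items"]
```

This keeps the shape of the loop from the question; it is still row-by-row and will be slow on large logs.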

See here: Updating value in iterrow for pandas

(Also, I'm not entirely sure what you are trying to do, but instinctively it seems very inefficient. I suspect there are natural pandas methods that would do what you are trying to achieve in one or two lines; look up the where() method.)
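One possible loop-free version, offered as a sketch rather than the definitive approach: it uses `groupby(...).transform("last")` instead of `where()`, since "last" takes the last non-null value in each group and broadcasts it to every row of that group. The sample data and the `page` column are invented for illustration; sessions with no number_items at all are filled with 0, following the asker's comment.

```python
import pandas as pd

# Hypothetical sample; "page" stands in for an attribute that only
# applies to one event type. Session "c" has no number_items at all.
nan = float("nan")
df = pd.DataFrame({
    "session_id": ["a", "a", "a", "b", "b", "c"],
    "event_type": ["view", "click", "purchase", "view", "purchase", "view"],
    "page": ["home", None, None, "cart", None, "home"],
    "number_items": [nan, nan, 3.0, nan, 5.0, nan],
})

# Broadcast each session's last non-null number_items to all its rows;
# sessions without any value fall back to 0.
df["number_items"] = (
    df.groupby("session_id")["number_items"].transform("last").fillna(0)
)

# One DataFrame per event type, dropping columns that are entirely null
# for that event type.
frames = {
    event_type: group.dropna(axis="columns", how="all")
    for event_type, group in df.groupby("event_type")
}
```

After this, `frames["purchase"]` has no `page` column, while `frames["view"]` keeps it, which covers the second step described in the question.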

Attack68
  • Even reversed doesn't work; I get an error. I was wondering whether there is a better method than iterating: something that gives all the cells of the session the value of the cell of event x if it exists, 0 if not – Arhiliuc Cristina May 28 '18 at 07:33
  • reversed only works if the object has a __reversed__() method or supports the sequence protocol (__len__() and __getitem__()). df.iterrows() returns a generator, which supports neither, hence the error. – Attack68 May 28 '18 at 07:36
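
To illustrate the point in the comments, here is a small sketch (with made-up data) showing that `reversed()` on the generator returned by `iterrows()` raises a `TypeError`, and that reversing the frame first with `df.iloc[::-1]` is one way to iterate from the last row to the first.

```python
import pandas as pd

# Hypothetical three-row frame, just to show the iteration order.
df = pd.DataFrame({"session_id": ["a", "a", "b"], "number_items": [1, 2, 3]})

# iterrows() returns a generator, which reversed() cannot handle.
try:
    reversed(df.iterrows())
    error = None
except TypeError as exc:
    error = exc

# Reverse the frame itself, then iterate over the reversed view.
reversed_labels = [index for index, row in df.iloc[::-1].iterrows()]
```

The rows yielded this way are still copies, so this only fixes the iteration order, not the assignment problem.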