I'm doing an analysis using Python to see how long we are keeping conversations in our social media channels by counting the number of interactions and reading the messages.
I'm thinking the approach would be to make the first table look like the second table. Steps to get there:
- Parse our the @username into it's own field. 99% of the time, our responses start with @username
- Count the number of times the @username shows up - this indicates the number of messages we send the user
- Turn the inbound and outbound messages from long to wide format. Not certain how many fields I'd have to create but as long as I know the technique, I can worry about the # of fields later.
Right now timestamp isn't that important to me but it could be in the future. Regardless, I would use the same technique as the inbound and outbound messages
Before table
Inbound Message Outbound message Inbound Time Outbound Time Account Hello, how are you @userA I'm good! mm/dd/yy hh:mm mm/dd/yy hh:mm FB Where is the cat? @userB what cat? mm/dd/yy hh:mm mm/dd/yy hh:mm Twitter What is a pie @user3 it is a food mm/dd/yy hh:mm mm/dd/yy hh:mm Twitter The black cat @userB I don't know mm/dd/yy hh:mm mm/dd/yy hh:mm Twitter
After table
User Messages Account Inbound 1 Outbound 1 Inbound 2 Outbound 2 Inbound 3 Outbound 3 userA 1 FB Hello, how are you @user1 I'm good! null null null null userB 2 Twitter Where is the cat? @user2 what cat? The black cat @user2 I don't know null null user3 1 Twitter What is a pie @user3 it is a food null null null null