I tried looking, but clearly I am missing a trick here. I have tried a couple of ideas for splitting a string separated by ; in a DataFrame in Python.
Can anybody tell me what I am doing wrong? I have only just picked up Python and would appreciate help. What I want is to split the string in recipient-address and duplicate the rest of the row for each recipient. I have a LOT of log files to get through, so it needs to be efficient. I am using Anaconda Python 2.7 on Windows 7 64-bit. Thanks.
The data in the input looks roughly like this:
#Fields: date-time,sender-address,recipient-address
2015-06-22T00:00:01.051Z, persona@gmail.com, other@gmail.com;mickey@gmail.com
2015-06-22T00:00:01.254Z, personb@gmail.com, mickey@gmail.com
What I am aiming at is:
#Fields: date-time,sender-address,recipient-address
2015-06-22T00:00:01.051Z, persona@gmail.com, other@gmail.com
2015-06-22T00:00:01.051Z, persona@gmail.com, mickey@gmail.com
2015-06-22T00:00:01.254Z, personb@gmail.com, mickey@gmail.com
I have tried the following, based on this:
for LOGfile in LOGfiles[:1]:
    readin = pandas.read_csv(LOGfile, skiprows=[0,1,2,3], parse_dates=['#Fields: date-time'], date_parser=dateparse)
    #s = df['recipient-address'].str.split(';').apply(Series, 1).stack()
    df = pandas.concat([Series(row['#Fields: date-time'], row['sender-address'], row['recipient-address'].split(';'))
                        for _, row in readin.iterrows()]).reset_index()
I keep getting the error:
NameError Traceback (most recent call last)
in ()
4 readin = pandas.read_csv(LOGfile, skiprows=[0,1,2,3], parse_dates= ['#Fields: date-time'], date_parser = dateparse )
5 df=pandas.concat([Series(row['#Fields: date-time'], row['sender-address'],row['recipient-address'].split(';'))
----> 6 for _, row in readin.iterrows()]).reset_index()
7
NameError: name 'Series' is not defined
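
For reference, the NameError only says that the bare name Series was never imported; writing pandas.Series (or adding from pandas import Series) should make that particular message go away. Even then, the Series(...) call above would probably not give the desired rows, since its positional arguments are read as data, index and dtype. Below is a minimal sketch of one way to get the target layout, assuming the three-column sample shown above. The inline sample text, the name raw and the skipinitialspace option are illustrative assumptions, not part of the real log-reading code, and the date column is left as plain text here.

```python
import io
import pandas as pd

# Inline copy of the sample rows from the question (stands in for a real log file).
raw = u"""#Fields: date-time,sender-address,recipient-address
2015-06-22T00:00:01.051Z, persona@gmail.com, other@gmail.com;mickey@gmail.com
2015-06-22T00:00:01.254Z, personb@gmail.com, mickey@gmail.com
"""

# skipinitialspace drops the blank that follows each comma in the sample.
readin = pd.read_csv(io.StringIO(raw), skipinitialspace=True)

# Emit one output row per recipient, repeating the date-time and sender.
rows = [(date_time, sender, recipient.strip())
        for date_time, sender, recipients in readin.itertuples(index=False)
        for recipient in recipients.split(';')]

df = pd.DataFrame(rows, columns=readin.columns)
print(df)
```

Building plain tuples first and constructing the DataFrame once avoids creating a Series per row, which is likely to matter with many log files. On newer pandas versions than the one used here, splitting the column with str.split(';') and then calling DataFrame.explode is another option.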