I have two questions about a large csv file that contains data formatted as strings in the following way:
"XAU=,XAU=,XAG=,XAG="
"25/08/2014 6:00:05,1200.343,25/08/2014 6:00:03,19.44,"
"25/08/2014 6:00:05,1200,,,"
Is there a way to efficiently load this into a pandas DataFrame? Alternatively, multiple pandas Series objects would also do the job. So far I tried:
import pandas as pd
df = pd.read_csv(path, header=None)              # each physical line becomes one long string in column 0
df = df[0].str[:-1].str.split(',', expand=True)  # drop the trailing comma and split into separate columns
The second line is taken from an answer in this thread: pandas split string into columns. However, I wonder if there is an even better way, especially since the columns have different data types. Secondly, how can I correctly parse the dates with to_datetime()? I tried to reindex df and used df.index = df.index.to_datetime(), but this only worked halfway because it did not strictly keep the dd/mm/yyyy format: some dates were incorrectly parsed as mm/dd/yyyy. I'm looking for fast approaches because eventually I will loop over many such csv files. Thanks for any help!
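For the date columns, this is roughly what I have in mind, assuming the split above worked and that dayfirst=True is enough to force the dd/mm/yyyy interpretation (treating columns 0 and 2 as the timestamp columns is my assumption from the sample):

import pandas as pd
# columns 0 and 2 of the split frame hold the timestamps in my sample
for col in (0, 2):
    df[col] = pd.to_datetime(df[col], dayfirst=True, errors='coerce')  # unparsable/empty fields become NaT

I suppose passing an explicit format such as '%d/%m/%Y %H:%M:%S' instead of dayfirst=True would also be faster when looping over many files, if the files really are that uniform.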
EDIT: Ideally data in this form should be handled as well:
"XAU=,XAU=,XAG=,XAG="
"25/08/2014 6:00:05,1200.343,25/08/2014 6:00:03,19.44,"
",,25/08/2014 6:00:05,19.50,"
So with the answer provided below,
from io import StringIO

data = StringIO(
'''
"XAU=,XAU=,XAG=,XAG="
"25/08/2014 6:00:05,1200.343,25/08/2014 6:00:03,19.44,"
",,25/08/2014 6:00:05,19.5,"
''')
df would become:
XAU XAU XAG XAG
0 25/08/2014 6:00:05 1200.343 25/08/2014 6:00:03 19.44
1 , 25/08/2014 6:00:05 19.5 \n
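What I am really after is a frame where the empty leading fields end up as missing values (NaT/NaN) instead of shifting the row. A sketch of the per-column cleanup I have in mind, continuing from the df above and assuming the split produced four string columns in the order date/price/date/price (the column names are placeholders of mine):

import pandas as pd
df.columns = ['XAU_time', 'XAU', 'XAG_time', 'XAG']                    # placeholder names
for col in ('XAU_time', 'XAG_time'):
    df[col] = pd.to_datetime(df[col], dayfirst=True, errors='coerce')  # '' -> NaT
for col in ('XAU', 'XAG'):
    df[col] = pd.to_numeric(df[col], errors='coerce')                  # '' -> NaN

If there is a way to get read_csv to do this in one pass, that would of course be preferable.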