
I have a dataset of daily sales numbers from an e-commerce site, and I want to apply VEC and VAR models to it.

The CSV has only two columns: "data.event" and "data.lastUpdate".

  • The "data.lastUpdate" column is the date, but in the format "2017-04-10T06:22:33.230Z". First I need to convert it into YMD format, which I did with string slicing. Any advice is welcome if you know a better way.

  • But the real problem is the first column, "data.event". The column has a single title, but each cell holds the sales numbers for every platform (Android, iOS, Rest, Total). I want to split these into new columns, one per platform plus the total. Sample lines are below. How can I convert these lines into separate columns?

0 - {"ANDROID":6106,"REST":3322,"IOS":3974,"TOTAL"... 2017-04-10T06:22:33.230Z

10 - {"ANDROID":9,"TOTAL":9} 2017-03-31T05:28:23.081Z

The output I want is simply:

Date        Total   Android   Ios
25/6/2018   35757   12247     9065
24/6/2018   18821   7582      5693
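A minimal standard-library sketch of that reshaping: it reads the CSV, decodes the JSON counts and emits rows in the Date, Total, Android, Ios order of the table above. It assumes the CSV quotes the JSON field the usual way (inner quotes doubled), that the column names are exactly data.event and data.lastUpdate, and that a platform missing from a row means zero sales. The `reshape` helper and the inline sample are hypothetical, built from the one complete sample line in the question.

```python
import csv
import io
import json
from datetime import datetime

# Hypothetical CSV text built from the question's complete sample line; a real
# export would be opened with open("sales.csv", newline="") instead.
SAMPLE_CSV = (
    "data.event,data.lastUpdate\n"
    '"{""ANDROID"":9,""TOTAL"":9}",2017-03-31T05:28:23.081Z\n'
)


def reshape(csv_file):
    """Yield one output row per input row: Date, Total, Android, Ios."""
    for row in csv.DictReader(csv_file):
        counts = json.loads(row["data.event"])
        stamp = datetime.strptime(row["data.lastUpdate"], "%Y-%m-%dT%H:%M:%S.%fZ")
        # Day/month/year without zero padding, matching "25/6/2018" above.
        date = f"{stamp.day}/{stamp.month}/{stamp.year}"
        # Assumption: a platform missing from the JSON sold nothing that day.
        yield [
            date,
            counts.get("TOTAL", 0),
            counts.get("ANDROID", 0),
            counts.get("IOS", 0),
        ]


rows = list(reshape(io.StringIO(SAMPLE_CSV)))
print(rows)  # [['31/3/2017', 9, 9, 0]]
```

The csv module handles the commas inside the quoted JSON field, which a plain `line.split(",")` would not.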

Since this is my first time using Stack Overflow, sorry for the poorly formatted question.

Thanks in advance.

Cumali
    Welcome to Stack Overflow! Please provide a [reproducible example in r](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). The link I provided will tell you how. Moreover, please take the [tour](https://stackoverflow.com/tour) and visit [how to ask](https://stackoverflow.com/help/how-to-ask). Cheers. – M-- Mar 26 '19 at 19:25

1 Answer


convert it into YMD format ... if you know a better way

The usual idioms are strptime for the timestamp:

$ python
>>> import datetime as dt
>>> stamp = '2017-04-10T06:22:33.230Z'
>>> dt.datetime.strptime(stamp, '%Y-%m-%dT%H:%M:%S.%fZ')
datetime.datetime(2017, 4, 10, 6, 22, 33, 230000)
>>>
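Once the stamp is parsed, the Y-M-D form asked about in the question comes from `.date()`, and `strftime` gives any other layout. A small sketch reusing the same format string; the variable names are only illustrative.

```python
import datetime as dt

stamp = "2017-04-10T06:22:33.230Z"
parsed = dt.datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%fZ")

# Drop the time of day; isoformat() renders the date as YYYY-MM-DD.
day = parsed.date()
print(day.isoformat())  # 2017-04-10

# Any other layout via strftime, e.g. day/month/year with zero padding.
print(day.strftime("%d/%m/%Y"))  # 10/04/2017
```

Slicing `stamp[:10]` yields the same text, but parsing rejects malformed stamps and gives a real date object that sorts and compares correctly.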

For the event column, json.loads does the job:

>>> import json
>>> csv_event = '{"ANDROID":9,"TOTAL":9}'
>>> d = json.loads(csv_event)
>>> d['ANDROID']
9
>>> d['TOTAL']
9
>>>
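The second sample row has no IOS or REST key, so `d['IOS']` would raise `KeyError` there. A sketch using `dict.get` with a default; treating a missing platform as 0 sales is an assumption about the data, not something the question states.

```python
import json

csv_event = '{"ANDROID":9,"TOTAL":9}'
d = json.loads(csv_event)

# d['IOS'] would raise KeyError for this row; .get() supplies a default instead.
# Assumption: an absent platform means zero sales for that day.
row = {key: d.get(key, 0) for key in ("TOTAL", "ANDROID", "IOS", "REST")}
print(row)  # {'TOTAL': 9, 'ANDROID': 9, 'IOS': 0, 'REST': 0}
```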
J_H
  • Thanks so much. It pushed me to find df.join(df['stats'].apply(json.loads).apply(pd.Series)), which worked very well. – Cumali Mar 26 '19 at 20:15
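A runnable sketch of the one-liner from the comment above. The comment's snippet names the column 'stats'; this sketch assumes the question's original column name, data.event, instead. The date step uses `pd.to_datetime` rather than string slicing, which is a swapped-in technique, not something from the thread. The one-row frame is hypothetical, built from the complete sample line in the question.

```python
import json

import pandas as pd

# Hypothetical frame mirroring the question's CSV (one complete sample row).
df = pd.DataFrame(
    {
        "data.event": ['{"ANDROID":9,"TOTAL":9}'],
        "data.lastUpdate": ["2017-03-31T05:28:23.081Z"],
    }
)

# Parse each JSON string into a dict, then spread the dict keys into columns,
# as in the comment's df.join(df['stats'].apply(json.loads).apply(pd.Series)).
platforms = df["data.event"].apply(json.loads).apply(pd.Series)
wide = df.join(platforms)

# Vectorised date parsing instead of string slicing; the trailing "Z" marks UTC.
wide["Date"] = pd.to_datetime(wide["data.lastUpdate"]).dt.date

print(wide[["Date", "TOTAL", "ANDROID"]])
```

With several rows, a platform missing from some rows shows up as NaN in its expanded column; `.fillna(0)` is one option if zero is the right reading for the data.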