0

I have data coming from Redis and I'm struggling to convert this data in DF

Data from Redis

data = ["[Timestamp('2018-05-22 09:15:00'), 3555.75, 3559.15, 3546.45, 3548.3, 34250, 'Green', 34250]",
 "[Timestamp('2018-05-22 09:16:00'), 3549.05, 3551, 3543.25, 3548, 19500, 'Green', 53750]",
 "[Timestamp('2018-05-22 09:17:00'), 3548.95, 3553.2, 3548.05, 3548.9, 12000, 'Green', 65750]"]

How to store above data in pandas dataframe in below columns

df = pd.DataFrame(columns= 'date','open','high','close','low','volume','close','total_volume'])
user3483203
  • 50,081
  • 9
  • 65
  • 94
user2371563
  • 364
  • 4
  • 14

2 Answers2

2

As I said above, the easiest approach here is changing how you send the data upstream. If that is not an option, here is an approach using your current data:

split with strip

data = [i.strip('[]').split(',') for i in data]

pd.DataFrame

df = pd.DataFrame(data, columns=['date','open','high','close','low','volume','close','total_volume'])

                               date      open      high     close      low  \
0  Timestamp('2018-05-22 09:15:00')   3555.75   3559.15   3546.45   3548.3
1  Timestamp('2018-05-22 09:16:00')   3549.05      3551   3543.25     3548
2  Timestamp('2018-05-22 09:17:00')   3548.95    3553.2   3548.05   3548.9

   volume     close total_volume
0   34250   'Green'        34250
1   19500   'Green'        53750
2   12000   'Green'        65750

If your Timestamp column always has the above format, you can postprocess it using basic string slicing:

pd.to_datetime(df.date.str[11:-2])

0   2018-05-22 09:15:00
1   2018-05-22 09:16:00
2   2018-05-22 09:17:00
Name: date, dtype: datetime64[ns]
user3483203
  • 50,081
  • 9
  • 65
  • 94
  • I would suggest to use `ast.literal_eval` instead of manually parsing the strings. See https://stackoverflow.com/questions/10775894/converting-a-string-representation-of-a-list-into-an-actual-list-object for reference. – ypnos May 22 '18 at 19:51
  • Using `ast.literal_eval` will result in a malformed node or string error here. That was my first thought as well. – user3483203 May 22 '18 at 19:52
  • The timestamps are the problem, but this way they are also awkward, the timestamp string needs further parsing. – ypnos May 22 '18 at 19:53
  • The above solution works like a charm. However, data cleaning is time-consuming. I receive a chunk of data from API and that data is stored in Redis. While retrieving data from Redis I'm having trouble to put the data in pandas DataFrame in one shot. Attached link to data https://s3-us-west-2.amazonaws.com/shared-girish/Data.txt @chrisz any better way to upload data to Redis? As I'm using rpush(data) to push data in Redis – user2371563 May 23 '18 at 01:12
  • I was able to upload data in redis as dictionary. Thank you @chrisz for direction – user2371563 May 24 '18 at 01:39
0
import pandas as pd
import numpy as np
import datetime

data = [[pd.Timestamp('2018-05-22 09:15:00'), 3555.75, 3559.15, 3546.45, 3548.3, 34250, 'Green', 34250],
[pd.Timestamp('2018-05-22 09:16:00'), 3549.05, 3551, 3543.25, 3548, 19500, 'Green', 53750],
[pd.Timestamp('2018-05-22 09:17:00'), 3548.95, 3553.2, 3548.05, 3548.9, 12000, 'Green', 65750]]


DataFrame = pd.DataFrame(data, columns=['date', 'open', 'high', 'close', 'low', 'volume', 'close', 'total_volume'])
print(DataFrame)

And here is your output:

             date     open     high    close     low  volume  close  \
0 2018-05-22 09:15:00  3555.75  3559.15  3546.45  3548.3   34250  Green   
1 2018-05-22 09:16:00  3549.05  3551.00  3543.25  3548.0   19500  Green   
2 2018-05-22 09:17:00  3548.95  3553.20  3548.05  3548.9   12000  Green   

total_volume  
0         34250  
1         53750  
2         65750  
Simeon Ikudabo
  • 2,152
  • 1
  • 10
  • 27
  • This is not particularly helpful, as he is getting the data as a list of strings, not as a list of `pd.Timestamps` and other data. – user3483203 May 22 '18 at 20:03
  • @chrisz Thanks for the feeback. Could you clarify? The OP doesn't specify in terms of using a pandas Timestamp. They can obviouslyuse this later to slice and look up data values. The string 'green' is returned. I also assumed that since this is stock data, they would indeed want numerical data besides the string 'Green' (for a Green close). I may have made the wrong assumption, but that's why I intentionally got rid of the strings. – Simeon Ikudabo May 22 '18 at 20:06
  • 1
    You are treating the data like an actual list. What he has is the string representation of lists, and due to the `Timestamp`, they cannot be parsed using something like `ast.literal_eval`. If he had the actual list, you're approach would work fine. – user3483203 May 22 '18 at 20:09
  • @chrisz I see what you mean. I did that intentionally, and must have misunderstood what the OP was looking for. That's my error. That's why I converted the string representations to actual lists. – Simeon Ikudabo May 22 '18 at 20:11