I have the giant dataset as the following. Most of the columns are in "Json format" but in pandas object datatype. I would like to convert them all into dicts and be able to fetch the values of each key.
id body created_at user source symbols entities conversation likes
0 10000001 sina are the stocks i hate to trade... 2012-10-15 18:54:55+00:00 {'id': 117974, 'username': 'choptrade', 'name'... {'id': 1, 'title': 'Stocktwits', 'url': 'https... [{'id': 2839, 'symbol': 'NFLX', 'title': 'NetF... {'sentiment': None} None None
1 10000002 An Introduction to Financial Spread Betting ht... 2012-10-15 18:55:00+00:00 {'id': 179410, 'username': 'KennysElliottWaves... {'id': 2, 'title': 'Twitter', 'url': 'http://t... [{'id': 667, 'symbol': 'EURUSD', 'title': 'Eur... {'sentiment': None} None None
2 10000003 Published $TXT estimate on Estimize: +0.05 EPS... 2012-10-15 18:55:00+00:00 {'id': 12134, 'username': '1nvestor', 'name': ... {'id': 2, 'title': 'Twitter', 'url': 'http://t... [{'id': 7506, 'symbol': 'TXT', 'title': 'Textr... {'sentiment': None} None None
3 10000005 $VRNG holding lot of shares under 3. Hold or s... 2012-10-15 18:55:06+00:00 {'id': 146417, 'username': 'CryptoMafia', 'nam... {'id': 1, 'title': 'Stocktwits', 'url': 'https... [{'id': 9214, 'symbol': 'XSPA', 'title': 'Xpre... {'sentiment': None} {'parent_message_id': 10000005, 'in_reply_to_m... None
4 10000006 We have a huge tail on oil. If we close near c... 2012-10-15 18:55:07+00:00 {'id': 86846, 'username': 'johnkicklighter', '... {'id': 1, 'title': 'Stocktwits', 'url': 'https... [{'id': 643, 'symbol': 'CL_F', 'title': 'Light... {'chart': {'thumb': 'https://charts.stocktwits... {'parent_message_id': 10000006, 'in_reply_to_m... None
... ... ... ... ... ... ... ... ... ...
173592731 459260899 $BBIG I really thought we would get a PR this ... 2022-05-13 12:39:08+00:00 {'id': 3231219, 'username': 'DMaCATO', 'name':... {'id': 2269, 'title': 'StockTwits Web', 'url':... [{'id': 16144, 'symbol': 'BBIG', 'title': 'Vin... {'sentiment': None} None None
173592732 459260904 $LUNA Absolutely iconic in the story of why 40... 2022-05-13 12:39:09+00:00 {'id': 5647436, 'username': 'Julieblue', 'name... {'id': 2269, 'title': 'StockTwits Web', 'url':... [{'id': 2563, 'symbol': 'LUNA', 'title': 'Luna... {'sentiment': {'basic': 'Bearish'}} None None
173592733 459260906 $TWTR get these people on the ropes and drop b... 2022-05-13 12:39:09+00:00 {'id': 4253849, 'username': 'Trump2036', 'name... {'id': 2269, 'title': 'StockTwits Web', 'url':... [{'id': 7872, 'symbol': 'TWTR', 'title': 'TWIT... {'sentiment': None} None None
173592734 459260907 $SPY I want the shorts to cry today 2022-05-13 12:39:09+00:00 {'id': 1129723, 'username': 'labibaouf', 'name... {'id': 2269, 'title': 'StockTwits Web', 'url':... [{'id': 7271, 'symbol': 'SPY', 'title': 'SPDR ... {'sentiment': {'basic': 'Bullish'}} None None
173592735 459260909 TSLA but bad for democracy and... 2022-05-13 12:39:09+00:00 {'id': 727683, 'username': 'Iwillplay', 'name'... {'id': 2095, 'title': 'StockTwits For Android ... [{'id': 7872, 'symbol': 'TWTR', 'title': 'TWIT... {'sentiment': None} None None
my current apporach is to use the pandas apply:
import ast
def dict_convert(x):
try:
return ast.literal_eval(x)
except:
return np.nan
df["col"] = df.col.apply(lambda x: dict_convert(x))
However this would take so much time and it is not computational efficient. I have 64 cores so I would like to have some vertorrized method. I also tried the Pandarallel module it ended up being taking significantly more time than the base pandas apply.
pd.json_normalize does not seem to work either. I think it is due to the instances being in object data type.
pd.json_normalize(df.col.iloc[:100].values)
out:
0
1
2
3
4
...
95
96
97
98
99
100 rows × 0 columns