Extract a column's data into a variable

Question

I've got a very large dataframe where one of the columns is a dictionary itself. (let's say column 12). In that dictionary is a part of a hyperlink, which I want to get.

In Jupyter, I want to display a table where I have column 0 and 2, as well as the completed hyperlink

I think I need to:

Extract that dictionary from the dataframe
Get a particular keyed value from it
Create the full hyperlink from the extracted value
Copy the dataframe and replace the column with the hyperlink created above

Let's just tackle step 1 and I'll make other questions for the next steps. How do I extract values from a dataframe into a variable I can play with?

import pytd
import pandas

client = pytd.Client(apikey=widget_api_key.value, database=widget_database.value)
results = client.query(query)
dataframe = pandas.DataFrame(**results)
dataframe
# Not sure what to do next

Have you tried this: https://stackoverflow.com/questions/50512188/unpack-dictionary-from-pandas-column ? — rg_elite, Aug 07 '19 at 00:24
@rg_elite Not quite the same I think. I want to actually get my dictionary into a variable and play with it. — Christopher Pisz, Aug 07 '19 at 00:29
You should show what the format of your input is and what you want for your output. — Akaisteph7, Aug 07 '19 at 00:57
[Provide a copy of the data](https://stackoverflow.com/questions/52413246/how-do-i-provide-a-reproducible-copy-of-my-existing-dataframe) — Trenton McKinney, Aug 07 '19 at 01:44

jottbe · Answer 1 · 2019-08-07T09:18:23.227

If you only want to extract one key from the dictionary and the dictionary is already stored as a dictionary in the column, you can do it as follows:

import numpy  as np
import pandas as pd

# assuming, your dicts are stored in column 'data'
# and you want to store the url in column 'url'
df['url']= df['data'].map(lambda d: d.get('url', np.NaN) if hasattr(d, 'get') else np.NaN)

# from there you can do your transformation on the url column

Testdata and results

df= pd.DataFrame({
        'col1': [1, 5, 6],
        'data': [{'url': 'http://foo.org', 'comment': 'not interesting'}, {'comment': 'great site about beer receipes, but forgot the url'}, np.NaN],
        'json': ['{"url": "http://foo.org", "comment": "not interesting"}', '{"comment": "great site about beer receipes, but forgot the url"}', np.NaN]
    }
)

# Result of the logic above:
   col1                                               data             url
0     1  {'url': 'http://foo.org', 'comment': 'not inte...  http://foo.org
1     5  {'comment': 'great site about beer receipes, b...             NaN
2     6                                                NaN             NaN

If you need to test, if your data is already stored in python dicts (rather than strings), you can do it as follows:

print(df['data'].map(type))

If your dicts are stored as strings, you can convert them to dicts first based on the following code:

import json

def get_url_from_json(document):
    if pd.isnull(document):
        url= np.NaN
    else:
        try:
            _dict= json.loads(document)
            url= _dict.get('url', np.NaN)
        except:
            url= np.NaN
    return url

df['url2']= df['json'].map(get_url_from_json)

# output:
print(df[['col1', 'url', 'url2']])   

   col1             url            url2
0     1  http://foo.org  http://foo.org
1     5             NaN             NaN
2     6             NaN             NaN

Extract a column's data into a variable

1 Answers1