Basic pandas dataframe manipulation question

Question

I have the following JSON snippet:

{'search_metadata': {'completed_in': 0.027,
                     'count': 2},
 'statuses': [{'contributors': None,
               'coordinates': None,
               'created_at': 'Wed Mar 31 19:25:16 +0000 2021',
               'text': 'The text',
               'truncated': True,
               'user': {'contributors_enabled': False,
                        'screen_name': 'abcde',
                        'verified': false
                        }
               }
               ,{...}]
}

The info that interests me is all in the statuses array. With pandas I can turn this into a DataFrame like this:

df = pd.DataFrame(Data['statuses'])

Then I extract a subset out of this dataframe with

dfsub = df[['created_at', 'text']]

display(dfsub) shows exactly what I expect.

But I also want to include [user][screen_name] in the subset.

dfs = df[[ 'user', 'created_at', 'text']]

is syntactically correct, but the user column contains too much information.

How do I add only the screen_name to the subset? I have tried things like the following, but none of them work:

[user][screen_name]
user.screen_name
user:screen_name

i'm curious. why did you use dataframe rather than json in the first place? — Simon, Apr 03 '21 at 12:50
point of clarification: this is not a JSON snippet, but a `dict` (well, almost: the last "`false`" --in lowercase-- is not correct). — Pierre D, Apr 03 '21 at 12:51
@Simon for being able to use the display function with its pretty output — mgr326639, Apr 03 '21 at 17:07

score 3 · Accepted Answer · answered Apr 03 '21 at 12:50

I would normalize the data before constructing the DataFrame. Take a look here: https://stackoverflow.com/a/41801708/14596032

A working example that answers your question:

df = pd.json_normalize(Data['statuses'], sep='_')
dfs = df[[ 'user_screen_name', 'created_at', 'text']]
print(dfs)
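
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The `Data` dict below is made-up sample data shaped like the one in the question (the second status and its values are assumptions, since the original elides them):

```python
import pandas as pd

# Hypothetical sample data, shaped like the dict in the question
Data = {
    "search_metadata": {"completed_in": 0.027, "count": 2},
    "statuses": [
        {
            "created_at": "Wed Mar 31 19:25:16 +0000 2021",
            "text": "The text",
            "user": {"screen_name": "abcde", "verified": False},
        },
        {
            "created_at": "Wed Mar 31 19:30:00 +0000 2021",
            "text": "Another text",
            "user": {"screen_name": "fghij", "verified": True},
        },
    ],
}

# Flatten nested dicts; nested keys are joined with "_",
# so user -> screen_name becomes the column "user_screen_name"
df = pd.json_normalize(Data["statuses"], sep="_")
dfs = df[["user_screen_name", "created_at", "text"]]
print(dfs)
```

With the default separator the column would be named `user.screen_name` instead, which is why `sep='_'` is passed here.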

Vladimir Gromes

Totally agree, this is a much better way to put nested dicts into a `df`. – Pierre D Apr 03 '21 at 12:56

Pierre D · Answer 2 · 2021-04-03T12:58:25.167

0

You can use pd.Series.str. The docs don't do justice to all the wonderful things .str can do, such as accessing list and dict items. Case in point, you can access dict elements like this:

df['user'].str['screen_name']
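
As a rough sketch of how that fits into the subset from the question, the extracted values can be stored in a new column and selected alongside the others. The `statuses` list here is made-up sample data, and the column name `screen_name` is just a choice for illustration:

```python
import pandas as pd

# Hypothetical sample data, shaped like Data['statuses'] in the question
statuses = [
    {"created_at": "Wed Mar 31 19:25:16 +0000 2021", "text": "The text",
     "user": {"screen_name": "abcde", "verified": False}},
    {"created_at": "Wed Mar 31 19:30:00 +0000 2021", "text": "Another text",
     "user": {"screen_name": "fghij", "verified": True}},
]

df = pd.DataFrame(statuses)

# .str[...] looks up the key in each dict held in the 'user' column
df["screen_name"] = df["user"].str["screen_name"]

dfs = df[["screen_name", "created_at", "text"]]
print(dfs)
```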

That said, I agree with @VladimirGromes that a better way is to normalize your data into a flat table.

edited Apr 03 '21 at 12:58

answered Apr 03 '21 at 12:45

Pierre D

score 0 · Answer 3 · answered Apr 03 '21 at 12:57

You can index step by step: first into the DataFrame, then the Series, then the dict:

df['user']                   # user column = Series
df['user'][0]                # 1st item of the Series = dict
df['user'][0]['screen_name'] # screen_name in dict
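
The lines above only reach the first row. One possible way to apply the same lookup to every row is `Series.apply`; this is a sketch on made-up sample data, and it assumes every row has a `user` dict with a `screen_name` key:

```python
import pandas as pd

# Hypothetical sample data, shaped like Data['statuses'] in the question
statuses = [
    {"created_at": "Wed Mar 31 19:25:16 +0000 2021", "text": "The text",
     "user": {"screen_name": "abcde"}},
    {"created_at": "Wed Mar 31 19:30:00 +0000 2021", "text": "Another text",
     "user": {"screen_name": "fghij"}},
]

df = pd.DataFrame(statuses)

# Single value: row label 0 of the 'user' Series, then the dict key
first_name = df["user"][0]["screen_name"]

# Same dict lookup for every row (raises KeyError if a key is missing)
df["screen_name"] = df["user"].apply(lambda u: u["screen_name"])

dfs = df[["screen_name", "created_at", "text"]]
print(dfs)
```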

Simon
