
I am working with a JSON file that I pulled from Github using:

curl https://api.github.com/repos/mbostock/d3/stats/commit_activity > d3_commit-activity.json

Then, within Pandas I ran the following commands:

import pandas as pd
import numpy as np
import matplotlib.pylab as plt

df = pd.read_json("d3_commit-activity.json")

One of the columns in df is called "days" and its values are lists formatted like this:

[0,0,0,1,0,1,0]
[0,0,0,0,0,1,1]
[3,0,0,0,0,0,0]

In other words, each list contains exactly seven numbers. I want to create seven new columns, one from each element of these lists, but I am completely baffled by the explanations of similar problems. I tried following Bradley's solution to this problem (pandas: How do I split text in a column into multiple rows?) but got the error "name 'Series' is not defined". Changing it to "pd.Series" seems to work for that command, but the later commands still fail.

Surely there must be a simple, straightforward way to take these lists and break them up into individual columns?

Slavatron

2 Answers


Let's define a list of day_names:

import pandas as pd    
day_names = ['Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat']

Then either of these will achieve the desired result:

df[day_names] = df.days.apply(pd.Series)

Or,

df[day_names] = df.apply(lambda row: pd.Series(row.days), axis=1)
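
For anyone who wants to try this without downloading the JSON file, here is a minimal sketch. It builds a small stand-in frame from the three sample lists in the question (the `total` column and its values are made up for illustration) and expands `days` by constructing a DataFrame from the list of lists rather than calling `apply`. This is an alternative to the two lines above, not a replacement for them.

```python
import pandas as pd

day_names = ['Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat']

# Stand-in for the frame loaded from the JSON file.
# The 'days' rows come from the question; 'total' is an illustrative extra column.
df = pd.DataFrame({
    'total': [2, 2, 3],
    'days': [[0, 0, 0, 1, 0, 1, 0],
             [0, 0, 0, 0, 0, 1, 1],
             [3, 0, 0, 0, 0, 0, 0]],
})

# Turn the column of lists into a seven-column frame in one step.
# Reusing df.index keeps the rows aligned when the result is joined back.
expanded = pd.DataFrame(df['days'].tolist(), index=df.index, columns=day_names)
result = df.join(expanded)

print(result)
```

Building the frame directly from `df['days'].tolist()` avoids creating one Series per row, which tends to matter once the frame has many rows.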
Haleemur Ali

An easy way to create columns from the list is as follows:

df2 = pd.DataFrame(list(zip(*df.days))).T  # list() because zip is lazy in Python 3
df2.columns = ['Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat']
>>> df2
   Sun  Mon  Tue  Wed  Thu  Fri  Sat
0    0    0    0    1    0    1    0
1    0    0    0    0    0    1    1
2    3    0    0    0    0    0    0

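The asterisk unpacks df.days so that each list is passed to zip as a separate argument. zip then groups the first elements together, the second elements together, and so on, giving one tuple per day. Transposing with .T turns those seven tuples back into seven columns, with one row per original list.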
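
To see what the asterisk and zip do on their own, here is a small plain-Python sketch using the three sample lists from the question. The names `rows` and `by_day` are just illustrative.

```python
rows = [[0, 0, 0, 1, 0, 1, 0],
        [0, 0, 0, 0, 0, 1, 1],
        [3, 0, 0, 0, 0, 0, 0]]

# zip(*rows) is the same call as zip(rows[0], rows[1], rows[2]):
# it pairs up the elements that share a position across the lists.
by_day = list(zip(*rows))

print(len(by_day))  # 7 tuples, one per day of the week
print(by_day[0])    # (0, 0, 3) -> the first element of every list
print(by_day[5])    # (1, 1, 0) -> the sixth element of every list
```

Each tuple holds one day's values across all rows, which is why the DataFrame built from it has to be transposed to get days as columns.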

Alexander