I'm attempting to create a DataFrame with the following:

from pandas import DataFrame, read_csv

import matplotlib.pyplot as plt
import pandas as pd
import sys

# The initial set of baby names and birth counts
names = ['Bob', 'Jessica', 'Mary', 'John', 'Mel']
births = [968, 155, 77, 578, 973]

# Now we will zip them together
BabyDataSet = zip(names, births)
# We have to add the 'list' for version 3.x
print (list(BabyDataSet))

#create the DataFrame
df = DataFrame(BabyDataSet, columns = ['Names', 'Births'] )
print (df)

When I run the program I get the following error: 'data type can't be an iterator'. I read 'What does the "yield" keyword do in Python?', but I do not understand how that applies to what I'm doing. Any help and further understanding would be greatly appreciated.


2 Answers

In Python 3, zip returns an iterator, not a list as it does in Python 2. Just convert it to a list as you construct the DataFrame, like this:

df = DataFrame(list(BabyDataSet), columns = ['Names', 'Births'] )
chrisb
  • I thought that might fix it, but it just gave me another error: 'Shape of passed values is (0,0), indices imply (2,0)'. So I thought I was doing it wrong. Could this indicate a problem with my pandas file itself? – Aka_Minimal Aug 05 '14 at 19:20
  • I believe the issue is that BabyDataSet is created as an iterator, so the `print(list(BabyDataSet))` line is "eating" it up. Remember that an iterator like this can only be iterated through once. Simply removing that line should fix it. What you should probably do is save it as a list when you initialize it, so you don't accidentally use up the iterator: `BabyDataSet = list(zip(names,births))` – Roger Fan Aug 05 '14 at 19:29
  • @rfan That fixed the second error and I was able to create my df, thanks to both of you. :) P.S. How do I upvote a helpful comment? – Aka_Minimal Aug 05 '14 at 19:56
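
To make the exhaustion described in the comments concrete, here is a minimal sketch using the question's data. The variable names `pairs`, `first_pass`, `second_pass`, and `baby_data` are illustrative and not from the original post. It shows that a second pass over the same zip object yields nothing, and that materializing the pairs as a list up front avoids the problem.

```python
import pandas as pd

names = ['Bob', 'Jessica', 'Mary', 'John', 'Mel']
births = [968, 155, 77, 578, 973]

# In Python 3, zip returns a one-shot iterator.
pairs = zip(names, births)
first_pass = list(pairs)    # consumes the iterator: five (name, births) tuples
second_pass = list(pairs)   # nothing left to consume, so this is an empty list

# Materialize the pairs once, then reuse the list as often as needed.
baby_data = list(zip(names, births))
print(baby_data)
df = pd.DataFrame(baby_data, columns=['Names', 'Births'])
print(df)
```

Because `baby_data` is a real list, printing it before building the DataFrame no longer changes the result.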

You can also create the DataFrame using an alternate syntax that avoids the zip/iterator issue entirely.

df = DataFrame({'Names': names, 'Births': births})

Read the documentation on initializing DataFrames. Pandas simply takes the dictionary and creates one column for each entry, using the key as the column name and the value as the column's data.

Dict can contain Series, arrays, constants, or list-like objects
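
As a rough illustration of the dictionary form, the sketch below builds the same frame and adds one extra column from a constant. The `Year` column and its value are hypothetical, added only to show that a constant is repeated for every row; they are not part of the original data.

```python
import pandas as pd

names = ['Bob', 'Jessica', 'Mary', 'John', 'Mel']
births = [968, 155, 77, 578, 973]

# Each key becomes a column name; each value supplies that column's data.
df = pd.DataFrame({
    'Names': names,     # list-like: one entry per row
    'Births': births,   # list-like: one entry per row
    'Year': 1880,       # hypothetical constant, repeated for every row
})
print(df)
```

The list-like values must all have the same length, since each one fills a full column.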

Roger Fan
  • Could you explain the logic or provide a link as to how that works? – Aka_Minimal Aug 05 '14 at 19:59
  • Edited, though I'm not sure how helpful that is. I find it pretty intuitive: the DataFrame will be initialized with the dictionary's keys as the column names and any iterable or constant as the values of each column. – Roger Fan Aug 05 '14 at 20:10