
I have a pandas series that looks like this:

>>> myseries
2012-01-01 15:20:00-05:00    2
2012-01-01 15:30:00-05:00    1
2012-01-01 15:40:00-05:00    0
...

I try to put it into a DataFrame like so:

>>> mydf = pd.DataFrame(myseries, columns=["myseries"], index = myseries.index)

and all the values become NaN for some reason:

>>> mydf
2012-01-01 15:20:00-05:00    NaN
2012-01-01 15:30:00-05:00    NaN
2012-01-01 15:40:00-05:00    NaN

I'm pretty confused. This seems like a really simple application. What am I doing wrong? By the way, passing the raw values instead, as in pd.DataFrame(myseries.values, columns=...), fixes the problem, but why is that necessary? Thank you.

user
  • Can you post the df you are using? Initializing a DataFrame using a Series works for me. – Alex Mar 09 '15 at 03:57
  • I can't post all the data if that's what you mean.. it's 200,000 rows. Its type is `` – user Mar 09 '15 at 05:50
  • If you create the df without specifying the index, and then redefine the index, does it work? – cphlewis Mar 09 '15 at 08:35
  • It does if I don't specify a column name, but I need to do that because the name needs to change. At that point, I must also define indexes to not get an empty dataframe. – user Mar 09 '15 at 16:06
  • Possible duplicate of [Adding new column to existing DataFrame in Python pandas](http://stackoverflow.com/questions/12555323/adding-new-column-to-existing-dataframe-in-python-pandas) – thleo Mar 30 '17 at 14:57

2 Answers


Even simpler:

s = pd.Series([0,1,2,3], index=pd.date_range('2014-01-01', periods=4), name='s')
df = pd.DataFrame(s)
print(df)

yields

            s
2014-01-01  0
2014-01-02  1
2014-01-03  2
2014-01-04  3
Alexander
  • Yeah, I agree that works, but I guess I should have expanded a little. The initial column name of myseries needs to be dropped, and the new column name in my dataframe must be the name myseries. This is because I will subsequently fill in other columns as I calculate them. And once you specify columns=[], you seem to have to also specify index to get a non-null dataframe. – user Mar 09 '15 at 05:44
  • 1
    This is a simpler way to do it, but the OPs question is about why `mydf = pd.DataFrame(myseries, columns=["myseries"], index = myseries.index)` doesn't work – Alex Mar 09 '15 at 15:23
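
For the renaming requirement raised in the comments above (drop the Series' original name and use a new column label), one option is `Series.to_frame(name=...)`. This is a minimal sketch that reuses the sample Series from this answer; the `doubled` column is a hypothetical stand-in for whatever columns are calculated later.

```python
import pandas as pd

# Sample Series from the answer above; its original name is "s".
s = pd.Series([0, 1, 2, 3], index=pd.date_range("2014-01-01", periods=4), name="s")

# to_frame(name=...) keeps the values and the index, but labels the
# single column with the given name instead of the Series' own name.
df = s.to_frame(name="myseries")

# Hypothetical follow-up: add further columns as they are computed.
df["doubled"] = df["myseries"] * 2

print(df)
```

Because the column label is set explicitly here, neither `columns=` nor `index=` has to be passed to the `DataFrame` constructor.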
s = pd.Series([0,1,2,3], index=pd.date_range('2014-01-01', periods=4))
df = pd.DataFrame(s, columns=['s'], index=s.index)
print(df)

yields

            s
2014-01-01  0
2014-01-02  1
2014-01-03  2
2014-01-04  3
Alex
  • This works fine for me too, but my own data doesn't, unless I append the ".values". It's a mystery to me why. – user Mar 09 '15 at 05:47
  • What are the `dtypes` of the index and the values? – Alex Mar 09 '15 at 15:25
  • "dtype" is not allowed but "type" is ``. Values are `` and indexes are ``. – user Mar 09 '15 at 15:53
  • And, if something seems strange to you about that index, please see a workaround for an earlier line of the code I posted [here](http://stackoverflow.com/questions/28910231/failing-to-convert-pandas-dataframe-timestamp). – user Mar 09 '15 at 16:02
  • Not sure how to help unless you post some reproducible code. – Alex Mar 09 '15 at 17:52
  • I understand. The thing is, I can't reproduce it with a sample case either. And I can't post 3 GB of data. This example above uses a series with all the same characteristics as mine, as far as I can tell, but results in different behavior when entering a dataframe. I'm stumped. – user Mar 09 '15 at 18:04
  • What do you mean dtype is not allowed? What are the results from `print(myseries.dtype, myseries.index.dtype)`? – Alex Mar 09 '15 at 18:07
  • Ah, I was trying `dtype(myseries.values)`. For your line, I get `(dtype('float64'), dtype(' – user Mar 09 '15 at 18:09
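
One possible explanation, which this thread does not confirm: when a Series that already has a name is passed to `pd.DataFrame` together with `columns=[...]`, pandas looks up the requested column labels by the Series' name. If the labels do not match, the column has no data and comes out as NaN (or the frame is empty when no `index` is given). The sketch below assumes that situation; the name `original_name` and the sample values are hypothetical, and the time zone is left out for simplicity.

```python
import pandas as pd

# Hypothetical stand-in for the asker's data: a Series that already has a name.
idx = pd.date_range("2012-01-01 15:20", periods=3, freq="10min")
myseries = pd.Series([2.0, 1.0, 0.0], index=idx, name="original_name")

# The requested label "myseries" does not match the Series' name, so
# pandas finds no data for that column and fills it with NaN.
mismatched = pd.DataFrame(myseries, columns=["myseries"], index=myseries.index)
print(mismatched["myseries"].isna().all())

# Without index=, there is nothing to size the frame from, so it is empty.
empty = pd.DataFrame(myseries, columns=["myseries"])
print(len(empty))

# .values hands over a bare NumPy array (no name, no index), so the
# values are placed positionally under the requested label.
positional = pd.DataFrame(myseries.values, columns=["myseries"], index=myseries.index)
print(positional["myseries"].tolist())
```

This would also be consistent with the unnamed Series in the answer above working without `.values`: with no name to match against, the values are simply assigned to the requested column.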