1

Using: Python 2.7 and Pandas 0.11.0 on Mac OSX Lion

I'm trying to create an empty DataFrame and then populate it from another dataframe, based on a for loop.

When I construct the DataFrame and then run the for loop as follows:

data = pd.DataFrame()
for item in cols_to_keep:
    if item not in dummies:
        data = data.join(df[item])

the result is an empty DataFrame: it has the headers of the columns I wanted to add from the other DataFrame, but no rows.

DMML

2 Answers

5

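That's because join defaults to a left join on the index of the calling frame. An empty DataFrame has an empty index, so each join adds the column header but keeps zero rows.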

You can use a list comprehension to restrict the DataFrame to the columns you want:

df[[col for col in cols_to_keep if col not in dummies]]
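
To make the difference concrete, here is a minimal sketch using a small made-up frame (the values in df, cols_to_keep and dummies below are illustrative assumptions, not the asker's data). It shows the default left join producing headers without rows, an outer join picking up the rows, and two ways to get the wanted columns without starting from an empty frame.

```python
import pandas as pd

# Hypothetical stand-in for the asker's data.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})
cols_to_keep = ["a", "b", "c"]
dummies = ["b"]

# Default join is how="left": the result keeps the (empty) index of the
# calling frame, so the column header appears but no rows do.
empty_join = pd.DataFrame().join(df["a"])

# how="outer" takes the union of both indexes, so the rows survive.
outer_join = pd.DataFrame().join(df["a"], how="outer")

# Selecting the wanted columns directly avoids the empty frame altogether.
wanted = [col for col in cols_to_keep if col not in dummies]
selected = df[wanted]

# If the columns really must be collected one by one, concatenating
# them along axis=1 is another option.
concatenated = pd.concat([df[col] for col in wanted], axis=1)

print(empty_join.shape)
print(outer_join.shape)
print(list(selected.columns))
```

The direct selection is usually the simplest choice; the outer join is shown only to explain why the original loop came back empty.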
Andy Hayden
2

What about just creating a new frame based off of the columns you know you want to keep, instead of creating an empty one first?

import pandas as pd
import numpy as np

df = pd.DataFrame({'a': np.random.randn(5),
                   'b': np.random.randn(5),
                   'c': np.random.randn(5),
                   'd': np.random.randn(5)})
cols_to_keep = ['a', 'c', 'd']
dummies = ['d']
not_dummies = [x for x in cols_to_keep if x not in dummies]
data = df[not_dummies]
data

          a         c
0  2.288460  0.698057
1  0.097110 -0.110896
2  1.075598 -0.632659
3 -0.120013 -2.185709
4 -0.099343  1.627839
Greg Reda