I have a list of names and a NumPy array, shown below. How can I combine the two into a pandas DataFrame? (My actual problem is larger than this: I have more than 700 column names and around a hundred thousand rows in the array.) Your help would be invaluable to me. Thank you.

column_names = [u'Bars', u'Burgers', u'Dry Cleaning & Laundry', u'Eyewear & Opticians', u'Local Services', u'Restaurants', u'Shopping']

values = array([[1, 1, 0, 0, 0, 0, 0],
   [0, 0, 1, 0, 1, 0, 0],
   [0, 0, 0, 1, 0, 0, 1],
   [0, 0, 0, 0, 0, 1, 0]], dtype=int64)

UPDATE

Thank you very much for the quick responses. I am sorry that I did not fully explain my final goal: I would like to add another column, score, which is a list [4, 4.5, 5, 5.5, 3], to the pandas DataFrame. Then I would like to use all columns except score as predictors of score in a linear regression model. I think the essential question is how to add a new column efficiently. I know that I can do

data = pd.DataFrame({"Bars": Bars, "Burgers": Burgers, "Dry Cleaning & Laundry": Dry Cleaning & Laundry, ..."score": score})

However, this is impractical because I have far too many columns.

I also tried dd = pd.DataFrame(values, columns=column_names) followed by ddd = pd.DataFrame(dd, scores).

This yields:

Out[185]: 
Bars  Burgers  Dry Cleaning & Laundry  Eyewear & Opticians  Local Services   \
3   0.0      0.0                     0.0                  0.0             0.0   
5   NaN      NaN                     NaN                  NaN             NaN   
5   NaN      NaN                     NaN                  NaN             NaN   
4   NaN      NaN                     NaN                  NaN             NaN   

Restaurants  Shopping  
3          1.0       0.0  
5          NaN       NaN  
5          NaN       NaN  
4          NaN       NaN
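
A likely explanation for the NaN rows: the second positional argument of pd.DataFrame is index, so pd.DataFrame(dd, scores) reindexes dd by the score values instead of adding them as a column. The sketch below reproduces this, assuming scores was [3, 5, 5, 4] (inferred from the row labels in the output above, not stated in the post). Only label 3 exists in the original index 0..3, so the other rows come back empty.

```python
import numpy as np
import pandas as pd

column_names = ['Bars', 'Burgers', 'Dry Cleaning & Laundry',
                'Eyewear & Opticians', 'Local Services',
                'Restaurants', 'Shopping']

values = np.array([[1, 1, 0, 0, 0, 0, 0],
                   [0, 0, 1, 0, 1, 0, 0],
                   [0, 0, 0, 1, 0, 0, 1],
                   [0, 0, 0, 0, 0, 1, 0]], dtype=np.int64)

dd = pd.DataFrame(values, columns=column_names)

# Assumption: these are the values that were in `scores` (taken from the
# row labels of the printed output).
scores = [3, 5, 5, 4]

# The second positional argument is `index`, so this selects rows of `dd`
# by label; labels missing from dd's index (4 and 5) become all-NaN rows.
ddd = pd.DataFrame(dd, scores)
print(ddd)
```

Passing the scores as a column (for example dd['score'] = ...) avoids the reindexing.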

Once again thank you very much!!

yearntolearn
  • Possible duplicate of [Creating a Pandas DataFrame with a numpy array containing multiple types](http://stackoverflow.com/questions/21647054/creating-a-pandas-dataframe-with-a-numpy-array-containing-multiple-types) – shivsn Jul 22 '16 at 15:33

2 Answers

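import numpy as np
import pandas as pd

column_names = [u'Bars', u'Burgers', u'Dry Cleaning & Laundry', u'Eyewear & Opticians', u'Local Services', u'Restaurants', u'Shopping']

values = np.array([[1, 1, 0, 0, 0, 0, 0],
                   [0, 0, 1, 0, 1, 0, 0],
                   [0, 0, 0, 1, 0, 0, 1],
                   [0, 0, 0, 0, 0, 1, 0]], dtype=np.int64)

# One score per row of `values`. Illustrative only: the first four values of
# the question's list, because the sample array has just four rows.
score = [4, 4.5, 5, 5.5]

# Build the DataFrame directly from the array and the list of column names.
df = pd.DataFrame(data=values, columns=column_names)

# Add the scores as a new column, aligned to the DataFrame's index.
df.loc[:,'Scores'] = pd.Series(score, index=df.index)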
James Russo

I think I figured it out. I can make scores into another DataFrame and then concatenate it column-wise with the first DataFrame, dd = pd.DataFrame(values, columns=column_names).

pd.concat([dd, scores], axis=1)

This returns a new DataFrame with the score column appended; dd and scores themselves are left unchanged.
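
For reference, a minimal end-to-end sketch of this approach, including the split into predictors and target that the question asks about. The score values are an assumption: the first four entries of the question's list, since the sample array has only four rows. The names data, X and y are illustrative.

```python
import numpy as np
import pandas as pd

column_names = ['Bars', 'Burgers', 'Dry Cleaning & Laundry',
                'Eyewear & Opticians', 'Local Services',
                'Restaurants', 'Shopping']

values = np.array([[1, 1, 0, 0, 0, 0, 0],
                   [0, 0, 1, 0, 1, 0, 0],
                   [0, 0, 0, 1, 0, 0, 1],
                   [0, 0, 0, 0, 0, 1, 0]], dtype=np.int64)

dd = pd.DataFrame(values, columns=column_names)

# Assumed scores, one per row (illustrative values, see note above).
scores = pd.DataFrame({'score': [4, 4.5, 5, 5.5]})

# axis=1 places the frames side by side; rows are matched by index label,
# so both frames need the same index (here the default 0..3).
data = pd.concat([dd, scores], axis=1)

# Every column except `score` is a predictor; `score` is the target.
X = data.drop(columns='score')
y = data['score']
```

Because pd.concat aligns on the index, a scores frame with a different index would introduce NaN rows rather than raise an error, so it is worth checking data.isna().any() on real data.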

yearntolearn