I have a list of names and a NumPy array, shown below. How can I combine the two into a pandas DataFrame? (My actual problem is larger than this: I have more than 700 column names and on the order of a hundred thousand inputs in the array.) Any help would be invaluable. Thank you.
import numpy as np
import pandas as pd

column_names = [u'Bars', u'Burgers', u'Dry Cleaning & Laundry', u'Eyewear & Opticians', u'Local Services', u'Restaurants', u'Shopping']
values = np.array([[1, 1, 0, 0, 0, 0, 0],
                   [0, 0, 1, 0, 1, 0, 0],
                   [0, 0, 0, 1, 0, 0, 1],
                   [0, 0, 0, 0, 0, 1, 0]], dtype=np.int64)
UPDATE
Thank you very much for the quick responses. I am sorry that I did not fully explain my final goal: I would like to add another column, score, which is a list [4, 4.5, 5, 5.5, 3], to the pandas DataFrame. I would then like to use all columns except score as predictors of score in a linear regression model. I think the essential question is how to add a new column efficiently. I know that I can do
data = pd.DataFrame({"Bars": Bars, "Burgers": Burgers, "Dry Cleaning & Laundry": Dry Cleaning & Laundry, ..."score": score})
However, this is impractical because I have far too many columns.
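One way to add the column without spelling out every name is plain column assignment. The sketch below is illustrative, not definitive: it assumes one score per row (the list above has five entries for four rows, so only the first four are used), and the names data, X, and y are placeholders.

```python
import numpy as np
import pandas as pd

column_names = ['Bars', 'Burgers', 'Dry Cleaning & Laundry', 'Eyewear & Opticians',
                'Local Services', 'Restaurants', 'Shopping']
values = np.array([[1, 1, 0, 0, 0, 0, 0],
                   [0, 0, 1, 0, 1, 0, 0],
                   [0, 0, 0, 1, 0, 0, 1],
                   [0, 0, 0, 0, 0, 1, 0]], dtype=np.int64)

# Assumption: one score per row, so only four of the five listed values are used.
score = [4, 4.5, 5, 5.5]

# Build the frame from the array, then attach the new column by assignment.
# A plain list is matched to rows by position, so its length must equal the row count.
data = pd.DataFrame(values, columns=column_names)
data['score'] = score

# Split into predictors (everything except score) and the target.
X = data.drop(columns='score')
y = data['score']
```

This avoids listing the 700+ column names by hand, since drop only needs the name of the one column to exclude.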
I also tried dd = pd.DataFrame(values, columns=column_names) followed by ddd = pd.DataFrame(dd, scores).
This yields:
Out[185]:
   Bars  Burgers  Dry Cleaning & Laundry  Eyewear & Opticians  Local Services  \
3   0.0      0.0                     0.0                  0.0             0.0
5   NaN      NaN                     NaN                  NaN             NaN
5   NaN      NaN                     NaN                  NaN             NaN
4   NaN      NaN                     NaN                  NaN             NaN

   Restaurants  Shopping
3          1.0       0.0
5          NaN       NaN
5          NaN       NaN
4          NaN       NaN
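A likely explanation for the NaN rows: the second positional parameter of pd.DataFrame is index, so pd.DataFrame(dd, scores) reindexes dd by the values in scores instead of adding a column. The sketch below reproduces this, assuming scores was [3, 5, 5, 4] (inferred from the index shown in the output, not stated in the post).

```python
import numpy as np
import pandas as pd

column_names = ['Bars', 'Burgers', 'Dry Cleaning & Laundry', 'Eyewear & Opticians',
                'Local Services', 'Restaurants', 'Shopping']
values = np.array([[1, 1, 0, 0, 0, 0, 0],
                   [0, 0, 1, 0, 1, 0, 0],
                   [0, 0, 0, 1, 0, 0, 1],
                   [0, 0, 0, 0, 0, 1, 0]], dtype=np.int64)

dd = pd.DataFrame(values, columns=column_names)

# Assumption: these are the scores that produced the output shown above.
scores = [3, 5, 5, 4]

# The second positional argument is `index`, so this selects rows of dd by label.
# Label 3 exists in dd (its last row); labels 5 and 4 do not, so those rows are NaN.
ddd = pd.DataFrame(dd, scores)

# The integer columns become float because NaN cannot be stored in an int64 column.
print(ddd)
```

Only the row labelled 3 carries data because it is the only label in scores that also exists in dd's default index 0 to 3.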
Once again thank you very much!!