1

This question is probably very simple, but I am having trouble creating a new column in a dataframe and filling it with a numpy array. I have an array, e.g. [0,0,0,1,0,1,1], and a dataframe with the same number of rows as the length of that array. I want to add a column, and I have been doing this:

df['new_col'] = array

However, I get the following warning:

A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead

I tried df.loc[:,'new_col'] = array but got the same warning. I also tried:

df.loc['new_col'] = pd.Series(array, index = df.index)

based on an answer to a different user's question. Does anyone know a "better" way to code this? Or should I just ignore the warning?

cp0515
  • This error message can be extremely misleading. At some point you have unsafely subset your dataframe (made an implicit copy). You've done something like `new_df = df[some subset]`, which is a copy; the warning is letting you know that you will not be modifying `df` (which does not seem to be what you are looking to do). There are many ways to avoid this, but the most common is to make an _explicit_ copy like `new_df = df[some subset].copy()` or `df = df.copy()` before trying to set the value. – Henry Ecker Sep 13 '21 at 18:33
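
The explicit-copy fix from this comment can be sketched as follows. This is only an illustration: the frame `df_full`, its column `x`, and the filter are hypothetical, since the question does not show how `df` was created.

```python
import numpy as np
import pandas as pd

# Hypothetical data; the question does not show how df was built.
df_full = pd.DataFrame({"x": [1, 2, 3, 4, 5, 6, 7, 8]})
array = np.array([0, 0, 0, 1, 0, 1, 1])

# Taking an explicit copy of the subset makes df an independent DataFrame,
# so adding a column to it should not trigger the "copy of a slice" warning.
df = df_full[df_full["x"] > 1].copy()
df["new_col"] = array

print(df)
```

Here `df_full` is left untouched and only the copied subset gains the new column.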

2 Answers

0

Code from https://www.geeksforgeeks.org/adding-new-column-to-existing-dataframe-in-pandas/

Import pandas package

import pandas as pd

Define a dictionary containing data

data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Height': [5.1, 6.2, 5.1, 5.2]
        }
  

Convert the dictionary into DataFrame

original_df = pd.DataFrame(data)

Using 'Qualification' as the column name and equating it to the list

altered_df = original_df.assign(Qualification = ['Msc', 'MA', 'Msc', 'Msc'])

Observe the result

altered_df
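
Applied to the array from the question, the same `assign` approach might look like the sketch below. The frame `df` and its column `x` are made up for illustration; only the array values come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with as many rows as the array has elements.
df = pd.DataFrame({"x": [10, 20, 30, 40, 50, 60, 70]})
array = np.array([0, 0, 0, 1, 0, 1, 1])

# assign returns a new DataFrame with the extra column; df itself is unchanged.
df_with_col = df.assign(new_col=array)

print(df_with_col)
```

Because `assign` builds a new object rather than writing into an existing one, it sidesteps the question of whether `df` is a view or a copy.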
Dev Nocz
0

A DataFrame column can be assigned from a plain list (conceptually, a DataFrame is like a dictionary with column names as keys and lists as values).

Try converting the numpy array with the tolist() method:

df['new_col'] = array.tolist()
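
For comparison, here is a small sketch that assigns the array both directly and via `tolist()` on two hypothetical frames (`df_a` and `df_b` are not from the question). As far as I know, pandas accepts a NumPy array of matching length directly, so both forms should store the same values.

```python
import numpy as np
import pandas as pd

array = np.array([0, 0, 0, 1, 0, 1, 1])

# Two independent, hypothetical frames with seven rows each.
df_a = pd.DataFrame({"x": range(7)})
df_b = df_a.copy()

df_a["new_col"] = array           # assign the NumPy array as-is
df_b["new_col"] = array.tolist()  # assign the converted Python list

# Compare the stored values of the two columns.
same_values = df_a["new_col"].tolist() == df_b["new_col"].tolist()
print(same_values)
```

If the warning still appears with either form, the cause is more likely how `df` was created (a slice of another frame) than the type of the value being assigned.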