Adding list with different length as a new column to a dataframe

Question

I want to add the values of a list to a dataframe as a new column. The dataframe has 49 rows, whereas the list has 47 items. I get the following error when I run this code.

print("Length of dataframe: ",datasetTest.open.count())
print("Length of array: ",len(test_pred_list))
datasetTest['predict_close'] = test_pred_list

The error is:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-105-68114a4e9a82> in <module>()
      5 # datasetTest = datasetTest.dropna()
      6 # print(datasetTest.count())
----> 7 datasetTest['predict_close'] = test_pred_list
      8 # test_shifted['color_predicted'] = test_shifted.apply(determinePredictedcolor, axis=1)
      9 # test_shifted['color_original'] =

c:\python35\lib\site-packages\pandas\core\frame.py in __setitem__(self, key, value)
   2517         else:
   2518             # set column
-> 2519             self._set_item(key, value)
   2520 
   2521     def _setitem_slice(self, key, value):

c:\python35\lib\site-packages\pandas\core\frame.py in _set_item(self, key, value)
   2583 
   2584         self._ensure_valid_index(value)
-> 2585         value = self._sanitize_column(key, value)
   2586         NDFrame._set_item(self, key, value)
   2587 

c:\python35\lib\site-packages\pandas\core\frame.py in _sanitize_column(self, key, value, broadcast)
   2758 
   2759             # turn me into an ndarray
-> 2760             value = _sanitize_index(value, self.index, copy=False)
   2761             if not isinstance(value, (np.ndarray, Index)):
   2762                 if isinstance(value, list) and len(value) > 0:

c:\python35\lib\site-packages\pandas\core\series.py in _sanitize_index(data, index, copy)
   3119 
   3120     if len(data) != len(index):
-> 3121         raise ValueError('Length of values does not match length of ' 'index')
   3122 
   3123     if isinstance(data, PeriodIndex):

ValueError: Length of values does not match length of index

How can I get rid of this error? Please help me.

Well, what do you want for the last 2 values? E.g. you can try `datasetTest['predict_close'] = test_pred_list + [0, 0]`. — jpp, Jul 19 '18 at 13:54
I want a general solution, as the list is sometimes 2 items short and sometimes 3, and I want the missing values to be filled with the corresponding `open` value from the dataframe. — Jaffer Wilson, Jul 19 '18 at 13:57

EdChum · Accepted Answer · 2018-07-19T14:01:26.843

30

If you convert the list to a Series then it will just work:

datasetTest.loc[:,'predict_close'] = pd.Series(test_pred_list)

example:

In[121]:
df = pd.DataFrame({'a':np.arange(3)})
df

Out[121]: 
   a
0  0
1  1
2  2

In[122]:
df.loc[:,'b'] = pd.Series(['a','b'])
df

Out[122]: 
   a    b
0  0    a
1  1    b
2  2  NaN

The docs refer to this as setting with enlargement, which covers adding or expanding, but it also works when the Series is shorter than the pre-existing index: the assignment aligns the Series on the index, and any label without a matching value gets NaN.

To handle the case where the index doesn't start at 0, or isn't an integer index at all:

In[126]:
df = pd.DataFrame({'a':np.arange(3)}, index=np.arange(3,6))
df

Out[126]: 
   a
3  0
4  1
5  2

In[127]:
s = pd.Series(['a','b'])
s.index = df.index[:len(s)]
s

Out[127]: 
3    a
4    b
dtype: object

In[128]:
df.loc[:,'b'] = s
df

Out[128]: 
   a    b
3  0    a
4  1    b
5  2  NaN

You can optionally replace the NaN values by calling fillna.
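
For the case raised in the comments, where the missing predictions should fall back to the `open` value of the same row, `fillna` also accepts a Series and aligns it on the index. The following is a minimal sketch, not a definitive implementation: the sample numbers and the index labels are made up for illustration, and only the names `datasetTest`, `test_pred_list`, `open` and `predict_close` come from the question.

```python
import pandas as pd

# Hypothetical sample data standing in for datasetTest and test_pred_list.
datasetTest = pd.DataFrame({"open": [10.0, 11.0, 12.0, 13.0]},
                           index=[5, 6, 7, 8])
test_pred_list = [10.5, 11.5]

# Label the predictions with the first len(list) index labels, so the
# alignment also works when the index does not start at 0.
preds = pd.Series(test_pred_list,
                  index=datasetTest.index[:len(test_pred_list)])

# Assignment aligns on the index; rows without a prediction become NaN.
datasetTest["predict_close"] = preds

# Fill each gap with the `open` value from the same row.
datasetTest["predict_close"] = datasetTest["predict_close"].fillna(
    datasetTest["open"]
)

print(datasetTest)
```

Because the number of missing values is never hard-coded, the same lines should work whether the list is 2 or 3 items short.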

edited Jul 19 '18 at 14:01

answered Jul 19 '18 at 13:56

EdChum


Only concern is the original df's index may not start from 0 – BENY Jul 19 '18 at 13:58
@Wen true but that can be overcome easily – EdChum Jul 19 '18 at 13:58
Thank you for your reply sir.... It is another excellent answers that I have got today.... What a day....:) – Jaffer Wilson Jul 19 '18 at 13:59
Good one. very useful. – Jalil Nourmohammadi Khiarak Mar 28 '23 at 10:13

score 5 · Answer 2 · answered Jul 19 '18 at 14:00

You can pad your list to the length of the dataframe with an arbitrary filler scalar.

Data from @EdChum.

filler = 0
lst = ['a', 'b']

df.loc[:, 'b'] = lst + [filler]*(len(df.index) - len(lst))

print(df)

   a  b
0  0  a
1  1  b
2  2  0
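
The same padding idea can use per-row values instead of a single scalar. The sketch below pads the list with the `open` values of the rows it does not cover, which is what the asker requested in the comments. The data is hypothetical, and the code assumes the list is never longer than the dataframe.

```python
import pandas as pd

# Hypothetical data: three rows, but only one prediction.
df = pd.DataFrame({"open": [10.0, 11.0, 12.0]})
lst = [10.5]

# Take the `open` values for the rows the list does not cover.
# iloc is positional, so this does not depend on the index labels.
padding = df["open"].iloc[len(lst):].tolist()

# The padded list now has exactly len(df) items, so plain assignment works.
df["predict_close"] = lst + padding

print(df)
```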

jpp


BENY · Answer 3 · 2018-07-19T14:13:32.643

5

You can still assign it by using loc (data from Ed's answer):

l = ['a','b']
df.loc[range(len(l)),'b'] = l
df
Out[546]: 
   a    b
0  0    a
1  1    b
2  2  NaN
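
The `range(len(l))` selector relies on the index labels being 0, 1, 2, and so on. A variant that should work for other indexes takes the row labels from the dataframe itself. This is a sketch under the assumption that the index labels are unique; the data and the column name `b` are made up for illustration.

```python
import pandas as pd

# Hypothetical frame whose index does not start at 0.
df = pd.DataFrame({"a": [0, 1, 2]}, index=[3, 4, 5])
values = [0.5, 1.5]

# Select the first len(values) labels from the existing index and
# assign only to those rows; the remaining rows of "b" stay NaN.
df.loc[df.index[:len(values)], "b"] = values

print(df)
```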

edited Jul 19 '18 at 14:13

answered Jul 19 '18 at 14:09

BENY


This is a nice answer if you need `NaN` filler. – jpp Jul 19 '18 at 14:12
Nice compact answer, don't know if explicitly passing a `pd.RangeIndex` would be more performant or not: `df.loc[pd.RangeIndex(df.index[0], len(l)),'b'] = l` +1 – EdChum Jul 19 '18 at 14:22
@EdChum yes you are right , this is more neat than the current one :-) thank you man ! – BENY Jul 19 '18 at 14:24
