
I have a dataframe that looks as follows:

ID    Name
1     Missing
2     Missing
3     Missing
.......

Is there a way to fill the Name column evenly (with one group getting an extra row when len(df) does not divide evenly) using a set of names I have stored in a list or a dictionary? For example, if I have 2 names, half of the column would be Name1 and the other half would be Name2. I tried:

for i in (range(len(df)/no_names)):
    counter=0
    df.ix[i]['Name'] = dictionary.values()[0]

but this would fill in only my first N rows based on how many names I have.

Andrei Cozma

2 Answers


You could use

import numpy as np
N = len(df)
df['Name'] = np.array(['Name1', 'Name2'])[np.linspace(0,2,N,endpoint=False).astype(int)]

The idea here is to create an array of 0's and 1's, such as

In [34]: np.linspace(0,2,11,endpoint=False).astype(int)
Out[34]: array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

Now we can use NumPy indexing to create an array of 'Name1' and 'Name2' values:

In [8]: np.array(['Name1', 'Name2'])[np.linspace(0,2,11,endpoint=False).astype(int)]
Out[8]: 
array(['Name1', 'Name1', 'Name1', 'Name1', 'Name1', 'Name1', 'Name2',
       'Name2', 'Name2', 'Name2', 'Name2'], 
      dtype='<U5')
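
The question also asks about an arbitrary number of names kept in a list. One possible generalization, offered as a sketch rather than as part of the answer as posted, swaps `np.linspace` for integer arithmetic so that no floating-point rounding is involved. The helper name `spread_names` and the sample DataFrame are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd

def spread_names(n_rows, names):
    """Return n_rows labels, splitting the rows as evenly as possible
    across `names` (hypothetical helper, not from the original answer)."""
    names = np.asarray(list(names))
    # Row i is assigned to group floor(i * k / n_rows), computed with integers only.
    idx = np.arange(n_rows) * len(names) // n_rows
    return names[idx]

# Illustrative data: 11 rows, every Name initially "Missing".
df = pd.DataFrame({"ID": range(1, 12), "Name": "Missing"})
df["Name"] = spread_names(len(df), ["Name1", "Name2", "Name3"])
print(df)
```

With two names and 11 rows this gives the same 6/5 split as the `np.linspace` version above; when the rows do not divide evenly, the earlier names receive the extra rows.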
unutbu
  • That dear Sir is phenomenal. – Andrei Cozma Nov 30 '16 at 11:57
  • Using your method, unutbu, I do get a result, but when printing the dataframe I get an error: "A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead". Adding iloc doesn't solve it; on the contrary, my print then shows no result. – Andrei Cozma Nov 30 '16 at 13:14
  • This is a warning (pandas calls it SettingWithCopyWarning), not an exception. It is saying that `df` is a *copy* of a slice of another DataFrame. The warning is there, out of an abundance of caution, to alert you that modifying `df` may not affect the original DataFrame. If leaving the original unchanged is what you intend, you may ignore the warning. See http://stackoverflow.com/q/40033471/190597 for more information and ways to silence it. – unutbu Nov 30 '16 at 13:20
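
One common way to avoid the warning discussed in these comments is to take an explicit copy when `df` is derived from another DataFrame. The sketch below assumes that `df` was produced by filtering a larger frame; the names `full` and the filter condition are hypothetical, since the thread does not show how `df` was created.

```python
import numpy as np
import pandas as pd

# Hypothetical parent DataFrame; the thread does not show the real one.
full = pd.DataFrame({"ID": range(1, 8), "Name": "Missing"})

# .copy() makes df an independent DataFrame, so assigning a column to it
# is unambiguous and leaves `full` untouched.
df = full[full["ID"] > 2].copy()

N = len(df)
df["Name"] = np.array(["Name1", "Name2"])[
    np.linspace(0, 2, N, endpoint=False).astype(int)
]
print(df)
```

If the goal is instead to modify the original frame, assigning through `full.loc[mask, "Name"] = ...` on the parent is the usual alternative.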

This is my first attempt at answering a Python question, and it is definitely not the most efficient solution.

import numpy as np
import pandas as pd
# Nine rows, matching the output printed below
df = pd.DataFrame({'a':[1,4,3,4,0,4,0,4,0],'b':[2,1,3,4,0,4,0,4,0]})
#df
#Out[76]: 
#   a  b
#0  1  2
#1  4  1
#2  3  3
#3  4  4
#4  0  0
#5  4  4
#6  0  0
#7  4  4
#8  0  0

Based on the number of rows, repeat each name ("A" and "B" here, standing in for Name1 and Name2) accordingly:

df['new'] = np.repeat(np.array(["A", "B"]), repeats=[round(df.shape[0]/2), df.shape[0]-round(df.shape[0]/2)])

#Out[81]: 
#   a  b new
#0  1  2   A
#1  4  1   A
#2  3  3   A
#3  4  4   A
#4  0  0   B
#5  4  4   B
#6  0  0   B
#7  4  4   B
#8  0  0   B
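
The same `np.repeat` idea can be extended to names stored in a dictionary, as the question mentions. This is a sketch under assumptions: `repeat_evenly` is a hypothetical helper, the dictionary contents are made up, and any leftover rows go to the earlier names (the two-name line above happens to give the extra row to the later name instead).

```python
import numpy as np
import pandas as pd

def repeat_evenly(n_rows, names):
    """Repeat each name so the total is n_rows, as evenly as possible
    (hypothetical helper, not from the original answer)."""
    names = list(names)
    base, extra = divmod(n_rows, len(names))
    # The first `extra` names get one additional row each.
    counts = [base + 1 if i < extra else base for i in range(len(names))]
    return np.repeat(names, counts)

# Illustrative dictionary of names and a 10-row DataFrame.
dictionary = {"first": "Name1", "second": "Name2", "third": "Name3"}
df = pd.DataFrame({"ID": range(1, 11), "Name": "Missing"})
df["Name"] = repeat_evenly(len(df), dictionary.values())
print(df)
```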
joel.wilson