assign value to new column [Python pandas]

Question

I have a scenario where I am running two functions in a script:

test.py :

import pandas as pd

def func1():
    df1 = pd.read_csv('test1.csv')
    val1 = df1['col1'].mean().round(2)
    return val1

def func2():
    df2 = pd.read_csv('test2.csv')
    val2 = df2['col1'].mean().round(2)
    return val2

def func3():
    dataf = pd.read_csv('test3.csv')
    col1=dataf['area']
    col2 = dataf['overall']
    dataf['overall']=val1 # value from val1 ->leads to error
    dataf['overall']=val2 #value from val2 ->leads to error

Here I read test1.csv and test2.csv, store the mean values in the variables "val1" and "val2" respectively, and return them. I want to store these values in test3.csv, which has two columns, with the values stored one after another (appended). The code above does not work, and I couldn't find anything on the internet about this. Any help would be great.

Oh my... are you aware of how to work with functions? You should read a tutorial on functions and return values first. — cs95, Nov 07 '17 at 08:14

jezrael · Accepted Answer · 2017-11-07T19:25:20.983

2

You need to pass the variables as parameters to the function func3, and if the only difference between func1 and func2 is the file name, create only one function with a parameter.

Thanks for idea cᴏʟᴅsᴘᴇᴇᴅ ;)

import pandas as pd

def func1(file):
    # Mean of 'col1' in the given CSV file, rounded to 2 decimals
    df = pd.read_csv(file)
    val = df['col1'].mean().round(2)
    return val

a = func1('test1.csv')
b = func1('test2.csv')

def func3(val1=a, val2=b):
    dataf = pd.read_csv('test3.csv')
    col1 = dataf['area']
    col2 = dataf['overall']
    # Rows at even positions (0, 2, 4, ...) get val1, rows at odd positions get val2
    dataf.iloc[::2, dataf.columns.get_loc('overall')] = val1
    dataf.iloc[1::2, dataf.columns.get_loc('overall')] = val2
    return dataf

Sample:

dataf = pd.DataFrame({'overall':[1,7,8,9,4],
                      'col':list('abcde')})

print (dataf)
  col  overall
0   a        1
1   b        7
2   c        8
3   d        9
4   e        4

val1 = 20
val2 = 50

dataf.iloc[::2, dataf.columns.get_loc('overall')] = val1 
dataf.iloc[1::2, dataf.columns.get_loc('overall')] = val2
print (dataf)
  col  overall
0   a       20
1   b       50
2   c       20
3   d       50
4   e       20

General solution for appending N values from a list: create an array with numpy.tile and then assign it to the new column:

val =[1,8,4]
a = np.tile(val, int(len(dataf) / len(val))+2)[:len(dataf)]
dataf['overall'] = a
print (dataf)
  col  overall
0   a        1
1   b        8
2   c        4
3   d        1
4   e        8
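
As a hedged aside, numpy.resize repeats its input cyclically when the requested size is larger than the input, so it can stand in for the tile-and-slice step above. A minimal sketch, where the helper name cycle_values is made up for illustration and is not part of pandas or numpy:

```python
import numpy as np
import pandas as pd

def cycle_values(values, n_rows):
    """Repeat `values` in order until exactly n_rows entries exist (hypothetical helper)."""
    # np.resize fills the new array with repeated copies of the input
    return np.resize(np.asarray(values), n_rows)

dataf = pd.DataFrame({'col': list('abcde'),
                      'overall': [1, 7, 8, 9, 4]})

# Works for any number of values, not only two
dataf['overall'] = cycle_values([1, 8, 4], len(dataf))
print(dataf)
```

With this, adding a third or fourth value only means extending the list; no extra iloc slice is needed per value.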

edited Nov 07 '17 at 19:25

answered Nov 07 '17 at 08:17

jezrael


@cᴏʟᴅsᴘᴇᴇᴅ - Thank you, now it is better ? – jezrael Nov 07 '17 at 08:42

Yeah it looks nice now. – cs95 Nov 07 '17 at 08:42
Thanks for the response, but this is overwriting the values in the cells rather than appending them – Alex Nov 07 '17 at 09:05
I think the best is to create 2 columns - one for `val1` and another for `val2`. Or what is your desired output in the new column: `val1, val2` or `val1 + val2` per row? – jezrael Nov 07 '17 at 09:07
In my case val1 & val2 are different values which I need to place one after another, e.g. row 1 --> val1, row 2 --> val2. Each time a function returns a value I should store it in the new CSV. – Alex Nov 07 '17 at 10:28
Also, I don't see with the above code that it doesn't overwrite anything when I run the file a second time. – Alex Nov 07 '17 at 12:45
Prints nothing in the "overall" column – Alex Nov 07 '17 at 13:10
So if you test `print (a)`, is it working correctly? If yes, there has to be another problem. – jezrael Nov 07 '17 at 13:11
Checked the same: I got the value returned from func1, and yes, print a gives the value. But in col2 I don't see the value of "a" returned from func1 – Alex Nov 07 '17 at 13:14
Hmmm, `col2 = dataf['overall']` means get the original column `overall` into the `col2` variable. Do you need something else? – jezrael Nov 07 '17 at 13:18
One question: what if I don't know my number of functions, as it can grow over time? Will this location still be helpful? Can't it be automatically appended to the next row? As I see, you have defined the location of the cell – Alex Nov 07 '17 at 13:20
Hmmm, `dataf.columns.get_loc('overall')` means get the location of column `overall`, because rows are selected by position with [`DataFrame.iloc`](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.iloc.html) – jezrael Nov 07 '17 at 13:26
And for solutions with multiple parameters check [this](https://stackoverflow.com/q/36901/2901002) - there are more possible solutions. – jezrael Nov 07 '17 at 13:33
No, what I meant to say is: suppose I have one more func returning value "val3". What I see from your code is that you have selected the location in code as "loc ::2 & loc 1::2". So if I have a 3rd val, do I need to write it explicitly? Can't it be added automatically depending on the number of functions? – Alex Nov 07 '17 at 13:56
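
What the comments describe (each returned value stored on the next row of the CSV, for any number of functions) might be sketched as below. This is an assumption about the desired output, not the accepted answer's method: the helper name append_values, the 'run1'/'run2' labels and the numbers are made up, and the demo writes to a temporary directory instead of the real test3.csv.

```python
import os
import tempfile
import pandas as pd

def append_values(path, area, values):
    """Append one row per value to the CSV at `path` (hypothetical helper)."""
    rows = pd.DataFrame({'area': area, 'overall': list(values)})
    # Write the header only when the file does not exist yet
    write_header = not os.path.exists(path)
    rows.to_csv(path, mode='a', header=write_header, index=False)

# Demo file in a temporary directory; names and values are illustrative only
tmp_dir = tempfile.mkdtemp()
out_path = os.path.join(tmp_dir, 'test3.csv')

append_values(out_path, 'run1', [20.5, 50.25])  # e.g. results of two functions
append_values(out_path, 'run2', [31.0])         # a third value added later

result = pd.read_csv(out_path)
print(result)
```

Because to_csv is called with mode='a', running it again adds rows below the existing ones rather than overwriting them.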
