
I have two DataFrames. df1 has string row labels (row1, row2, ..., rown) and string column labels (col1, col2, ..., colm). df2 has k rows and 3 columns (char_1, char_2, value): char_1 contains strings that match df1's row labels, and char_2 contains strings that match df1's column labels. I only want to write each df2 value into df1 at the matching position. For example, if the first row of df2 is ['row3', 'col1', 'value2'], I want to assign value2 to df1 at position [2, 0] (third row, first column).
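A minimal reproducible version of this setup, with toy data that is purely hypothetical (3 x 2 instead of 3,600 x 20,000, and numeric values standing in for 'value2'), could look like the sketch below. It loops over the rows of df2 (about 500 in the real case) rather than over the cells of df1, and skips any (row, column) pair that df1 does not have:

```python
import pandas as pd

# Hypothetical stand-in for df1: string row labels and string column labels.
df1 = pd.DataFrame(0.0, index=['row1', 'row2', 'row3'], columns=['col1', 'col2'])

# Hypothetical stand-in for df2: one (row label, column label, value) triple per row.
df2 = pd.DataFrame({
    'char_1': ['row3', 'row1', 'row9'],   # 'row9' does not exist in df1
    'char_2': ['col1', 'col2', 'col1'],
    'value':  [2.0, 5.0, 7.0],
})

# One iteration per df2 row instead of one call per df1 cell.
for r, c, v in df2[['char_1', 'char_2', 'value']].itertuples(index=False):
    if r in df1.index and c in df1.columns:  # skip combinations missing from df1
        df1.at[r, c] = v                     # label-based scalar assignment

print(df1)
```

Here the first df2 row ('row3', 'col1') lands at df1.iloc[2, 0], which is the [2, 0] position described above. The membership check matters because `.at` would otherwise enlarge df1 with a new label such as 'row9'.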

I tried using two nested functions to iterate over the rows and columns of df1:

```python
from functools import reduce

def func1(val):
    # first convert the row (a Series) to a DataFrame
    val = val.to_frame()
    val = val.reset_index()
    val = val.set_index('index')  # set the index so that it holds df1's column names
    def func2(val2):
        try:  # the combination may not exist in df2
            idx1 = list(df2.index[df2['char_2'] == val2.name])  # val2.name is the df1 column name
            idx2 = list(df2.index[df2['char_1'] == val2.index.values[0]])  # val2.index.values[0] is the df1 row name
            idx = list(reduce(set.intersection, map(set, [idx1, idx2])))
            idx = int(idx[0])  # final df2 index whose value must be assigned to df1
            check = 1
        except IndexError:  # no matching row in df2
            check = 0
        if check == 1:  # if the index exists
            val2.iloc[0] = df2['value'][idx]  # assign the value to df1
        return val2
    val = val.apply(func2, axis=1)  # apply the function over the columns
    val = val.squeeze()  # convert back to a Series
    return val

df1 = df1.apply(func1, axis=1)  # apply the function over the rows
```
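In func1 the row is converted to a DataFrame before the inner apply so that both labels stay reachable. For reference, a row passed to `DataFrame.apply` with `axis=1` already carries both: `row.name` is the row label and `row.index` holds the column labels. A minimal check on a hypothetical 2 x 2 frame:

```python
import pandas as pd

df1 = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]],
                   index=['row1', 'row2'], columns=['col1', 'col2'])

# Each call receives one row of df1 as a Series:
# row.name is the row label, row.index holds the column labels.
result = df1.apply(lambda row: row.name + ':' + ','.join(row.index), axis=1)
print(result)
```

This only shows where the labels live; it does not change the cost of visiting every cell one at a time.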

I did the conversion inside func1 because without it I couldn't work with Series while keeping the index and column names, so I couldn't find the index idx in func2. The problem is that it takes forever. df1 has shape (3,600 x 20,000) and df2 has shape (500 x 3), so the data isn't that large. I really don't understand what the problem is. I ran the code on the first row and column to check the result: it was correct and took 1 second. But for the entire DataFrame I've been waiting for hours and it still hasn't finished.
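A rough count suggests why the full run is so slow: df1 has 3,600 x 20,000 = 72,000,000 cells, func2 runs once per cell, and each call compares both df2 label columns (500 rows each), on the order of 72,000,000 x 1,000 = 7.2 x 10^10 element comparisons, plus per-cell Python overhead (Series construction, set building). A sketch of a vectorized alternative, assuming df1's row and column labels are unique (which `Index.get_indexer` requires) and reusing the hypothetical toy data from above, translates the labels to integer positions once and writes all values in a single NumPy assignment:

```python
import pandas as pd

df1 = pd.DataFrame(0.0, index=['row1', 'row2', 'row3'], columns=['col1', 'col2'])
df2 = pd.DataFrame({
    'char_1': ['row3', 'row1', 'row9'],
    'char_2': ['col1', 'col2', 'col1'],
    'value':  [2.0, 5.0, 7.0],
})

# Label -> position lookups, one vectorized call each; -1 means "label not found".
rows = df1.index.get_indexer(df2['char_1'])
cols = df1.columns.get_indexer(df2['char_2'])
ok = (rows >= 0) & (cols >= 0)  # keep only combinations that exist in df1

# Write every matching value at once into a copy of the underlying array,
# then rebuild the DataFrame with the original labels.
arr = df1.to_numpy(copy=True)
arr[rows[ok], cols[ok]] = df2['value'].to_numpy()[ok]
df1 = pd.DataFrame(arr, index=df1.index, columns=df1.columns)
```

The work is now proportional to the 500 df2 rows plus one copy of df1, instead of one Python function call per df1 cell.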
Is there a way to optimize it? As I wrote in the title, I just need to run a function that keeps the column and index names and iterates over the entire DataFrame. Thanks in advance!
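If keeping everything label-based is the priority, one possible sketch (again with hypothetical toy data) reshapes df2 into a wide frame with `pivot` and lets `DataFrame.update` align it against df1's index and column names. `pivot` raises on duplicate (char_1, char_2) pairs, so the sketch keeps the first occurrence; which duplicate should win is an assumption, not something the question specifies:

```python
import pandas as pd

df1 = pd.DataFrame(0.0, index=['row1', 'row2', 'row3'], columns=['col1', 'col2'])
df2 = pd.DataFrame({
    'char_1': ['row3', 'row1', 'row9'],
    'char_2': ['col1', 'col2', 'col1'],
    'value':  [2.0, 5.0, 7.0],
})

# Wide frame: row labels from char_1, column labels from char_2, NaN where df2 has no entry.
wide = (df2.drop_duplicates(subset=['char_1', 'char_2'], keep='first')
           .pivot(index='char_1', columns='char_2', values='value'))

# update() modifies df1 in place, aligning on labels; NaN cells in `wide`
# are ignored, and labels df1 lacks (like 'row9') are dropped.
df1.update(wide)
print(df1)
```

Because `update` skips NaN cells, this approach cannot be used to write NaN into df1 on purpose; the positional version does not have that limitation.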

nico
  • Your question needs a minimal reproducible example consisting of sample input, expected output, actual output, and only the relevant code necessary to reproduce the problem. See [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) for best practices related to Pandas questions. – itprorh66 Jan 02 '23 at 20:08
