
I have two dataframes in Python.

First dataframe: tf_words, of shape (1 row, 2235 columns), looks like this:

     0    1     2     3      4     5        6  ...      2234
0   aa  aaa  aaaa  aaan  aaanu  aada  aadhyam  ...  zindabad

Second dataframe: tf1_bigram, of shape (4000, 34319), contains bigrams with their occurrence counts in the dataset and looks like this:

(a, en) (a, ha) (a, padam) (aa, aala) (aa, accountinte) (aa,adhamanaya)...
  1        0         0         1            0                 0        ...
  0        1         0         0            1                 0        ...
  0        0         1         0            0                 1        ...

I have to compare the tf_words dataframe with the tf1_bigram dataframe as follows.

E.g. the word 'aa' from tf_words matches only one of the two words in the columns (aa, aala), (aa, accountinte) and (aa, adhamanaya) of tf1_bigram, but the values in those matching columns should still be multiplied by 0.5.

Then check for 'aaa' and, if found, multiply the matching columns by 0.5;

then check for 'aaaa' and, if found, multiply the matching columns by 0.5;

then check for 'aaan' and, if found, multiply the matching columns by 0.5;

and so on, up to the last word 'zindabad' (column no. 2234).

The resulting tf1_bigram should then look like this:

(a, en) (a, ha) (a, padam) (aa, aala) (aa, accountinte) (aa,adhamanaya)...
  1        0         0         0.5          0                 0        ...
  0        1         0         0            0.5               0        ...
  0        0         1         0            0                 0.5      ...
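The rule above can be tried on a small reproducible version of the data. This is a sketch under two assumptions not stated in the question: a word matches a bigram only when it equals one of the bigram's two elements exactly, and each matching word contributes one factor of 0.5. The miniature frames and the names `words`, `factors` and `result` are illustrative only.

```python
import pandas as pd

# Hypothetical miniature inputs with the same layout as the real frames:
# tf_words is a single row of words, tf1_bigram has (word1, word2) column labels.
tf_words = pd.DataFrame([["aa", "aaa", "aaaa", "aaan", "zindabad"]])
tf1_bigram = pd.DataFrame(
    [[1, 0, 0, 1, 0, 0],
     [0, 1, 0, 0, 1, 0],
     [0, 0, 1, 0, 0, 1]],
    columns=[("a", "en"), ("a", "ha"), ("a", "padam"),
             ("aa", "aala"), ("aa", "accountinte"), ("aa", "adhamanaya")],
)

words = set(tf_words.iloc[0])  # all words from the single row

# One multiplier per column: 0.5 for every distinct word in tf_words that
# equals either element of the bigram label (exact, whole-word matching).
factors = [0.5 ** len(words.intersection(col)) for col in tf1_bigram.columns]
result = tf1_bigram.mul(factors, axis="columns")
```

Computing one multiplier per column and multiplying once avoids calling a Python function on every column of the 4000 x 34319 frame.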

I have tried tf1_bigram.apply(lambda x: np.multiply(x * 0.5) if x.name in tf_words else x), but the output is not what I expected.
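One likely reason the attempt leaves tf1_bigram unchanged: `label in some_dataframe` checks the dataframe's column labels (here 0 to 2234), not its cell values, so a bigram label is never found. Separately, `np.multiply` expects two operands (`np.multiply(x, 0.5)`); `x * 0.5` on its own already does the multiplication. The sketch below uses hypothetical miniature frames to show the membership test and one possible corrected lambda.

```python
import pandas as pd

# Hypothetical miniature inputs with the same layout as the real frames.
tf_words = pd.DataFrame([["aa", "aaa", "aaaa"]])  # column labels are 0, 1, 2
tf1_bigram = pd.DataFrame({("aa", "aala"): [1, 0], ("a", "en"): [0, 1]})

# `label in DataFrame` tests the column labels, not the cell values,
# so a bigram label like ("aa", "aala") is never "in" tf_words.
label_found = ("aa", "aala") in tf_words  # False
int_found = 0 in tf_words                 # True: 0 is a column label

# Compare against the row's words instead; x.name is the column's tuple label.
words = set(tf_words.iloc[0])
fixed = tf1_bigram.apply(lambda x: x * 0.5 if words.intersection(x.name) else x)
```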

Please help!

Prasad Joshi
  • Hi Prasad, please follow these guidelines on how to write a minimal reproducible example; this will make it easier for people to understand and answer your question :) [minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example) –  May 31 '22 at 11:54
  • [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – mozway May 31 '22 at 11:54
  • Please provide enough code so others can better understand or reproduce the problem. – Community May 31 '22 at 12:34

1 Answer


Try this:

import pandas as pd
table = {
    'a, en':[1,0,0],
    'a, ha':[0,1,0],
    'a, padam':[0,0,1],
    'aa, aala' :[1,0,0],
    'aaa, accountinte':[0,1,0],
    'aaaa,adhamanaya':[0,0,1],
    'aaab,adhamanaya':[0,0,1]
           }
tf1_bigram = pd.DataFrame(table)

table = {0:['aa'], 1:['aaa'], 2:['aaaa'], 3:['aaan'], 4:['aaanu'], 5:['aada'], 6:['aadhyam']}
tf_words  = pd.DataFrame(table)

list_tf_words = tf_words.values.tolist()

print(tf1_bigram)

print(f'\n\n-------------BREAK-----------\n\n')


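def func(x):
    # x is one column; x.name is its label, e.g. 'aa, aala'.
    # Halve the column on the first word found anywhere in the label
    # (substring search, so 'aa' also matches inside 'aaab').
    for y in list_tf_words[0]:
        if x.name.find(y) != -1:
            return x*0.5
    return x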

tf1_bigram = tf1_bigram.apply(func, axis = 0) 

print(tf1_bigram)

OUTPUT

   a, en  a, ha  a, padam  ...  aaa, accountinte  aaaa,adhamanaya  aaab,adhamanaya
0      1      0         0  ...                 0                0                0
1      0      1         0  ...                 1                0                0
2      0      0         1  ...                 0                1                1

[3 rows x 7 columns]


-------------BREAK-----------


   a, en  a, ha  a, padam  ...  aaa, accountinte  aaaa,adhamanaya  aaab,adhamanaya
0      1      0         0  ...               0.0              0.0              0.0
1      0      1         0  ...               0.5              0.0              0.0
2      0      0         1  ...               0.0              0.5              0.5

[3 rows x 7 columns]
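In the output above, 'aaab,adhamanaya' is halved even though 'aaab' is not in tf_words, because `str.find()` matches substrings ('aa' occurs inside 'aaab'). If whole words should be matched, a sketch along these lines compares the comma-separated parts of each label exactly. The helpers `tokens` and `halve_if_word_matches` are hypothetical names introduced for illustration.

```python
import pandas as pd

words = {"aa", "aaa", "aaaa", "aaan", "aaanu", "aada", "aadhyam"}

# str.find() reports substring hits, so 'aa' is "found" inside 'aaab'.
substring_hit = "aaab,adhamanaya".find("aa") != -1  # True

def tokens(label):
    """Split a 'word1, word2' column label into its stripped words."""
    return {part.strip() for part in label.split(",")}

def halve_if_word_matches(col):
    # Exact whole-word test instead of substring search.
    return col * 0.5 if tokens(col.name) & words else col

tf1_bigram = pd.DataFrame({
    "aa, aala": [1, 0, 0],
    "aaa, accountinte": [0, 1, 0],
    "aaab,adhamanaya": [0, 0, 1],
})
result = tf1_bigram.apply(halve_if_word_matches)
```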

If a column should be multiplied by 0.5 once for every word found in it (instead of only once), use the code below:

import pandas as pd
table = {
    'a, en':[1,0,0],
    'a, ha':[0,1,0],
    'a, padam':[0,0,1],
    'aa, aala' :[1,0,0],
    'aaa, aaanu, accountinte':[0,1,0],
    'aaaa,adhamanaya':[0,0,1]
              }
tf1_bigram = pd.DataFrame(table)

table = {0:['aa'], 1:['aaa'], 2:['aaaa'], 3:['aaan'], 4:['aaanu'], 5:['aada'], 6:['aadhyam']}
tf_words  = pd.DataFrame(table)

list_tf_words = tf_words.values.tolist()

print(tf1_bigram)

print(f'\n\n-------------BREAK-----------\n\n')


def func(x):
    # Halve the column once for every word found in its label
    # (substring search, as in the first version).
    for y in list_tf_words[0]:
        if x.name.find(y) != -1:
            x = x*0.5
    return x

tf1_bigram = tf1_bigram.apply(func, axis = 0) 

print(tf1_bigram)

OUTPUT

   a, en  a, ha  a, padam  aa, aala  aaa, aaanu, accountinte  aaaa,adhamanaya
0      1      0         0         1                        0                0
1      0      1         0         0                        1                0
2      0      0         1         0                        0                1


-------------BREAK-----------


   a, en  a, ha  a, padam  aa, aala  aaa, aaanu, accountinte  aaaa,adhamanaya
0      1      0         0       0.5                   0.0000            0.000
1      0      1         0       0.0                   0.0625            0.000
2      0      0         1       0.0                   0.0000            0.125
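The same "0.5 per hit" result can be reached by counting the hits for each label once and multiplying the whole frame by a vector of `0.5 ** hits`, instead of halving inside `apply`. This sketch keeps the answer's substring rule; `words`, `factors` and `result` are illustrative names.

```python
import pandas as pd

tf1_bigram = pd.DataFrame({
    "a, en": [1, 0, 0],
    "a, ha": [0, 1, 0],
    "a, padam": [0, 0, 1],
    "aa, aala": [1, 0, 0],
    "aaa, aaanu, accountinte": [0, 1, 0],
    "aaaa,adhamanaya": [0, 0, 1],
})
words = ["aa", "aaa", "aaaa", "aaan", "aaanu", "aada", "aadhyam"]

# Same substring rule as func() above: every listed word contained in the
# label contributes one factor of 0.5, i.e. 0.5 ** (number of hits).
factors = [0.5 ** sum(w in label for w in words) for label in tf1_bigram.columns]
result = tf1_bigram.mul(factors, axis="columns")
```

This reproduces the values shown above (0.5, 0.0625 and 0.125).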

If the words in the column label must match the entries of tf_words exactly (not as substrings), try this:

import pandas as pd
table = {
    'a, en':[1,0,0],
    'a, ha':[0,1,0],
    'a, padam':[0,0,1],
    'aa, aala' :[1,0,0],
    'aaa, accountinte':[0,1,0],
    'aaaa,adhamanaya':[0,0,1],
    'aaab,adhamanaya':[0,0,1]
           }
tf1_bigram = pd.DataFrame(table)

table = {0:['a'], 1:['en'], 2:['aaaa'], 3:['aaan'], 4:['aaanu'], 5:['aada'], 6:['aadhyam']}
tf_words  = pd.DataFrame(table)

list_tf_words = tf_words.values.tolist()

print(tf1_bigram)

print(f'\n\n-------------BREAK-----------\n\n')


def func(x):
    # Split a label such as 'a, en' into its two words.
    temp = x.name.split(',')
    # Replace "and" with "or" if only one of the two words needs to be in tf_words.
    if temp[0].strip() in list_tf_words[0] and temp[1].strip() in list_tf_words[0]:
        return x*0.5
    return x

tf1_bigram = tf1_bigram.apply(func, axis = 0) 

print(tf1_bigram)

OUTPUT

   a, en  a, ha  a, padam  ...  aaa, accountinte  aaaa,adhamanaya  aaab,adhamanaya
0      1      0         0  ...                 0                0                0
1      0      1         0  ...                 1                0                0
2      0      0         1  ...                 0                1                1

[3 rows x 7 columns]


-------------BREAK-----------


   a, en  a, ha  a, padam  ...  aaa, accountinte  aaaa,adhamanaya  aaab,adhamanaya
0    0.5      0         0  ...                 0                0                0
1    0.0      1         0  ...                 1                0                0
2    0.0      0         1  ...                 0                1                1

[3 rows x 7 columns]
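The "and"/"or" choice in the comment above can be made explicit with a switch. In this sketch `factor` and its `require_both` parameter are hypothetical names; it assumes every label contains a comma separating the two words, and it multiplies the frame once by one factor per column.

```python
import pandas as pd

tf1_bigram = pd.DataFrame({
    "a, en": [1, 0, 0],
    "a, ha": [0, 1, 0],
    "aa, aala": [1, 0, 0],
})
words = {"a", "en", "aaaa", "aaan", "aaanu", "aada", "aadhyam"}

def factor(label, require_both=True):
    # require_both=True mirrors the answer's "and"; False corresponds to "or".
    first, second = (part.strip() for part in label.split(",", 1))
    hits = (first in words, second in words)
    matched = all(hits) if require_both else any(hits)
    return 0.5 if matched else 1.0

both = tf1_bigram.mul([factor(c) for c in tf1_bigram.columns], axis="columns")
either = tf1_bigram.mul(
    [factor(c, require_both=False) for c in tf1_bigram.columns], axis="columns"
)
```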

Solution for tuple column labels:

import pandas as pd
table = {
    ('a', 'en'):[1,0,0],
    ('a', 'ha'):[0,1,0],
    ('a', 'padam'):[0,0,1],
    ('aa', 'aala') :[1,0,0],
    ('aaa', 'accountinte'):[0,1,0],
    ('aaaa','adhamanaya'):[0,0,1],
    ('aaab','adhamanaya'):[0,0,1]
           }
tf1_bigram = pd.DataFrame(table)

table = {0:['a'], 1:['en'], 2:['aaaa'], 3:['aaan'], 4:['aaanu'], 5:['aada'], 6:['aadhyam']}
tf_words  = pd.DataFrame(table)

list_tf_words = tf_words.values.tolist()

print(tf1_bigram)

print(f'\n\n-------------BREAK-----------\n\n')


def func(x):
    # With tuple keys the columns form a MultiIndex, so x.name is a tuple like ('a', 'en').
    temp = x.name
    # Replace "and" with "or" if only one of the two words needs to be in tf_words.
    if temp[0].strip() in list_tf_words[0] and temp[1].strip() in list_tf_words[0]:
        return x*0.5
    else:
        return x

tf1_bigram = tf1_bigram.apply(func, axis = 0) 

print(tf1_bigram)

OUTPUT

   a            aa         aaa       aaaa       aaab
  en ha padam aala accountinte adhamanaya adhamanaya
0  1  0     0    1           0          0          0
1  0  1     0    0           1          0          0
2  0  0     1    0           0          1          1


-------------BREAK-----------


     a            aa         aaa       aaaa       aaab
    en ha padam aala accountinte adhamanaya adhamanaya
0  0.5  0     0    1           0          0          0
1  0.0  1     0    0           1          0          0
2  0.0  0     1    0           0          1          1
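With tuple labels and 34319 columns, the per-column `apply` can also be replaced by a boolean mask built from the two levels of the column MultiIndex. This sketch assumes the columns really are a two-level MultiIndex (as produced by the dict of tuple keys above); a flat index of tuples could first be converted with `pd.MultiIndex.from_tuples(tf1_bigram.columns)`. `mask` and `result` are illustrative names.

```python
import numpy as np
import pandas as pd

tf1_bigram = pd.DataFrame({
    ("a", "en"): [1, 0, 0],
    ("a", "ha"): [0, 1, 0],
    ("aa", "aala"): [1, 0, 0],
})
words = ["a", "en", "aaaa", "aaan", "aaanu", "aada", "aadhyam"]

cols = tf1_bigram.columns  # two-level MultiIndex built from the tuple keys
# True where both words of the bigram are in `words`
# (use | instead of & if one matching word is enough).
mask = cols.get_level_values(0).isin(words) & cols.get_level_values(1).isin(words)

# One multiplier per column: 0.5 where the mask is True, 1.0 elsewhere.
result = tf1_bigram.mul(np.where(mask, 0.5, 1.0), axis="columns")
```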
Rafael MR
  • Comments are not for extended discussion; this conversation has been [moved to chat](https://chat.stackoverflow.com/rooms/245532/discussion-on-answer-by-rafael-mr-compare-two-dataframes-with-different-shapes-a). – Machavity Jun 12 '22 at 16:52