1

I have this dataframe named "test" and a list of words list_w = ['monthly', 'moon']. I want to add a new column "revised cosine" such that: for each word present in list_w with 'weak' condition with cosine == 'Na', the revised cosine of their corresponding condition 'unrel_weak' will also be 'Na', similarly for each word present in list_w with 'strong' condition with cosine == 'Na', their revised cosine of their corresponding condition 'unrel_strong' will also be 'Na'

     isi       prime   target     condition  meanRT cosine 
0     50      weekly  monthly        strong   676.2    0.9
1   1050      weekly  monthly        strong   643.5    0.9
2     50       daily  monthly          weak   737.2     Na
3   1050       daily  monthly          weak   670.6     Na
4     50     bathtub  monthly  unrel_strong   692.2    0.1
5   1050     bathtub  monthly  unrel_strong   719.1    0.1
6     50      sponge  monthly    unrel_weak   805.8    0.3
7   1050      sponge  monthly    unrel_weak   685.7    0.3
8     50    crescent     moon        strong   625.0     Na
9   1050    crescent     moon        strong   537.2     Na
10    50      sunset     moon          weak   698.4    0.2
11  1050      sunset     moon          weak   704.3    0.2
12    50    premises     moon  unrel_strong   779.2    0.7
13  1050    premises     moon  unrel_strong   647.6    0.7
14    50     descent     moon    unrel_weak   686.0    0.5
15  1050     descent     moon    unrel_weak   725.4    0.5

My code is as below:

for w in list_w:
    if test.loc[(test['target']==w) & (test['condition']=='strong'), 'cosine']=='Na':
        test.loc[(test['target']==w) & (test['condition']=='unrel_strong'), 'cosine'] ='Na'

My code returns error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

My expected output should be like in the dataframe below (with "revised cosine" column added)

     isi       prime   target     condition  meanRT cosine revised cosine
0     50      weekly  monthly        strong   676.2    0.9  0.9
1   1050      weekly  monthly        strong   643.5    0.9  0.9
2     50       daily  monthly          weak   737.2     Na  Na
3   1050       daily  monthly          weak   670.6     Na  Na
4     50     bathtub  monthly  unrel_strong   692.2    0.1  0.1
5   1050     bathtub  monthly  unrel_strong   719.1    0.1  0.1
6     50      sponge  monthly    unrel_weak   805.8    0.3  Na
7   1050      sponge  monthly    unrel_weak   685.7    0.3  Na
8     50    crescent     moon        strong   625.0     Na  Na
9   1050    crescent     moon        strong   537.2     Na  Na
10    50      sunset     moon          weak   698.4    0.2  0.2
11  1050      sunset     moon          weak   704.3    0.2  0.2
12    50    premises     moon  unrel_strong   779.2    0.7  Na
13  1050    premises     moon  unrel_strong   647.6    0.7  Na
14    50     descent     moon    unrel_weak   686.0    0.5  0.5
15  1050     descent     moon    unrel_weak   725.4    0.5  0.5

any ideas to help me out? I checked logical_and but they seem to work only with 2 conditions. Overwritting the cosine colum is also fine, as long as the output is like the revised cosine. Thanks in advance!

Jasmine N
  • 91
  • 7
  • Welcome to Stackoverflow. Please [don't post images of data](https://meta.stackoverflow.com/questions/285551/why-not-upload-images-of-code-errors-when-asking-a-question) instead add them in text format so that we could be able to copy these while trying to answer your question. Please take some time to read [How to ask good pandas questions?](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – Shubham Sharma May 08 '21 at 14:59

2 Answers2

0

This error message comes from the fact that you can't do if statements like this with Pandas.

Try like this:

for w in list_w:
    for c in ["weak", "strong"]:
        mask = (
            (test["target"] == w) & (test["condition"] == c) & (test["cosine"] == "Na")
        )
        test.loc[mask, "revised cosine"] = "Na"
Laurent
  • 12,287
  • 7
  • 21
  • 37
0

Solution

m = test['cosine'].eq('Na') & \
    test['target'].isin(list_w) & \
    test['condition'].isin(['weak', 'strong'])

i1 = test.set_index(['isi', 'target', 'condition']).index
i2 = test[m].set_index(['isi', 'target', test.loc[m, 'condition'].radd('unrel_')]).index

test['revised_cosine'] = test['cosine'].mask(i1.isin(i2), 'Na')

Explanations

Let us create a boolean mask m which holds True when the cosine column contains Na and at the same time target column contains one of the word from list_w and condition column is either weak or strong

>>> m

0     False
1     False
2      True
3      True
4     False
5     False
6     False
7     False
8      True
9      True
10    False
11    False
12    False
13    False
14    False
15    False
dtype: bool

Create a MultiIndex based on the columns isi, target and condition, lets call it i1. Filter the rows in test dataframe using mask m, add a prefix unrel_ to the filtered rows in condition column and create another MultiIndex i2 in the similar way.

>>> i1
MultiIndex([(  50, 'monthly',       'strong'),
            (1050, 'monthly',       'strong'),
            (  50, 'monthly',         'weak'),
            (1050, 'monthly',         'weak'),
            (  50, 'monthly', 'unrel_strong'),
            (1050, 'monthly', 'unrel_strong'),
            (  50, 'monthly',   'unrel_weak'),
            (1050, 'monthly',   'unrel_weak'),
            (  50,    'moon',       'strong'),
            (1050,    'moon',       'strong'),
            (  50,    'moon',         'weak'),
            (1050,    'moon',         'weak'),
            (  50,    'moon', 'unrel_strong'),
            (1050,    'moon', 'unrel_strong'),
            (  50,    'moon',   'unrel_weak'),
            (1050,    'moon',   'unrel_weak')],
           names=['isi', 'target', 'condition'])

>>> i2
MultiIndex([(  50, 'monthly',   'unrel_weak'),
            (1050, 'monthly',   'unrel_weak'),
            (  50,    'moon', 'unrel_strong'),
            (1050,    'moon', 'unrel_strong')],
           names=['isi', 'target', 'condition'])

Mask the values in cosine column using a boolean mask which can be created by testing the membership of i1 in i2

     isi     prime   target     condition  meanRT cosine revised_cosine
0     50    weekly  monthly        strong   676.2    0.9            0.9
1   1050    weekly  monthly        strong   643.5    0.9            0.9
2     50     daily  monthly          weak   737.2     Na             Na
3   1050     daily  monthly          weak   670.6     Na             Na
4     50   bathtub  monthly  unrel_strong   692.2    0.1            0.1
5   1050   bathtub  monthly  unrel_strong   719.1    0.1            0.1
6     50    sponge  monthly    unrel_weak   805.8    0.3             Na
7   1050    sponge  monthly    unrel_weak   685.7    0.3             Na
8     50  crescent     moon        strong   625.0     Na             Na
9   1050  crescent     moon        strong   537.2     Na             Na
10    50    sunset     moon          weak   698.4    0.2            0.2
11  1050    sunset     moon          weak   704.3    0.2            0.2
12    50  premises     moon  unrel_strong   779.2    0.7             Na
13  1050  premises     moon  unrel_strong   647.6    0.7             Na
14    50   descent     moon    unrel_weak   686.0    0.5            0.5
15  1050   descent     moon    unrel_weak   725.4    0.5            0.5
Shubham Sharma
  • 68,127
  • 6
  • 24
  • 53