How to construct a list comprehension with nested for loops and conditionals for pandas?

Question

I'm having difficulty getting the following comprehension to work as expected. It's meant to be a doubly nested for loop with conditionals.

Let me first explain what I'm doing:

import pandas as pd

dict1 = {'stringA':['ABCDBAABDCBD','BBXB'], 'stringB':['ABDCXXXBDDDD', 'AAAB'], 'num':[42, 13]}

df = pd.DataFrame(dict1)
print(df)
        stringA       stringB  num
0  ABCDBAABDCBD  ABDCXXXBDDDD   42
1          BBXB          AAAB   13

This DataFrame has two columns stringA and stringB with strings containing characters A, B, C, D, X. By definition, these two strings have the same length.

Based on these two columns, I create a dictionary for each row that maps each index of stringA (starting at 0) to the corresponding index of stringB, which starts at num.

Here's the function I use:

def create_translation(x):
    x['translated_dictionary'] = {i: i + x['num'] for i, e in enumerate(x['stringA'])}
    return x

df2 = df.apply(create_translation, axis=1).groupby('stringA')['translated_dictionary']


df2.head()
0    {0: 42, 1: 43, 2: 44, 3: 45, 4: 46, 5: 47, 6: ...
1                         {0: 13, 1: 14, 2: 15, 3: 16}
Name: translated_dictionary, dtype: object

print(df2.head()[0])
{0: 42, 1: 43, 2: 44, 3: 45, 4: 46, 5: 47, 6: 48, 7: 49, 8: 50, 9: 51, 10: 52, 11: 53}

print(df2.head()[1])
{0: 13, 1: 14, 2: 15, 3: 16}

That's correct.

However, there are 'X' characters in these strings, and they require a special rule: if the character in stringA is X, don't create a key-value pair for that index. If the character in stringB is X, the value should be -500 instead of i + x['num'].

I tried the following dict comprehension inside a for loop:

def try1(x):
    for count, element in enumerate(x['stringB']):
        x['translated_dictionary'] = {i: -500 if element == 'X' else  i + x['num'] for i, e in enumerate(x['stringA']) if e != 'X'}
    return x

That gives the wrong answer.

df3 = df.apply(try1, axis=1).groupby('stringA')['translated_dictionary']

print(df3.head()[0]) ## this is wrong!
{0: 42, 1: 43, 2: 44, 3: 45, 4: 46, 5: 47, 6: 48, 7: 49, 8: 50, 9: 51, 10: 52, 11: 53}

print(df3.head()[1])   ## this is correct! There is no key for 2:15!
{0: 13, 1: 14, 3: 16}

There are no -500 values!

The correct answer is:

print(df3.head()[0])
{0: 42, 1: 43, 2: 44, 3: 45, 4:-500, 5:-500, 6:-500, 7: 49, 8: 50, 9: 51, 10: 52, 11: 53}

print(df3.head()[1])
{0: 13, 1: 14, 3: 16}

Why does your last example have 13, 14, 16 instead of 13, 14, 15? — John Zwinck, Oct 06 '18 at 23:07
@JohnZwinck That's based on the first rule. "If X is in stringA, don't create a key-value pair in the dictionary." In this case, `BBXB` has an X at 2:15. Does this make sense? — ShanZhengYang, Oct 06 '18 at 23:24

score 1 · Accepted Answer · answered Oct 06 '18 at 23:40

Here's a simple way, without any comprehensions (because they aren't helping clarify the code):

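def create_translation(x):
    out = {}
    num = x['num']  # value for the current position; advanced on every iteration
    for i, (a, b) in enumerate(zip(x['stringA'], x['stringB'])):
        if a == 'X':
            pass  # X in stringA: no entry for this index
        elif b == 'X':
            out[i] = -500  # X in stringB: use the sentinel value
        else:
            out[i] = num
        num += 1
    x['translated_dictionary'] = out
    return x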
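
If a comprehension is still preferred, the same rules can probably be written as a single dict comprehension. The attempt in the question rebuilds the whole dictionary on every pass of its outer loop, so only the last character of stringB ends up deciding the values. Pairing the characters with zip avoids that. This is only a sketch: the helper name translate and its missing parameter are illustrative and not part of the original code.

```python
def translate(string_a, string_b, num, missing=-500):
    """Map each index of string_a to index + num, applying the two X rules.

    `translate` and `missing` are hypothetical names chosen for this sketch.
    """
    return {
        i: (missing if b == 'X' else i + num)   # X in string_b -> sentinel value
        for i, (a, b) in enumerate(zip(string_a, string_b))
        if a != 'X'                             # X in string_a -> no entry at all
    }


print(translate('BBXB', 'AAAB', 13))
```

With pandas, this could presumably be applied row-wise with something like df.apply(lambda r: translate(r['stringA'], r['stringB'], r['num']), axis=1).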

answered Oct 06 '18 at 23:40

John Zwinck

This works! I hadn't thought about keeping track of `num` as a local variable and iterating. Thanks! – ShanZhengYang Oct 06 '18 at 23:47

BENY · Answer 2 · 2018-10-06T23:27:03.810

0

Why not flatten your df? You can check this post for the flattening step, and then recreate the dict:

n = df.stringA.str.len()
newdf = pd.DataFrame({
    'num': df.num.repeat(n),
    'stringA': sum(list(map(list, df.stringA)), []),
    'stringB': sum(list(map(list, df.stringB)), []),
})

newdf = newdf.loc[newdf.stringA != 'X'].copy()  # remove rows where stringA is X
newdf['value'] = newdf.groupby('num').cumcount() + newdf.num  # running count within each num group, offset by num
newdf.loc[newdf.stringB == 'X', 'value'] = -500  # assign -500 where stringB is X
[dict(zip(x.groupby('num').cumcount(), x['value'])) for _, x in newdf.groupby('num')]  # build one dict per num group
Out[390]: 
[{0: 13, 1: 14, 2: 15},
 {0: 42,
  1: 43,
  2: 44,
  3: 45,
  4: -500,
  5: -500,
  6: -500,
  7: 49,
  8: 50,
  9: 51,
  10: 52,
  11: 53}]
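
One caveat with the output above: cumcount() is computed after the rows with X in stringA are dropped, so the keys and values are renumbered. The num=13 row therefore yields {0: 13, 1: 14, 2: 15} rather than the {0: 13, 1: 14, 3: 16} the question expects. A minimal sketch of a variant that records each character's position before filtering follows. It flattens with a nested list comprehension rather than repeat/sum, and the column names row, pos, a, and b are illustrative choices, not taken from the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'stringA': ['ABCDBAABDCBD', 'BBXB'],
    'stringB': ['ABDCXXXBDDDD', 'AAAB'],
    'num': [42, 13],
})

# One record per character: 'row' is the source row, 'pos' the index in the string.
flat = pd.DataFrame(
    [
        (row, pos, a, b, num)
        for row, (sa, sb, num) in enumerate(zip(df['stringA'], df['stringB'], df['num']))
        for pos, (a, b) in enumerate(zip(sa, sb))
    ],
    columns=['row', 'pos', 'a', 'b', 'num'],
)

flat['value'] = flat['pos'] + flat['num']     # default: position offset by num
flat.loc[flat['b'] == 'X', 'value'] = -500    # X in stringB -> sentinel value
flat = flat[flat['a'] != 'X']                 # X in stringA -> drop the entry

# One dictionary per original row, keyed by the original position.
result = {
    row: dict(zip(group['pos'].tolist(), group['value'].tolist()))
    for row, group in flat.groupby('row')
}
print(result)
```

Because pos is assigned before any rows are removed, dropping the X positions leaves a gap in the keys instead of shifting the later ones.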

edited Oct 06 '18 at 23:27

answered Oct 06 '18 at 23:14

BENY

Could you explain a bit more what you're doing above? – ShanZhengYang Oct 06 '18 at 23:25
@ShanZhengYang Check it, I've updated the answer. I did not explain the flattening step here; you can check the linked post for that. – BENY Oct 06 '18 at 23:27
