
I have the following dataframe with two columns containing tuples, Col1 and Col3.

       Check       Col1       Col2     Col3
0       NaN       (F123, 1)     R   (F123, 2)
0       NaN       (F123, 1)     R   (F123, 6)
0       NaN       (F123, 1)     R   (F123, 7)

Using the dictionary below, how can I iterate through the columns and compare the tuples to see which have the same dictionary value, and if they do, output some text to the Check column? For instance, (F123, 1) and (F123, 6) have the same dictionary value, 'R'.

The Dictionary:

df1d = {('F123', 1): 'R', ('F123', 2): 'O', ('F123', 6): 'R'}
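For a single pair, the check I'm after boils down to comparing two dictionary lookups (a minimal sketch using the dictionary above):

```python
df1d = {('F123', 1): 'R', ('F123', 2): 'O', ('F123', 6): 'R'}

# (F123, 1) and (F123, 6) both map to 'R', so this pair should be flagged
assert df1d[('F123', 1)] == df1d[('F123', 6)]

# (F123, 2) maps to 'O', so this pair should not
assert df1d[('F123', 1)] != df1d[('F123', 2)]
```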

The Code I was trying:

def check_color(dictionary, value, adj_values, df):
    for x in adj_values:
        if tuple(x) in [k for k, v in dictionary.items() if v == dictionary[value]]:
            df.set_value('Check', 'Bad')
            return


for index, row in df.iterrows():
    check_color(df1d, row['Col1'], row['Col3'], df)

The Desired outcome would be:

       Check       Col1       Col2     Col3
0       NaN       (F123, 1)     R   (F123, 2)
0       Bad       (F123, 1)     R   (F123, 6)
0       NaN       (F123, 1)     R   (F123, 7)

Also, what is the correct approach to filtering out all of the rows in the dataframe that don't have a match between Col1 and Col3?

What I tried:

df[(df['ConnectorAndPin'] == df['Adj.'])]
MaxB

2 Answers


As far as I have understood, the setup is as follows.

data

   Check       Col1 Col2       Col3
0    NaN  (F123, 1)    R  (F123, 2)
1    NaN  (F123, 1)    R   (F123,6)
2    NaN  (F123, 1)    R  (F123, 7)

dictionary

df1d = {('F123', 1): 'R', ('F123', 2): 'O', ('F123', 6): 'R'}

To convert the strings in the df to actual tuples, see Convert tuple-strings to tuple of strings:

import numpy as np

class FakeNamespace(dict):
    def __missing__(self, key):
        return key

# eval with FakeNamespace turns e.g. "(F123, 1)" into ('F123', 1)
data['Check'] = data[['Col1', 'Col3']].apply(
    lambda x: 'Bad'
    if df1d.get(eval(x.Col1, FakeNamespace())) == df1d.get(eval(x.Col3, FakeNamespace()))
    else np.nan,
    axis=1,
)

Output

  Check       Col1 Col2       Col3
0   NaN  (F123, 1)    R  (F123, 2)
1   Bad  (F123, 1)    R   (F123,6)
2   NaN  (F123, 1)    R  (F123, 7)
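The FakeNamespace trick works because eval looks up the bare name F123 in the supplied namespace, and the dict subclass's `__missing__` hands the unknown name back as a string. A quick standalone check:

```python
class FakeNamespace(dict):
    def __missing__(self, key):
        # any undefined name evaluates to the name itself, as a string
        return key

result = eval("(F123, 1)", FakeNamespace())
print(result)  # ('F123', 1)
```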

Also, what is the correct approach to filtering out all of the rows in the dataframe that don't have a match between Col1 and Col3?

data[data['Check'].isnull()]

  Check       Col1 Col2       Col3
0   NaN  (F123, 1)    R  (F123, 2)
2   NaN  (F123, 1)    R  (F123, 7)
iamklaus

Storing tuples in a series isn't recommended, since it restricts you to a Python-level loop. You can use a list comprehension and take advantage of the fact that NaN != NaN:

import numpy as np

zipper = zip(df['Col1'], df['Col3'])
df['Check'] = [df1d.get(x, np.nan) == df1d.get(y, np.nan) for x, y in zipper]
df['Check'] = np.where(df['Check'], 'Bad', np.nan)

print(df)

  Check       Col1 Col2       Col3
0   nan  (F123, 1)    R  (F123, 2)
1   Bad  (F123, 1)    R  (F123, 6)
2   nan  (F123, 1)    R  (F123, 7)

To filter for rows where Check is null, you can use pd.Series.isnull:

df_filtered = df[df['Check'].isnull()]
jpp
  • Freaking amazing, now it makes sense. Thanks a lot, I have been trying to learn a lot about data manipulation in python and pandas. This is a big help. – MaxB Jan 30 '19 at 16:43
  • last quick question, when trying to filter out the null rows my dataframe that I get back is completely empty. I have been trying out many different ways to filter the dataframe for various things and no matter what I try to filter it comes back empty. any ideas? – MaxB Jan 30 '19 at 16:54
  • @MaxB, Sorry, no idea. If `Check` is null for at least one row, then `df[df['Check'].isnull()]` will be non-empty. – jpp Jan 30 '19 at 16:56
  • ok understand, if I want to just make the null cells say "Good" would I say df['Check'] = np.where(df['Check'], 'Good') – MaxB Jan 30 '19 at 16:58
  • Just use `np.where(df['Check'], 'Bad', 'Good')`, it's just like an `if` / `else` statement, but vectorised. – jpp Jan 30 '19 at 16:59
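Putting the comment's suggestion together, here is a self-contained sketch (the frame is reconstructed from the question, with the tuples assumed already parsed). As a side note, `np.where(mask, 'Bad', np.nan)` promotes the result to a string dtype, so the NaN becomes the literal string 'nan'; that makes `isnull()` return all False and would likely explain the empty filter reported above. Using 'Good' for the other branch sidesteps the issue entirely:

```python
import numpy as np
import pandas as pd

df1d = {('F123', 1): 'R', ('F123', 2): 'O', ('F123', 6): 'R'}

# Reconstruction of the question's frame, tuples assumed already parsed
df = pd.DataFrame({
    'Col1': [('F123', 1)] * 3,
    'Col2': ['R'] * 3,
    'Col3': [('F123', 2), ('F123', 6), ('F123', 7)],
})

# True where both tuples map to the same dictionary value; a missing key
# yields NaN, and NaN != NaN, so a missing key never counts as a match
same = [df1d.get(x, np.nan) == df1d.get(y, np.nan)
        for x, y in zip(df['Col1'], df['Col3'])]

# vectorised if/else: 'Bad' where True, 'Good' where False
df['Check'] = np.where(same, 'Bad', 'Good')

print(df['Check'].tolist())  # ['Good', 'Bad', 'Good']
```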