Wondering if someone could shed some light on a for loop to perform the following.

df1

col1:
A
B
C
D
E

df2

col2:
A
C
D

If a value in df1's col1 also appears in df2's col2, mark it X, otherwise mark it Y, and store the result in a new column (col3) on df1.

Final df1

col3:
X
Y
X
X
Y
Kraigolas
Wen.C
  • Using a `for` loop with a pandas dataframe is almost never a good idea. In this case, you can solve the problem with `np.where`, but it might be good practice to write the for loop yourself. Note that if you get stuck writing the for loop or get the wrong results, you can post your attempt here. The StackOverflow community is much more receptive to questions that show effort in finding your own solution. – Kraigolas Jan 14 '22 at 16:28
  • Why a `for` loop? What did you try in pandas? Add your [example] and please also add specific tags, e.g. [tag:pandas]. – hc_dev Jan 14 '22 at 17:54

2 Answers

As Kraigolas commented, you can easily do this without looping.

Check if elements are in another array with np.in1d and then map truth values to "X" and "Y":

import pandas as pd
import numpy as np


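# Build the two example frames from the question.
df1 = pd.DataFrame({"col1": ["A", "B", "C", "D", "E"]})
df2 = pd.DataFrame({"col2": ["A", "C", "D"]})

# np.in1d returns a boolean array; map True to "X" and False to "Y".
df1["col3"] = list(map(lambda x: "X" if x else "Y", np.in1d(df1.col1, df2.col2)))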
print(df1)

Output:

  col1 col3
0    A    X
1    B    Y
2    C    X
3    D    X
4    E    Y
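
A vectorized variant of the same idea, as a minimal sketch: it rebuilds the `df1` and `df2` frames from the question and swaps in `pandas.Series.isin` plus `np.where` for `np.in1d` and `map` (NumPy 2.0 deprecates `np.in1d` in favor of `np.isin`). The name `mask` is only illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"col1": ["A", "B", "C", "D", "E"]})
df2 = pd.DataFrame({"col2": ["A", "C", "D"]})

# Boolean mask: True where a col1 value also occurs in col2.
mask = df1["col1"].isin(df2["col2"])

# Pick "X" where the mask is True and "Y" everywhere else.
df1["col3"] = np.where(mask, "X", "Y")
print(df1)
```

This avoids the Python-level `lambda` call per element, which may matter on larger frames.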
rikyeah
  • To better __express the sequence of steps__ we could either separate the statements in numpy, `step1 = np.in1d(..); step2 = map(..)`, or benefit from [__chaining in pandas__](https://towardsdatascience.com/the-unreasonable-effectiveness-of-method-chaining-in-pandas-15c2109e3c69?gi=cf0b14dc80de): `result = series.isin(..).map(..)` using pandas' [`map()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.map.html) – hc_dev Jan 14 '22 at 19:26

Given that the question has "dataframe" in its title and uses the variables df1 and df2 together with col1 and col2, it is probably related to pandas or NumPy.

Without any further context provided, like code, we can only recommend vague options but not help with a specific solution.

Functions from Numpy, Pandas and Python built-in

Following are some functions in the solution space:

  1. element is in other collection: numpy.in1d (explained below), pandas.Series.isin, set & other or set.intersection()
  2. map boolean to string or character: numpy.where (explained below), pandas.Series.where, map
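
As a rough sketch of the first step, the three membership tests listed above can be compared side by side. The names `values` and `lookup` are made up for illustration, and `np.isin` is used here as the current spelling of the `np.in1d` test for 1-D input.

```python
import numpy as np
import pandas as pd

values = ["A", "B", "C", "D", "E"]   # hypothetical data to test
lookup = ["A", "C", "D"]             # hypothetical reference collection

# NumPy: element-wise membership test, returns a boolean array.
mask_numpy = np.isin(values, lookup)

# pandas: Series.isin returns a boolean Series.
mask_pandas = pd.Series(values).isin(lookup)

# Built-in Python: membership test against a set.
lookup_set = set(lookup)
mask_python = [value in lookup_set for value in values]

print(mask_numpy.tolist())
print(mask_pandas.tolist())
print(mask_python)
```

All three produce the same boolean mask, so the choice mostly depends on which data structure the values already live in.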

Value in 1-D array (exists / present / duplicated)

See numpy's in1d(ar1, ar2, assume_unique=False, invert=False) function:

Test whether each element of a 1-D array is also present in a second array.

import numpy as np

array_1 = np.array(['A', 'B', 'C'])
print(array_1)
# ['A' 'B' 'C']
array_1_elements_exist = np.in1d(array_1, ['C', 'D'])
print(array_1_elements_exist)
# [False False  True]

Map to either X or Y (binary classification)

The mapping can be done using Python's built-in map(mapping_function, array_or_list) as answered by rikyeah.

Or directly use numpy's where(condition, [x, y, ])

Return elements chosen from x or y depending on condition.

to map binary values (in statistics this is called binary classification):

import numpy as np

array_bool = np.array([True, False])
print(array_bool)
# [ True False]
array_str = np.where(array_bool, 'x', 'y')
print(array_str)
# ['x' 'y']

Comparing two dataframes? (missing context for specific application)

As the question hasn't shown a reproducible example yet, it is unclear how the combined functionality can be applied in context.

Until some example is provided in given question, the combination of both functions is left open.

Example applications of these functions to pandas are:
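
One possible pandas application, sketched under the assumption that the frames look like the `df1` and `df2` shown in the question: chain `Series.isin` with `Series.map` and a small dict that translates the booleans.

```python
import pandas as pd

# Frames rebuilt from the question's example data.
df1 = pd.DataFrame({"col1": ["A", "B", "C", "D", "E"]})
df2 = pd.DataFrame({"col2": ["A", "C", "D"]})

# isin yields True/False per row; map translates those to "X"/"Y".
df1["col3"] = df1["col1"].isin(df2["col2"]).map({True: "X", False: "Y"})
print(df1)
```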

Or in built-in Python:
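
A minimal sketch with plain lists and an explicit `for` loop, which is what the question originally asked about. The list names `col1`, `col2` and `col3` mirror the question's columns; `lookup` is an illustrative helper name.

```python
col1 = ["A", "B", "C", "D", "E"]
col2 = ["A", "C", "D"]

# A set makes each membership test cheap.
lookup = set(col2)

col3 = []
for value in col1:
    # "X" if the value also occurs in col2, otherwise "Y".
    col3.append("X" if value in lookup else "Y")

print(col3)
# ['X', 'Y', 'X', 'X', 'Y']
```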

hc_dev