The question mentions dataframe in its title and uses the variables df1 and df2 together with col1 and col2, so it is probably related to pandas or numpy.
Without further context, such as code, we can only recommend general options rather than help with a specific solution.
Functions from NumPy, pandas and Python built-ins
The following functions are in the solution space:
- element is in other collection: numpy.in1d (explained below), pandas.Series.isin, set & other or set.intersection()
- map boolean to string or character: numpy.where (explained below), pandas.Series.where, map
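The pandas entries from the list above can be sketched as follows. The sample values and the labels 'yes', 'no' and '-' are invented for illustration and are not taken from the question; Series.map with a dictionary is used here as the pandas counterpart of the built-in map.

```python
import pandas as pd

# Hypothetical sample data, not taken from the question
letters = pd.Series(['A', 'B', 'C'])

# Membership test: True where the element occurs in the other collection
is_member = letters.isin(['C', 'D'])

# Map each boolean to a string label via a lookup dictionary
labels = is_member.map({True: 'yes', False: 'no'})

# Series.where keeps values where the condition holds and replaces the rest
masked = letters.where(is_member, other='-')

print(labels.tolist())
print(masked.tolist())
```

Note that Series.where replaces only the values where the condition is False, so it is not a two-way mapping like numpy.where.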
Value in 1-D array (exists / present / duplicated)
See numpy's in1d(ar1, ar2, assume_unique=False, invert=False) function:
Test whether each element of a 1-D array is also present in a second array.
import numpy as np
array_1 = np.array(['A', 'B', 'C'])
print(array_1)
# ['A' 'B' 'C']
array_1_elements_exist = np.in1d(array_1, ['C', 'D'])
print(array_1_elements_exist)
# [False False  True]
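NumPy's documentation recommends numpy.isin over in1d for new code. A minimal sketch of the same check with isin, reusing the array from the example above, also shows the invert parameter:

```python
import numpy as np

array_1 = np.array(['A', 'B', 'C'])

# Same membership test as with in1d; isin keeps the shape of the first argument
present = np.isin(array_1, ['C', 'D'])

# invert=True flips the result: True where the element is NOT in the second array
absent = np.isin(array_1, ['C', 'D'], invert=True)

print(present.tolist())
print(absent.tolist())
```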
Map to either X or Y (binary classification)
The mapping can be done using Python's built-in map(mapping_function, array_or_list), as answered by rikyeah.
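As a rough sketch of the built-in approach (the function name to_label and the input values are made up for illustration and do not come from the referenced answer):

```python
# Hypothetical boolean input
flags = [True, False, True]

def to_label(flag):
    """Return 'x' for a truthy value and 'y' otherwise."""
    return 'x' if flag else 'y'

# map returns a lazy iterator in Python 3, so wrap it in list() to see the values
labels = list(map(to_label, flags))
print(labels)
```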
Or use numpy's where(condition, [x, y, ]) directly to map binary values (in statistics this is called binary classification):
Return elements chosen from x or y depending on condition.
import numpy as np
array_bool = np.array([True, False])
print(array_bool)
# [ True False]
array_str = np.where(array_bool, 'x', 'y')
print(array_str)
# ['x' 'y']
Comparing two dataframes? (missing context for specific application)
Because the question does not yet include a reproducible example, it is unclear how the two functions should be combined in context.
Until the question provides such an example, the combination of both functions is left open.
Example applications of these functions to pandas are:
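One possible application, as a minimal sketch only: the frame and column names follow the question, but the values, the result column name and the labels 'x' and 'y' are assumptions, since the question gives no data.

```python
import numpy as np
import pandas as pd

# Hypothetical frames; names follow the question, values are invented
df1 = pd.DataFrame({'col1': ['A', 'B', 'C']})
df2 = pd.DataFrame({'col2': ['C', 'D']})

# True for every row of df1 whose col1 value also occurs in df2's col2
in_df2 = df1['col1'].isin(df2['col2'])

# Map the boolean mask to one of two labels ('result' is an assumed column name)
df1['result'] = np.where(in_df2, 'x', 'y')

print(df1)
```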
Or in built-in Python:
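A comparable sketch without any third-party library, again with invented values and labels:

```python
# Hypothetical input lists
list_1 = ['A', 'B', 'C']
list_2 = ['C', 'D']

# A set gives fast membership tests
lookup = set(list_2)

# Elements present in both collections (equivalent to set.intersection)
common = set(list_1) & lookup

# Label each element of list_1 depending on membership in list_2
labels = ['x' if item in lookup else 'y' for item in list_1]

print(common)
print(labels)
```

The set operations discard order and duplicates, so the list comprehension is the variant to use when a label per original element is needed.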