I want to add the "text" column from the second DataFrame to the first one, matching each A value to the row whose B value is the closest one that is <= A. The two DataFrames do not have the same length.

Example:

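import random
import string
import numpy as np
import pandas as pd

a = np.array(range(10, 35, 5))      # A values: 10, 15, 20, 25, 30
b = np.array(range(0, 30, 5)) + 2   # B values: 2, 7, 12, 17, 22, 27
b_text = [random.choice(string.ascii_letters) for _ in range(len(b))]  # one random letter per B value
df1 = pd.DataFrame(a, columns=['A'])
df2 = pd.DataFrame(list(zip(b, b_text)), columns=['B', 'text'])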
anothernode
typae

1 Answer

I think you need merge_asof:

# If the key columns have different dtypes, cast them to a common one first:
# df1['A'] = df1['A'].astype(np.int64)
# df2['B'] = df2['B'].astype(np.int64)

df = pd.merge_asof(df1, df2, left_on='A', right_on='B')
print(df)
    A   B text
0  10   7    R
1  15  12    y
2  20  17    i
3  25  22    a
4  30  27    G
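
A self-contained sketch of the same idea with fixed letters instead of random ones, so the result is reproducible. The letters 'a' to 'f' and the names backward and forward are made up for illustration. merge_asof needs both key columns sorted in ascending order, and its direction parameter (default 'backward') controls whether the nearest B at or below A, or at or above A, is taken:

```python
import pandas as pd

# Same A and B values as in the question; the letters are illustrative only.
df1 = pd.DataFrame({'A': [10, 15, 20, 25, 30]})
df2 = pd.DataFrame({'B': [2, 7, 12, 17, 22, 27],
                    'text': ['a', 'b', 'c', 'd', 'e', 'f']})

# Both key columns must be sorted ascending before calling merge_asof.
left = df1.sort_values('A')
right = df2.sort_values('B')

# direction='backward' (the default): last row with B <= A.
backward = pd.merge_asof(left, right, left_on='A', right_on='B',
                         direction='backward')

# direction='forward': first row with B >= A; unmatched rows get NaN.
forward = pd.merge_asof(left, right, left_on='A', right_on='B',
                        direction='forward')

print(backward)
print(forward)
```

With 'forward', the last row (A = 30) has no B at or above it, so its B and text are missing.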
jezrael
  • Your solution did exactly what I asked, but my question was wrong. I need to use more than one column with different conditions, and I want to find a more generic approach. Can you advise where I should look in the documentation? – typae Jul 23 '18 at 09:28
  • @typae - It depends on your functions. I think [`reindex`](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.reindex.html) with the `method` parameter should help, or another solution would be to create a helper Series and use `map`, something like [this](https://stackoverflow.com/a/51415070/2901002) solution. – jezrael Jul 23 '18 at 09:33
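
A minimal sketch of the reindex suggestion from the comment above, assuming the same A and B values as in the question and fixed illustrative letters (the name lookup is hypothetical). With method='ffill', each A value takes the entry of the last B label at or before it, which requires the lookup index to be unique and monotonic:

```python
import pandas as pd

# Same A and B values as in the question; the letters are illustrative only.
df1 = pd.DataFrame({'A': [10, 15, 20, 25, 30]})
df2 = pd.DataFrame({'B': [2, 7, 12, 17, 22, 27],
                    'text': ['a', 'b', 'c', 'd', 'e', 'f']})

# Helper Series: B values as the (sorted, unique) index, text as the values.
lookup = df2.set_index('B')['text'].sort_index()

# For each A, forward-fill from the last B label that is <= A.
# to_numpy() drops the reindexed labels so the assignment is positional.
df1['text'] = lookup.reindex(df1['A'], method='ffill').to_numpy()

print(df1)
```

For this single-column case the result matches the merge_asof answer; whether it extends to several columns with different conditions depends on those conditions.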