0

I need to match contacts in a database by how they were contacted by a unique ID number. I've created a very small mock dataframe below to help with a suggestion:

data = [
    ["email", "emailperson1@email.com", 1],
    ["phone", "555-555-1111", 1],
    ["slack", "secondpersonslack", 2],
    ["phone", "111-111-1111", 3],
]

data2 = [
    [1, "emailperson1@email.com", "555-555-1111", "slack1"],
    [2, "emailperson2@gmail.com", "555-555-2222", "secondpersonslack"],
    [3, "tomasconticello@gmail.com", "111-111-1111", "tomasslack"],
]

stackdata = pd.DataFrame(
    data, columns=["contact method", "from:", "column that I dont know how to make"]
)
stackdata2 = pd.DataFrame(data2, columns=["id", "email", "phone", "slack"])

In my real dataset, what I want to fill out is the column 'column that I dont know how to make'.

So, take first contact method in stackdata in row one, which was an email address. This email lines up with ID 1 in stack2, so it would populate the column in the stackdata1 with "1".

I was thinking of some of kind of for loop like:

for i in stackdata['column that I dont know how to make']:
  if i matches one of the columns in any row, then populate column with id of that row

Thanks for any help and let me know if i can make this question easier to answer!

LiuXiMin
  • 1,225
  • 8
  • 17
tom
  • 977
  • 3
  • 14
  • 30

2 Answers2

1

One way is to melt the contact-method columns in the second DataFrame into a single column, and merge the result with the first DataFrame. By default, pandas merges on any and all same-named columns in left and right. (From the docs: "If on is not passed and left_index and right_index are False, the intersection of the columns in the DataFrames and/or Series will be inferred to be the join keys.")

# Choose desired columns from your example data
df = stackdata[['contact method', 'from:']].copy()  

# Melt or "unpivot" the contact methods, renaming
# the resulting variable and value columns to match
# column names in stackdata
melted = pd.melt(stackdata2, 
                 id_vars='id', 
                 value_vars=['email', 'phone', 'slack'], 
                 value_name='from:', var_name='contact method')

# Merge with default arguments
pd.merge(df, melted)
  contact method                   from:  id
0          email  emailperson1@email.com   1
1          phone            555-555-1111   1
2          slack       secondpersonslack   2
3          phone            111-111-1111   3
Peter Leimbigler
  • 10,775
  • 1
  • 23
  • 37
  • thanks for answer! but part of the issue with that is that each contact method isn't always labeled accurately. like a slack contact might be "message". so melting would miss a lot of think. maybe if we could match with any similar column. how would that be done? – tom Jun 19 '19 at 01:12
  • hmm... no out-of-the-box way that I know of, though this looks interesting: https://stackoverflow.com/questions/13636848/is-it-possible-to-do-fuzzy-match-merge-with-python-pandas – Peter Leimbigler Jun 19 '19 at 01:25
1

Try this:

for i in range(len(stackdata)):
    stackdata['contact_id'][i] = str(stackdata2[(stackdata2['email'] == stackdata['from:'][i]) | (stackdata2['phone'] == stackdata['from:'][i]) | (stackdata2['slack'] == stackdata['from:'][i])]['id'].values).strip('[]')

'contact_id' is the column 'column that I dont know how to make'

DINA TAKLIT
  • 7,074
  • 10
  • 69
  • 74