I am working on a project that audits employees with computer accounts. I want to end up with a single dataframe containing two new columns: the names that need an account and the names whose account should be deleted. This is different from the Comparing Columns in Dataframes question because I am working with strings. I will also need to do some fuzzy matching, but that is further down the line.
The data I receive is in Excel sheets. It comes from two sources that I don't control, so I format each one as [First Name, Last Name] and print it to the console to make sure the data I am working with is correct. I convert the .xls files to .csv, format the information, and can output the two lists of names in a single dataframe with two columns, but I have not been able to put the values I want into the two new columns. I have tried query (which returned True/False, not the names), diff, and regex. I assume I am just using the tools incorrectly.
import pandas as pd

nd = {'col1': ["Abraham Hansen", "Demetrius McMahon", "Hilary Emerson",
               "Amelia H. Hayden", "Abraham Oliver"],
      'col2': ["Abraham Hansen", "Abe Oliver", "Hillary Emerson",
               "DJ McMahon", "Amelia H. Hayden"]}
info = pd.DataFrame(data=nd)

for row in info:
    if info.col1.value not in info.col2:
        info["Need Account"] = info.col1.value
    if info.col2.value not in info.col1:
        info["Delete Account"] = info.col2.value

print(info)
What I would like is a new dataframe with two columns, Need Account and Delete Account, filled in with the appropriate values from the other columns in the dataframe. With the code above I get an error saying that 'Series' object has no attribute 'value'. Here is an example of my expected output:
df_out:
Need Account        Delete Account
Demetrius McMahon   Abe Oliver
Abraham Oliver      Hillary Emerson
Hilary Emerson      DJ McMahon
From this list I can see whose nickname showed up and pare the list down from there.
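For reference, here is a minimal sketch of one way the two columns might be built, assuming exact string matches are enough for this first pass (nicknames such as "Abe Oliver" are deliberately left unmatched). It uses `Series.isin` rather than a row loop; the variable names `need` and `delete` are only illustrative and not part of my original code.

```python
import pandas as pd

nd = {'col1': ["Abraham Hansen", "Demetrius McMahon", "Hilary Emerson",
               "Amelia H. Hayden", "Abraham Oliver"],
      'col2': ["Abraham Hansen", "Abe Oliver", "Hillary Emerson",
               "DJ McMahon", "Amelia H. Hayden"]}
info = pd.DataFrame(data=nd)

# Names in col1 with no exact match anywhere in col2 (assumed: need an account).
need = info.loc[~info["col1"].isin(info["col2"]), "col1"].reset_index(drop=True)

# Names in col2 with no exact match anywhere in col1 (assumed: account to delete).
delete = info.loc[~info["col2"].isin(info["col1"]), "col2"].reset_index(drop=True)

# Put the two results side by side; if the lengths differ, the shorter
# column is padded with NaN.
df_out = pd.concat(
    [need.rename("Need Account"), delete.rename("Delete Account")],
    axis=1,
)
print(df_out)
```

The rows come out in the order the names appear in col1 and col2, so the pairing within a row is incidental and may differ from the order in my expected output above.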