0

I have a excel sheet containing only "Employee id" in Column A as shown below.

1677
5597
5623
5618

I have one more excel sheet containing "Employee details" of 10000+ Employees. For Eg: The Employee details excel sheet contains contains data of lot of employee here is the example one of employee id shown below.

Empid   Name    Location    JobTitle    Email-id     Department
1677    Umesh     Gadag      ASE      abc@gmail.com    Civil

Here is the Working code

import pandas as pd
df1 = pd.read_excel (r'C:\\Users\\Kiran\\Desktop\\Employee id.xlsx',header=None)# excel sheet containing only ids
df2= pd.read_excel (r'C:\\Users\\Kiran\\Desktop\\Employee details.xlsx) # excel sheet containing all details of 10000+ employees
df3 = df2[df2['Empid'].isin(df1[0])]
df3.to_excel("Output1.xlsx",index=False)#Final output

The code works fine but i get output as Randomly

Empid   Name    Location    JobTitle    Email-id       Department
1677    Umesh     Gadag      ASE      abc@gmail.com      Civil
5623    Kiran     Hubli      SE       123@gmail.com      Civil
5618    Rudra     Bidar      ASE      xyz@gmail.com     Electrical
5597    Suresh    Udupi      ASE       ppp@gmail.com    Mechanical 

But i need output in this sequence as follows, since the Employee id are in a particular order

Empid   Name    Location    JobTitle    Email-id      Department
1677    Umesh     Gadag      ASE      abc@gmail.com     Civil
5597    Suresh    Udupi      ASE      ppp@gmail.com     Mechanical 
5623    Kiran     Hubli      SE       123@gmail.com     Civil
5618    Rudra     Bidar      ASE      xyz@gmail.com     Electrical
Kiran
  • 55
  • 1
  • 7
  • 2
    Does this answer your question? [Pandas Merging 101](https://stackoverflow.com/questions/53645882/pandas-merging-101) – RichieV Aug 02 '20 at 15:06
  • you need to store two different variables, say `df_ids` and `df_details` – RichieV Aug 02 '20 at 15:06
  • Could you please help me with that, i am new to python ,Also I need the output in a separate excel sheet – Kiran Aug 02 '20 at 15:20

3 Answers3

1

Suppose df_small is the dataframe having the entries of employee ids of whose data needs to be fetched from df_big which has employee data of 10000+ employees.

So the details can be fetched as below:

df_emp_details = df_big[df_big['Empid'].isin(df_small['Employee id'])]

EDIT: To read excel without header/column names, use:

# This will create a default column 0 in the dataframe.
df_small = pd.read_excel('path/to/excel.xlsx', header=None)

# Use below code to fetch the details.
df_emp_details = df_big[df_big['Empid'].isin(df_small[0])]

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html


EDIT2: I believe you want fetched rows to be in the order of employee ids. For that use sort_values

# ...
# Sorts based on column `Empid`.
df_emp_details = df_emp_details.sort_values(by='Empid')

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sort_values.html

Puneet Singh
  • 344
  • 3
  • 12
  • The Code Works, but only issue is the User details are not in same order as the Employee id sheet. – Kiran Aug 02 '20 at 16:48
  • I have modified the entire code, Please have a look at it. I need the output in a manner as same as the sequence in Employer id excel sheet, Could you please help me with that – Kiran Aug 02 '20 at 17:20
0
from pandas import read_excel

excel_data_df = read_excel('data.xlsx', sheet_name='Sheet1')
excel_data_df.columns = ["Empid", "Name", "Location", "JobTitle", "Email-i", "Department"]


emp_id = int(input("Enter Employee id: "))
for columns in excel_data_df[excel_data_df.Empid == emp_id].values:
    for item in columns:
        print(item)
AM Z
  • 423
  • 2
  • 9
  • Thanks for your code, but when i run it it gives a error "ValueError: Length mismatch: Expected axis has 1 elements, new values have 6 elements" . I need output in seprate excel sheet – Kiran Aug 02 '20 at 15:32
0

You want a left join

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.join.html

As it joins on the index, you need to make sure the Empid column is set to the index

df_small = df_small.join(df_big.set_index('Empid'), on = 'Employee ID', how = 'left')

Hopefully that method is improved in the future so it makes it easier for setting which columns to join or or not getting into complex multi-indexing to join on multiple columns.