I have 106 columns in the first data frame and 97 in the second, and I want to merge the two. For this I need identical columns in both data frames.

How can I achieve the requirements listed below?

DF1: column names are A, B, C & D
DF2: column names are A, B & E

Can I select the following combinations of columns from the data frames?

1) Matches in both, i.e. A & B
2) Extras in the second, i.e. E
3) Extras in the first, i.e. C & D

I tried different approaches, such as select() in dplyr with colnames(df1) == colnames(df2), and several other possibilities, but without success.

Below is Dataframe 1:

[1] "ï..Lan.ID"                 "NBFC"                      "Application.ID"           
  [4] "Region"                    "Loan.City"                 "Loan.Type"                
  [7] "Loan.Scheme"               "Name"                      "Mobile.Number"            
 [10] "Loan.Status"               "Principal.Outstanding"     "Last.EMI"                 
 [13] "Next.EMI"                  "Next.Bullet.Month"         "Next.Bullet.Amount"       
 [16] "Sum.Instalment.Posted"     "Dues.Receipts"             "EMI.Due"                  
 [19] "All.Dues"                  "Instalment.Dues"           "Bullets.Overdue"          
 [22] "Loan.Quality"              "Sanctioned.Amount"         "Loan.Amount"              
 [25] "Tenure"                    "Completed.Tenure"          "Tenure.Left"              
 [28] "Personal.Email"            "Official.Email"            "No..Of.Late.Payments"     
 [31] "CRIF.Score"                "CIBIL.Score"               "No.of.Actions"            
 [34] "Fixed.Income"              "ECS.Customer.Name"         "ECS.Bank.Name"            
 [37] "ECS.Account.Number"        "Loan.Date"                 "Sanction.Month"           
 [40] "EMI.Start.Date"            "X1st.EMI.Month"            "End.Date"                 
 [43] "Home.Address"              "Permanent.Address"         "Employer.Name"            
 [46] "Company.MCA.ID"            "Business.Address"          "Reference.Details"        
 [49] "Nature.of.Business"        "Pan.Card"                  "Aadhar.UID"               
 [52] "Gender"                    "Educational.Qualification" "DOB"                      
 [55] "Marital.Status"            "Last.Payment.Date"         "Job.Type"                 
 [58] "Employment.Year"           "Cycle.Date"                "Age"                      
 [61] "relevant_pos"              "crif_active_accounts"      "crif_overdue_amt"         
 [64] "crif_current_outstanding"  "cibil_active_accounts"     "cibil_overdue_amt"        
 [67] "cibil_current_outstanding" "NACH.Status"               "Awarenss.Allocation"      
 [70] "Allocation.Date"           "Awareness.Data"            "Awareness.Brk.up"         
 [73] "Dec.19.EMI.Amount"         "Tenure.End"                "Dec.19.BKt"               
 [76] "DPD"                       "New.DPD"                   "DPD.Range.New"            
 [79] "New.Amount.Due"            "New.Total.Due"             "Loan.Slabs"               
 [82] "Last.Month.Bnc"            "X1st.EMI"                  "Dec.19.Bnc"               
 [85] "Dec.19.Non.Starter"        "Reason.of.Bnc"             "HNI"                      
 [88] "EMI.Due.1"                 "OS"                        "Advance.Paid"             
 [91] "Paid.Unpaid"               "Not.Allocated"             "Excess"                   
 [94] "CC.Take.Over...OD"         "Last.Month.delinq"         "Loan.Status.1"            
 [97] "CIBIL.Bracket"             "Salary.Bracket"            "DPD.1"                    
[100] "Reason.of.Default"         "Contactibility"            "Delinq"                   
[103] "PayTm.Industry"            "Industry"                  "Employer.Name.1"          
[106] "DELINQ.NON.DELINQ"

Dataframe 2:

[1] "ï..Lan.ID"                 "NBFC"                      "Application.ID"           
 [4] "Region"                    "Loan.City"                 "Loan.Type"                
 [7] "Loan.Scheme"               "Name"                      "Mobile.Number"            
[10] "Loan.Status"               "Principal.Outstanding"     "Last.EMI"                 
[13] "Next.EMI"                  "Next.Bullet.Month"         "Next.Bullet.Amount"       
[16] "Sum.Instalment.Posted"     "Dues.Receipts"             "EMI.Due"                  
[19] "All.Dues"                  "Instalment.Dues"           "Bullets.Overdue"          
[22] "Loan.Quality"              "Sanctioned.Amount"         "Loan.Amount"              
[25] "Tenure"                    "Completed.Tenure"          "Tenure.Left"              
[28] "Personal.Email"            "Official.Email"            "No..Of.Late.Payments"     
[31] "CRIF.Score"                "CIBIL.Score"               "No.of.Actions"            
[34] "Fixed.Income"              "ECS.Customer.Name"         "ECS.Bank.Name"            
[37] "ECS.Account.Number"        "Loan.Date"                 "Sanction.Month"           
[40] "EMI.Start.Date"            "X1st.EMI.Month"            "End.Date"                 
[43] "Home.Details"              "Permanent.Address.Details" "Employer.Name"            
[46] "Company.MCA.ID"            "Business.Details"          "Reference.Details"        
[49] "Nature.of.Business"        "Pan.Card"                  "Aadhar.UID"               
[52] "Gender"                    "Educational.Qualification" "DOB"                      
[55] "Marital.Status"            "Last.Payment.Date"         "Job.Type"                 
[58] "Employment.Year"           "Cycle.Date"                "Age"                      
[61] "relevant_pos"              "crif_active_accounts"      "crif_overdue_amt"         
[64] "crif_current_outstanding"  "cibil_active_accounts"     "cibil_overdue_amt"        
[67] "cibil_current_outstanding" "NACH.status"               "Awarenss.Allocation"      
[70] "Allocation.Date"           "Awareness.Data"            "Awareness.Brk.up"         
[73] "June.19.EMI.Amount"        "Tenure.End"                "June.BKt"                 
[76] "Loan.Slabs"                "Last.Month.Bnc"            "X1st.EMI"                 
[79] "June.19.Bnc"               "June.19.Non.Starter"       "Reason.of.Bnc"            
[82] "HNI"                       "EMI.Due.1"                 "OS"                       
[85] "Advance.Paid"              "PAID.Unpaid"               "Not.Allocated"            
[88] "Excess"                    "DPD"                       "CC.Take.Over"             
[91] "Last.Month.delinq"         "Loan.Status.1"             "CIBIL.Bracket"            
[94] "Salary.Bracket"            "DPD.1"                     "DELINQ.NON.DELINQ"        
[97] "Month"

The expected outcome here would be the names of the matching columns and the names of the unmatched columns in each data frame.

SKB
  • Can you show a reproducible example and expected output please? – Sotos Dec 11 '19 at 12:55
  • Edited original post and added structure of data frames and expectations. – SKB Dec 11 '19 at 13:04
  • Read about - [Set Operations](https://stat.ethz.ch/R-manual/R-devel/library/base/html/sets.html) – zx8754 Dec 11 '19 at 13:04
  • Maybe relevant, possible duplicate post https://stackoverflow.com/q/3402371/680068 – zx8754 Dec 11 '19 at 13:07
  • Here is a **REPRODUCIBLE** example: `df1 <- data.frame(V1 = 1:5, V2 = 6:10)` and `df2 <- data.frame(V1 = 3:7, V3 = 9:13)`. What do you expect to get from these two dataframes? – Sotos Dec 11 '19 at 13:09
  • zx8754 thanks for suggestion, but i am more interested in just the names matching/different, so that i can better control the data. – SKB Dec 11 '19 at 13:10
  • Hi Sotos, the expectation is to get 1) V1 as an output, 2) V1,V2 3) V1,V3 as an output (just the names of columns). – SKB Dec 11 '19 at 13:12
  • So this? `list(intersect(names(df1), names(df2)), names(df1), names(df2))`? – Sotos Dec 11 '19 at 13:21

1 Answer

I think Sotos's comment provides the most elegant way to get the output your question asks for.

However, as an alternative, you can use %in%:

O1 = colnames(dfA)[colnames(dfA) %in% colnames(dfB)]

> O1
[1] "A" "B" "C"
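
The same result, plus the columns unique to each side, can also be sketched with base R's set functions. This is only an illustration: the two small data frames are stand-ins that reuse the column names from the Data section (their values are arbitrary), and the names `common`, `only_A` and `only_B` are made up for this example.

```r
# Stand-in data frames; only the column names matter here
dfA <- data.frame(A = 1:2, B = 3:4, C = 5:6, D = 7:8)
dfB <- data.frame(A = 1:2, B = 3:4, C = 5:6, E = 7:8)

common <- intersect(colnames(dfA), colnames(dfB))  # present in both
only_A <- setdiff(colnames(dfA), colnames(dfB))    # in dfA but not in dfB
only_B <- setdiff(colnames(dfB), colnames(dfA))    # in dfB but not in dfA

common
# [1] "A" "B" "C"
only_A
# [1] "D"
only_B
# [1] "E"
```

Note that these comparisons are case-sensitive, so in the question's data `NACH.Status` and `NACH.status` would be treated as different columns. Comparing `tolower()`-ed names is one possible way around that.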

However, your matching conditions 2) and 3) are a little confusing, because when you ask for:

2) Common in both and additional in 2nd i.e A,B & E

In my opinion, that corresponds to all the columns of the second dataset (colnames(dfB)).

3) Common in both and extras in first i.e A,B,C & D

And this corresponds to all the columns of the first dataset (colnames(dfA)).
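
If the goal is rather to see, for every column name, whether it is shared or belongs to only one of the datasets, a small lookup table might help. The sketch below is an assumption about what is wanted, not something taken from the question: the vectors `cols_A` and `cols_B` stand in for `colnames(dfA)` and `colnames(dfB)`, and the labels in `where` are arbitrary.

```r
cols_A <- c("A", "B", "C", "D")  # stand-in for colnames(dfA)
cols_B <- c("A", "B", "C", "E")  # stand-in for colnames(dfB)

# Every column name that appears in at least one dataset
all_cols <- union(cols_A, cols_B)

# One row per column name, flagged by membership in each dataset
status <- data.frame(
  column = all_cols,
  in_A   = all_cols %in% cols_A,
  in_B   = all_cols %in% cols_B
)

# Human-readable label derived from the two flags
status$where <- ifelse(status$in_A & status$in_B, "both",
                       ifelse(status$in_A, "first only", "second only"))

status
```

With the column names used in this answer, A, B and C are labelled "both", D "first only" and E "second only".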

Does that make sense to you? Did I miss something in your merging pattern?

Data

dfA = data.frame(matrix(sample(1:100, 16), ncol = 4, nrow = 4))
colnames(dfA) = LETTERS[1:4]

dfB = data.frame(matrix(sample(1:100, 16), ncol = 4, nrow = 4))
colnames(dfB) = LETTERS[c(1:3,5)]

> dfA
   A  B  C  D
1 75 66 17 89
2 46  7 27 38
3 97 26 47 31
4 32 20 71  2

> dfB
   A  B  C  E
1 94 70 18 16
2 69 57 29 60
3 53 50 25 96
4 37 51 64 75
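
Once the common columns are known, one way to get identical columns in both datasets is to keep only those before combining. The sketch below assumes that "merge" in the question means stacking the rows of the two datasets, which the question does not state explicitly. The `set.seed()` call and the name `stacked` are additions for this example.

```r
set.seed(1)  # added so the sampled values are repeatable

dfA <- data.frame(matrix(sample(1:100, 16), ncol = 4, nrow = 4))
colnames(dfA) <- LETTERS[1:4]          # A, B, C, D

dfB <- data.frame(matrix(sample(1:100, 16), ncol = 4, nrow = 4))
colnames(dfB) <- LETTERS[c(1:3, 5)]    # A, B, C, E

# Keep only the shared columns, in the same order, then stack the rows
common  <- intersect(colnames(dfA), colnames(dfB))
stacked <- rbind(dfA[common], dfB[common])

dim(stacked)
# [1] 8 3
```

If the unmatched columns should be kept instead, they would have to be added to the other dataset (for example filled with `NA`) before calling `rbind()`.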
dc37
  • Thanks dc37. For 2) & 3) I would like a way to identify which columns are common and which additional columns come from df1 or df2 (though I will be able to sort that out). Thanks. – SKB Dec 11 '19 at 16:51