I would like to merge 2 data frames. I have tried the code below, but it is not working:

    merg <- merge(companies, rounds2,
                  by.companies = "permalink",
                  by.rounds2 = "company_permalink", all = TRUE)

One data frame has more than 100,000 rows and 8 columns; the other has more than 60,000 rows and 6 columns. The permalink is the unique key in both data frames, but the column has a different name in each. I am not sure what the result will look like when merging 2 data frames with different numbers of rows. We need to merge column-wise.

zx8754
Shree

1 Answer

Put the name of the permalink column from each data frame in by.x and by.y: by.x names the key column in the first data frame and by.y names the key column in the second. I do not know what these columns are called since I do not have a data example. For the join type there are several options as well, for instance all.x=TRUE, all.y=TRUE, or all=TRUE or FALSE. Which one to use depends on how you want to join the data frames.

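    # Toy data: the key column has a different name in each data frame
    companies <- data.frame(companies = rnorm(100), other1 = rnorm(100))
    rounds2 <- data.frame(rounds2 = rnorm(100), other1 = rnorm(100))
    companies
    rounds2
    # by.x names the key in the first data frame, by.y the key in the second
    merge(companies, rounds2, by.x = "companies", by.y = "rounds2", all = TRUE)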
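To make the by.x / by.y usage concrete for the column names mentioned in this thread, here is a minimal sketch on toy data. The rows, and the `name` and `raised_amount_usd` columns, are made up for illustration; only `permalink` and `company_permalink` come from the thread. It also shows one plausible reason the original call ran out of memory: argument names that merge() does not know, such as `by.companies`, are silently ignored, and when the two data frames share no column names merge() returns every pairing of rows.

```r
# Toy stand-ins for the real data; all values here are hypothetical.
companies <- data.frame(
  permalink = c("/Organization/-Fame", "/Organization/-Qounter"),
  name = c("Fame", "Qounter"),
  stringsAsFactors = FALSE
)
rounds2 <- data.frame(
  company_permalink = c("/organization/-fame", "/ORGANIZATION/-QOUNTER",
                        "/organization/-qounter"),
  raised_amount_usd = c(10, 20, 30),
  stringsAsFactors = FALSE
)

# by.companies is not a merge() argument, so it is ignored. With no shared
# column names, the result is the Cartesian product of the two data frames.
cross <- merge(companies, rounds2, by.companies = "permalink")
nrow(cross)  # 6: every pairing of 2 x 3 rows

# Normalise case on both keys, then name each key column explicitly.
companies$permalink <- tolower(companies$permalink)
rounds2$company_permalink <- tolower(rounds2$company_permalink)
master_frame <- merge(rounds2, companies,
                      by.x = "company_permalink", by.y = "permalink",
                      all.x = TRUE)
master_frame
```

With all.x = TRUE every row of rounds2 is kept, and the company columns are filled with NA wherever no matching permalink exists.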
  • I tried merge(companies,rounds2,by.permalink="companies",by.company_permalink="rounds2",all=TRUE) and got this error: Error: cannot allocate vector of size 28.4 Gb – Shree May 02 '18 at 09:58
  • Memory issues are a different matter. See https://stackoverflow.com/questions/5171593/r-memory-management-cannot-allocate-vector-of-size-n-mb – Dimitrios Zacharatos May 02 '18 at 10:01
  • Can you tell me how big your dataset is? How many rows does each data frame have? 100,000 and 60,000? R should be able to handle that. – Dimitrios Zacharatos May 02 '18 at 10:01
  • by.company_permalink and by.permalink look like strange arguments to me. – Dimitrios Zacharatos May 02 '18 at 10:03
  • I have 2 data frames. One holds company details and has 67,000 rows; the other holds funding details and has 100,000 rows. The common column in both data frames is the company name. I want to merge the companies data frame into the funds data frame, i.e. rounds2, using the company name. – Shree May 02 '18 at 10:12
  • Column name in the companies data frame: permalink. Column name in the rounds2 data frame: company_permalink. – Shree May 02 '18 at 10:13
  • merge(companies,rounds2,by.x="company_permalink",by.y="permalink rounds2",all=TRUE) – Dimitrios Zacharatos May 02 '18 at 10:38
  • merge(companies,rounds2,by.x="permalink rounds2",by.y="company_permalink",all=TRUE) – Dimitrios Zacharatos May 02 '18 at 10:39
  • I ran me<-merge(companies,rounds2,by.x="permalink rounds2",by.y="company_permalink",all=TRUE) and got: Error in fix.by(by.x, x) : 'by' must specify a uniquely valid column – Shree May 02 '18 at 10:46
  • Details: the companies data frame has a column named permalink with values such as /Organization/-Fame and /Organization/-Qounter. The rounds2 data frame has a column named company_permalink with values such as /organization/-fame, /ORGANIZATION/-QOUNTER and /organization/-qounter. I want to merge both data frames on these permalink columns. The problem is that the values differ in case, so I am not sure how the merge will behave. Do we need to convert them to lower case before merging? – Shree May 02 '18 at 10:53
  • "permalink rounds2": the space in the variable name could be causing the problem. – Dimitrios Zacharatos May 02 '18 at 10:54
  • Try names(dataframe)<-make.names(names(dataframe)) – Dimitrios Zacharatos May 02 '18 at 10:54
  • I converted both permalink columns to lower case and tried master_frame <- merge(rounds2, companies, by.x="company_permalink", by.y="permalink", all.x= TRUE), and it is working. – Shree May 02 '18 at 12:58
  • Thank you for your help! Another question: are there any companies in the rounds2 file that are not present in companies? How can I find them? – Shree May 02 '18 at 12:59
  • I would check with a cross tabulation; try using the table() function. – Dimitrios Zacharatos May 02 '18 at 13:05
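For the follow-up question about companies in rounds2 that are missing from companies, one possible approach is a set difference on the lower-cased keys. This is a sketch on made-up rows, offered as an alternative to the cross tabulation suggested in the comments rather than something confirmed in the thread; the `/organization/-unknown` value is hypothetical.

```r
# Toy stand-ins for the real data; the values are hypothetical.
companies <- data.frame(
  permalink = c("/Organization/-Fame", "/Organization/-Qounter"),
  stringsAsFactors = FALSE
)
rounds2 <- data.frame(
  company_permalink = c("/organization/-fame", "/ORGANIZATION/-QOUNTER",
                        "/organization/-unknown"),
  stringsAsFactors = FALSE
)

# Compare keys case-insensitively by lower-casing both sides first.
company_keys <- unique(tolower(companies$permalink))
round_keys <- unique(tolower(rounds2$company_permalink))

# Keys that appear in rounds2 but not in companies.
missing_keys <- setdiff(round_keys, company_keys)
missing_keys
length(missing_keys)

# The rounds2 rows that belong to those unmatched companies.
unmatched_rows <- rounds2[!(tolower(rounds2$company_permalink) %in% company_keys),
                          , drop = FALSE]
unmatched_rows
```

If length(missing_keys) is 0, every company in rounds2 is present in companies.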