Questions tagged [semi-join]

37 questions
59
votes
3 answers

What is semi-join in database?

I am having trouble while trying to understand the concept of semi-join and how it is different from conventional join. I have tried some article already but not satisfied with the explanation, could someone please help me to understand it?
Henu
  • 1,622
  • 2
  • 22
  • 27
44
votes
8 answers

Perform a semi-join with data.table

How do I perform a semi-join with data.table? A semi-join is like an inner join except that it only returns the columns of X (not also those of Y), and does not repeat the rows of X to match the rows of Y. For example, the following code performs…
hadley
  • 102,019
  • 32
  • 183
  • 245
13
votes
2 answers

dplyr semi_join Error: `x` and `y` must share the same src, set `copy` = TRUE (may be slow)

I am using dplyr 1.0.6 and R 4.1.0 and I wrote 2 functions as follows: AllCustomersList <- loadAllCustomersData() CouldJoinByNationalID <- matchCustomersByNationalCode(AllCustomersList = AllCustomersList) loadAllCustomersData() returns a list of…
Ali Sadeghi Aghili
  • 524
  • 1
  • 3
  • 15
12
votes
4 answers

Hive LEFT SEMI JOIN for 'NOT EXISTS'

I have two tables with a single key column. Keys in table a are subset of all keys in table b. I need to select keys from table b that are NOT in table a. Here is a citation from Hive manual: "LEFT SEMI JOIN implements the uncorrelated IN/EXISTS…
mel
  • 1,566
  • 5
  • 17
  • 29
5
votes
2 answers

Broadcast left table in a join

This is my join: df = df_small.join(df_big, 'id', 'leftanti') It seems I can only broadcast the right dataframe. But in order for my logic to work (leftanti join), I must have my df_small on the left side. How do I broadcast a dataframe which is on…
ZygD
  • 22,092
  • 39
  • 79
  • 102
4
votes
2 answers

What is the right way to do a semi-join on two Spark RDDs (in PySpark)?

In my PySpark application, I have two RDD's: items - This contains item ID and item name for all valid items. Approx 100000 items. attributeTable - This contains the fields user ID, item ID and an attribute value of this combination in that order.…
soorajmr
  • 520
  • 4
  • 9
3
votes
3 answers

Join data but ignore missing values

I am having some trouble with joining data frames with dplyr, where I would like to ignore the NAs. The data that I have is quite big, but a simplified version looks like: id <- c("id1", "id2", "id3", "id4") A <- c("E", "F", "G", NA) B <- c("T", NA,…
PieterD
  • 59
  • 5
3
votes
1 answer

Performing a semi-anti-join (in binary search)

I'd like to subset a data.table by choosing the first key and excluding the second key. set.seed(18032) DT <- data.table(grp1 = sample(10, 1000, T), grp2 = sample(10, 1000, T), v = rnorm(100), key = "grp1,grp2") My…
MichaelChirico
  • 33,841
  • 14
  • 113
  • 198
2
votes
2 answers

Convert SQL Query to Relational Algebra

I need some help converting an SQL query into relational algebra. Here is the SQL query: SELECT * FROM Customer, Appointment WHERE Appointment.CustomerCode = Customer.CustomerCode AND Appointment.ServerCode IN ( SELECT ServerCode…
user1022788
  • 419
  • 8
  • 18
2
votes
4 answers

SQL Query Join 2 tables

I'm new to SQL and was reading on joins but i'm a bit confused so wanted help.... I have a table called student_sport which stores StudentID and SportID I have another table which stores details of matches... so basically sportsID,MatchId,.... What…
John
  • 4,413
  • 6
  • 25
  • 28
2
votes
5 answers

Equality of "select ... where in" and joins

Suppose I have a table1 like this: id | itemcode ------------- 1 | c1 2 | c2 ... And a table2 like this: item | name ----------- c1 | acme c2 | foo ... Would the following two queries return the same result set under every condition? SELECT…
G_H
  • 11,739
  • 3
  • 38
  • 82
2
votes
4 answers

What kind of join do I need?

I have a votes table: votes ----------------- userid gameid ------- -------- a 1 a 2 a 3 b 1 b 2 and a games table: games ---------------- gameid title ------ ------ 1 foo …
xdumaine
  • 10,096
  • 6
  • 62
  • 103
2
votes
1 answer

Sql to fetch records only if related other table records exist

Table: user id compId 1 comp1 2 comp1 Table: Company id name comp1 coke comp2 pepsi need a MYSQL query which should fetch company record only if it has one or more users, when passed a company id. I will have other where…
dreambigcoder
  • 1,859
  • 4
  • 22
  • 32
2
votes
2 answers

AnalysisException: Using PythonUDF in join condition of join type LeftSemi is not supported

I am not doing LeftSemi join anywhere, neither am I using a python UDF. Still I am getting this error when joining two dataframes. df1 - one column, is primary key of the table, say "customerHash". It may be empty(In fact in my current case, it is…
Anmol Deep
  • 463
  • 1
  • 5
  • 16
2
votes
2 answers

Semi_join to filter columns of X based on multiple Y columns

Starting with these two data frames: data <- data.frame("Run_ID" = c(1,2,3), "Sample" = c("A", "B", "C"), "Value" = c(1,2,3)) metadata <- data.frame("Run_ID" = c(1,3), "Sample" = c("A","C")) I would like to subset data so that it only includes the…
bgfritz1
  • 25
  • 5
1
2 3