Questions tagged [semi-join]
37 questions
59
votes
3 answers
What is semi-join in database?
I am having trouble while trying to understand the concept of semi-join and how it is different from conventional join. I have tried some article already but not satisfied with the explanation, could someone please help me to understand it?

Henu
- 1,622
- 2
- 22
- 27
44
votes
8 answers
Perform a semi-join with data.table
How do I perform a semi-join with data.table? A semi-join is like an inner join except that it only returns the columns of X (not also those of Y), and does not repeat the rows of X to match the rows of Y. For example, the following code performs…

hadley
- 102,019
- 32
- 183
- 245
13
votes
2 answers
dplyr semi_join Error: `x` and `y` must share the same src, set `copy` = TRUE (may be slow)
I am using dplyr 1.0.6 and R 4.1.0 and I wrote 2 functions as follows:
AllCustomersList <- loadAllCustomersData()
CouldJoinByNationalID <- matchCustomersByNationalCode(AllCustomersList = AllCustomersList)
loadAllCustomersData() returns a list of…

Ali Sadeghi Aghili
- 524
- 1
- 3
- 15
12
votes
4 answers
Hive LEFT SEMI JOIN for 'NOT EXISTS'
I have two tables with a single key column. Keys in table a are subset of all keys in table b. I need to select keys from table b that are NOT in table a.
Here is a citation from Hive manual:
"LEFT SEMI JOIN implements the uncorrelated IN/EXISTS…

mel
- 1,566
- 5
- 17
- 29
5
votes
2 answers
Broadcast left table in a join
This is my join:
df = df_small.join(df_big, 'id', 'leftanti')
It seems I can only broadcast the right dataframe. But in order for my logic to work (leftanti join), I must have my df_small on the left side.
How do I broadcast a dataframe which is on…

ZygD
- 22,092
- 39
- 79
- 102
4
votes
2 answers
What is the right way to do a semi-join on two Spark RDDs (in PySpark)?
In my PySpark application, I have two RDD's:
items - This contains item ID and item name for all valid items. Approx 100000 items.
attributeTable - This contains the fields user ID, item ID and an attribute value of this combination in that order.…

soorajmr
- 520
- 4
- 9
3
votes
3 answers
Join data but ignore missing values
I am having some trouble with joining data frames with dplyr, where I would like to ignore the NAs.
The data that I have is quite big, but a simplified version looks like:
id <- c("id1", "id2", "id3", "id4")
A <- c("E", "F", "G", NA)
B <- c("T", NA,…

PieterD
- 59
- 5
3
votes
1 answer
Performing a semi-anti-join (in binary search)
I'd like to subset a data.table by choosing the first key and excluding the second key.
set.seed(18032)
DT <- data.table(grp1 = sample(10, 1000, T),
grp2 = sample(10, 1000, T),
v = rnorm(100), key = "grp1,grp2")
My…

MichaelChirico
- 33,841
- 14
- 113
- 198
2
votes
2 answers
Convert SQL Query to Relational Algebra
I need some help converting an SQL query into relational algebra.
Here is the SQL query:
SELECT * FROM Customer, Appointment
WHERE Appointment.CustomerCode = Customer.CustomerCode
AND Appointment.ServerCode IN
(
SELECT ServerCode…

user1022788
- 419
- 8
- 18
2
votes
4 answers
SQL Query Join 2 tables
I'm new to SQL and was reading on joins but i'm a bit confused so wanted help....
I have a table called student_sport which stores StudentID and SportID
I have another table which stores details of matches... so basically sportsID,MatchId,....
What…

John
- 4,413
- 6
- 25
- 28
2
votes
5 answers
Equality of "select ... where in" and joins
Suppose I have a table1 like this:
id | itemcode
-------------
1 | c1
2 | c2
...
And a table2 like this:
item | name
-----------
c1 | acme
c2 | foo
...
Would the following two queries return the same result set under every condition?
SELECT…

G_H
- 11,739
- 3
- 38
- 82
2
votes
4 answers
What kind of join do I need?
I have a votes table:
votes
-----------------
userid gameid
------- --------
a 1
a 2
a 3
b 1
b 2
and a games table:
games
----------------
gameid title
------ ------
1 foo
…

xdumaine
- 10,096
- 6
- 62
- 103
2
votes
1 answer
Sql to fetch records only if related other table records exist
Table: user
id
compId
1
comp1
2
comp1
Table: Company
id
name
comp1
coke
comp2
pepsi
need a MYSQL query which should fetch company record only if it has one or more users, when passed a company id. I will have other where…

dreambigcoder
- 1,859
- 4
- 22
- 32
2
votes
2 answers
AnalysisException: Using PythonUDF in join condition of join type LeftSemi is not supported
I am not doing LeftSemi join anywhere, neither am I using a python UDF. Still I am getting this error when joining two dataframes.
df1 - one column, is primary key of the table, say "customerHash". It may be empty(In fact in my current case, it is…

Anmol Deep
- 463
- 1
- 5
- 16
2
votes
2 answers
Semi_join to filter columns of X based on multiple Y columns
Starting with these two data frames:
data <- data.frame("Run_ID" = c(1,2,3), "Sample" = c("A", "B", "C"), "Value" = c(1,2,3))
metadata <- data.frame("Run_ID" = c(1,3), "Sample" = c("A","C"))
I would like to subset data so that it only includes the…

bgfritz1
- 25
- 5