Say, I am building a camera app. Every time the user clicks a photo, the image is stored on the cloud. Because I want to restrict how many images are stored on the cloud, the app gets 10 URLs in an array called listURLs
when it is initialised.
The first 10 clicks get PUT
into the cloud, exhausting listURLs
. Then, every time a click happens, a coin toss determines whether the latest click replaces an existing click on the cloud. Typical numbers would be 50 clicks, first 10 clicks get assigned a URL, and of the remaining 40 clicks, 20 of them overwrite an existing URL.
I store records of each app session in a Postgres DB. Each session will have an ID and all instances of clicks (which may or may not have a corresponding url). I also need to know the url
corresponding to each click, if one exists. So, if there are 30 clicks, I will need to know which 10 of these have a corresponding url
.
I can think of two ways of storing this data.
tblClicksURLs
as a Table that hasclick_id
,url
andurl_active
as its fields. Every time aclick_id
and non-nullurl
need to be inserted, update all other records with the sameurl
to haveurl_active
asfalse
.Two tables
tblClicks
andtblURLs
.tblURLs
has aclick_id
foreign key. Every time aclick_id
and non-nullurl
need to be inserted, theclick_id
gets inserted intotblClicks
andclick_id
andurl
get upserted intotblURLs
. The upsert is based on whether theurl
already exists intblURLs
. So, for a givenurl
, there will only be oneclick_id
intblURLs
So, in Case 1, I will have an UPDATE
of url_active
followed by INSERT
on the same table. In Case 2, I will have an INSERT
into one table and an UPSERT
into another. I will need indexing on click_id
, but not on url
.
If you are looking at writes of > 10k rows per second, maybe even more, which of these two would be more efficient? Assume that the numbers per session are similar to the one quoted above (50 clicks, etc.)
I could also register a created_at
datetime for each record in Case 1, and just use the first non-null url
ordered reverse-chronologically. But, I am trying to avoid this, unless the performance benefits are enormous.