1

I am developing a mass emailing app, that could theoretically send many campaigns a week, and each one sending to thousands of contacts.

I have devised two possible ways to store these campaigns:

  • Have a app_campaigns table and a app_campaign_emails table, creating one row for each email sent while referencing the parent campaign and the actual contact. The app_contacts table would store the actual email address and other data.
  • Or, to store all the contact IDs in the app_campaigns table, in a comma-separated field. That way, there is no need to create multiple records for each campaign.

What would be the best approach? Here, performance and keeping the database small is a priority since a few campaigns can quickly increase the size of the database (there are multiple clients sending multiple campaigns a week, each one with a potentially large list of contacts).

But it would also need to keep track of opens and clicks, which I'm thinking of doing by only recording entries as they happen.

Some apps such as Sendy go for the second approach, and avoid having to create thousands of rows in one go. However, it might get a little messy since doing searches and queries with a comma-separated Id field is harder than just querying the database.

To further complicate things, there should be a way to display each email sent to a contact in a CRM-esque interface. So by using the second approach, it would mean doing FULLTEXT searches in the app_campaigns table to build a proper list.

Any thoughts?

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • I would go for the first (normalised) approach creating the proper structure for your many to many relationship. The table may be large in terms of rows but not data, it would simply hold the id of the campaign and the id of the email address. This will make your life alot easier when pulling emails sent on an address by adress basis and improve performance when working from an email adress to get the campaigns theyve been sent etc – JParkinson1991 Oct 01 '17 at 12:14
  • That's what I though, but as mentioned in [my comment](https://stackoverflow.com/questions/46512230/mysql-external-table-vs-one-textarea-column-with-comma-separated-ids?noredirect=1#comment80007019_46512273), the table would eventually grow to hundreds of millions of rows. Would performance take a hit if the table only holds the ID of the contact and the campaign? – Manuel Vicedo Oct 02 '17 at 13:36

1 Answers1

0

The correct SQL way to store the data is to have a separate row in each table for each contact.

Why? Here are reasons why storing numeric ids in a comma-delimited list is not a good idea:

  • SQL has a great data structure for storing lists. It is called a table.
  • Data values should be stored as the appropriate type. Numbers should be stored as numbers.
  • SQL has relatively lousy string processing capabilities.
  • Ids should be declared with foreign key definitions. You cannot do that with a string.
  • Operations on the strings may take longer.

Even the notion that the junction table will be much larger is suspect. Consider the number one million -- 1,000,000. As a string, this occupies 7 bytes, but the comma separator makes it 8.

If you had a table with two keys, and each was an integer, then the row in the reference table would occupy -- guess what, 8 bytes.

There are some situations where storing comma-delimited strings might be warranted. The only one that I can think of is when the string is not being used by the database at all and just being passed back to an application for some reason. In other words, the string is essentially a BLOB>

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • That's right-- tables are inherently better for this. However, the size might actually pose an issue: sending 3 campaigns to 2000 contacts per week, AND multiplying this by 100 or so senders (the app is a multitenant SaaS), means the table will grow at roughly ~2,400,000 rows per month. After a few years, this becomes a problem. In contrast, storing sent IDs sacrifices flexibility for greatly reducing index size. The question is whether the tradeoff would be worth it. – Manuel Vicedo Oct 02 '17 at 13:33
  • @ManuelVicedo . . . I think you need to learn about partitioning. – Gordon Linoff Oct 02 '17 at 13:51
  • We haven't got to the point where we have thought about scaling techniques-- that being said, I'll stick to good DB design and worry about scaling issues later, given that an archive table could alleviate the problem. Thanks for the answer. – Manuel Vicedo Oct 02 '17 at 14:03
  • @ManuelVicedo . . . Storing integers in delimited is strings is not, not, not, not, not, not good database design. Period. There may be times where you have to do it. Your scaling example is not one of them. – Gordon Linoff Oct 03 '17 at 01:51