4

I am building a mobile service that require a lot of search queries.

The service is based on MySQL data system, and search query is not enough to make a fast search service.

Therefore, I decided to use inverted indexing system:

index    Documents
1        a, b, c, d, e, f, g, h
2        c, k, i, j, k

This is a simple construction for the inverted indexing system.

I assume that there will be more than thousand documents for one row.

I am not sure what kind of type and length that I should use for the 'Documents' column?

I selected VARCHAR(100000) for now. Is it possible to set the length like 9999999?

Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
james
  • 225
  • 6
  • 20

2 Answers2

10
  1. Data structure:

    index document
      1      a
      1      b
      1      c
     ...
      2      c
      2      k
    

    index type INT, document type CHAR(1). Primary key as set of index and document.

    Queries will perform very fast with this data structure.
    By the way, the structure I propose is normalized.

  2. From MySQL 5.0 Reference Manual (The CHAR and VARCHAR Types):

[...] VARCHAR [...] The length can be specified as a value from 0 to 255 before MySQL 5.0.3, and 0 to 65,535 in 5.0.3 and later versions.

Taz
  • 3,718
  • 2
  • 37
  • 59
  • 1
    Awesome, I didn't know that your data structure is much faster than my data structure. It's good to know that 65,535 is maximum for VARCHAR in PHP 5.0.3. – james May 15 '12 at 20:12
  • @james You may find these questions interesting too: http://stackoverflow.com/questions/2320633/first-time-database-design-am-i-overengineering, http://stackoverflow.com/questions/934577/should-i-normalize-my-db-or-not – Taz May 15 '12 at 20:23
  • Does any kind of space issues arise in this method? What are other popular methods? – LearningToCode Jan 24 '16 at 03:36
1

IMHO, this will get ugly, but that being said, your varchar size would cover it, or you could look into TEXT, mediumTEXT, or longTEXT:

TEXT    65,535 bytes    ~64kb
MEDIUMTEXT   16,777,215 bytes   ~16MB
LONGTEXT    4,294,967,295 bytes ~4GB
GDP
  • 8,109
  • 6
  • 45
  • 82