
I am using PostgreSQL as the SQL server to store datasets used to train models in Weka (a machine learning tool).

Weka reads the table and creates one feature from each column. For this project the data consists of 24x35 px images in which each pixel is a feature. I therefore have to create a table with 841 columns (840 pixel values and 1 id, which is the primary key).

The images are grayscale, so each pixel value ranges from 0 to 255. I therefore want to store each pixel in its own column, either as one integer or as one byte. The "id" column, however, has to be an integer.

What is the best / easiest way to set up a table of that size?

  • That would depend on the type you use: the database has an 8 kB page size, so the row size (roughly the sum of the sizes of the column types) is limited by it. – Alex K. Oct 18 '14 at 15:55
  • @AlexK. The pixel values will be saved as integers (~3.5kB per entry). However a single byte per column should be sufficient as the values range from 0-255. Will edit the question to make it more clear. –  Oct 18 '14 at 16:01
  • Do you query or sort by specific pixel? If not, if you only ever care about the image as a whole, all the pixels together, then there is no need to break them out by pixels. The [`bytea`](http://www.postgresql.org/docs/current/static/datatype-binary.html) binary data type in Postgres might then be appropriate. – Basil Bourque Oct 19 '14 at 19:38
  • The problem is that Weka seems to be a bit sparse when it comes to database querying. It needs to have each 'feature' in a separate column of the result in order to detect and process them correctly. As a feature in my case is a single pixel, I will have to split them up. –  Oct 19 '14 at 19:54

2 Answers


Below are a pointer to an existing answer about the maximum number of columns and an alternative design that uses a 1 to many relationship.

Maximum number of columns and types

The following answer covers in detail what you should check:

What is the maximum number of columns in a PostgreSQL select query
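
As a rough check against those limits, the sketch below estimates the row width for 840 pixel columns plus the id and generates the matching `CREATE TABLE` statement. It assumes PostgreSQL's documented limit of 1600 columns per table, the default 8 kB page size, and 2-byte `smallint` pixel columns (PostgreSQL has no one-byte integer type). The table name `images` and the column names `p_001` to `p_840` are made up for illustration.

```python
# Assumptions (not from the original post): table name "images",
# column names p_001..p_840, smallint (2 bytes) per pixel by default.
MAX_COLUMNS = 1600   # PostgreSQL limit on columns per table
PAGE_SIZE = 8192     # default PostgreSQL page size in bytes
TYPE_SIZES = {"smallint": 2, "integer": 4}


def estimate_row_bytes(n_pixels: int, pixel_type: str = "smallint") -> int:
    """Rough data size of one row: an integer id plus n_pixels pixel columns.

    Ignores the tuple header and alignment padding, so the real row is
    slightly larger than this estimate.
    """
    return TYPE_SIZES["integer"] + n_pixels * TYPE_SIZES[pixel_type]


def build_create_table(n_pixels: int, table: str = "images",
                       pixel_type: str = "smallint") -> str:
    """Return a CREATE TABLE statement with one column per pixel."""
    if n_pixels + 1 > MAX_COLUMNS:
        raise ValueError("too many columns for a single PostgreSQL table")
    width = len(str(n_pixels))  # zero-pad so column names sort naturally
    columns = ["id integer PRIMARY KEY"]
    columns += [f"p_{i:0{width}d} {pixel_type} NOT NULL"
                for i in range(1, n_pixels + 1)]
    return f"CREATE TABLE {table} (\n  " + ",\n  ".join(columns) + "\n);"


if __name__ == "__main__":
    n = 24 * 35  # 840 pixels per image
    print(estimate_row_bytes(n, "smallint"))  # 1684
    print(estimate_row_bytes(n, "integer"))   # 3364
    print(estimate_row_bytes(n, "integer") < PAGE_SIZE)  # True
    print(build_create_table(n).splitlines()[1])
```

Under these assumptions both variants stay well below the page size, and 841 columns is within the column limit, so the wide table is feasible with either type.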

Changing to use 1 to Many

The downside of one column per pixel is that you will have to recreate the DB schema every time the number of pixels (image size) changes.

Instead, you could create a 1 to many relationship and have a table with:

image_id, pixel_number, value

So, for one image with N pixels you would have:

1, 1, value
1, 2, value
....
1, N, value
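
A minimal sketch of this layout, assuming a hypothetical table name `image_pixels` and a composite primary key on `(image_id, pixel_number)`, which also gives PostgreSQL an index for looking up all pixels of one image. The helper `to_rows` (also hypothetical) turns one image, given as a flat list of grayscale values, into rows of the shape shown above.

```python
# Hypothetical names: image_pixels, to_rows.
# The column names follow the answer: image_id, pixel_number, value.
CREATE_PIXELS_TABLE = """
CREATE TABLE image_pixels (
  image_id     integer  NOT NULL,
  pixel_number integer  NOT NULL,
  value        smallint NOT NULL CHECK (value BETWEEN 0 AND 255),
  PRIMARY KEY (image_id, pixel_number)
);
"""


def to_rows(image_id, pixels):
    """Return (image_id, pixel_number, value) tuples, numbering pixels from 1."""
    rows = []
    for number, value in enumerate(pixels, start=1):
        if not 0 <= value <= 255:
            raise ValueError(f"pixel {number} out of range: {value}")
        rows.append((image_id, number, value))
    return rows


if __name__ == "__main__":
    # A tiny 3-pixel "image" instead of the full 840 pixels.
    for row in to_rows(1, [0, 128, 255]):
        print(row)
```

The rows can then be inserted with any PostgreSQL client; the image size is no longer part of the schema, only of the data.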
Menelaos
  • I like the idea of using a 1 to Many relation. However, the table will have at least 25 000 entries (images). Using what you suggested would give the table ~21 million rows. Will PostgreSQL have problems handling them? Furthermore, the 25k images are only a subset of the dataset, the positive samples. The entire dataset consists of ~100k images. –  Oct 18 '14 at 16:32
  • It shouldn't have a problem if you put indexes on `image_id` so that results are returned quickly. – Menelaos Oct 18 '14 at 19:55
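
Since Weka needs each feature in its own result column, the 1 to many rows would have to be pivoted back into one wide row per image at query time. The sketch below builds such a query with conditional aggregation and reproduces the row counts mentioned above. It assumes a table named `image_pixels` with the columns `image_id`, `pixel_number`, `value`; the table name, the function names, and the `p_001`-style aliases are hypothetical, and the query has not been tried against Weka itself.

```python
def build_pivot_query(n_pixels, table="image_pixels"):
    """Build a SELECT that returns one row per image and one column per pixel."""
    width = len(str(n_pixels))  # zero-pad aliases so they sort naturally
    columns = [
        f"max(CASE WHEN pixel_number = {i} THEN value END) AS p_{i:0{width}d}"
        for i in range(1, n_pixels + 1)
    ]
    return (
        "SELECT image_id,\n  "
        + ",\n  ".join(columns)
        + f"\nFROM {table}\nGROUP BY image_id\nORDER BY image_id;"
    )


def total_rows(n_images, n_pixels):
    """Number of rows the 1 to many table holds for n_images images."""
    return n_images * n_pixels


if __name__ == "__main__":
    print(total_rows(25_000, 840))    # 21000000
    print(total_rows(100_000, 840))   # 84000000
    print(build_pivot_query(3))       # small example with 3 pixels
```

With an index that starts with `image_id`, the grouping can read each image's pixels together; whether this is fast enough for ~100k images is something to measure rather than assume.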

Are all the pictures different from each other? If many of them are identical, maybe you should create a second table that stores each picture individually, and then opt for either a 1 to many or a 1 to 1 table, like this:

-- one row per distinct picture
create table picture
(
  pic_id integer,
  pic_value <whatevertypeyouwant>,

  constraint pk_picture primary key (pic_id)
);

-- option 1: one column per picture reference
create table your_table
(
  id integer,
  pic_id_001 integer,
  pic_id_002 integer,
  ...
  pic_id_840 integer,

  constraint pk_your_table primary key (id)
);

-- option 2: one row per (id, pic_id) pair
create table your_table
(
  id integer,
  pic_id integer,

  constraint pk_your_table primary key (id, pic_id)
);
Christian
  • Yes and no. 25k images are different representations of the same object. 75k images are 'background' images used to train the classifiers. So yes, some of the images have something in common. However, there are no duplicate images. –  Oct 19 '14 at 20:23
  • OK... even though there are no duplicates, I choose the second option... what if in the near future, you have to extend from 840 to 1600 pictures? – Christian Oct 20 '14 at 16:13
  • The number 840 is fixed and nothing will change it unless I am unable to produce meaningful predictions. Additionally, 840 is the number of pixels per image. The number of pictures is currently ~100 000. –  Oct 20 '14 at 16:58