0

I would like to run the equivalent of PostgreSQL's

SELECT * FROM GENERATE_SERIES(1, 10000000)

I've read this:

http://blog.jooq.org/2013/11/19/how-to-create-a-range-from-1-to-10-in-sql/

But most suggestions there don't really take an arbitrary length - the query depends on the length otherwise than by just replacing a number. Also, some suggestions do not apply in MonetDB. So, what's my best course of action (if any)?

Notes: - I'm using a version from February 2013. Answers about more recent features are also welcome, but are exactly what I'm looking for. - Assume the existing tables don't have enough lines; and do not assume that, say, a Cartesian product of the longest table with itself is sufficient (or alternatively, maybe that's too costly to perform).

einpoklum
  • 118,144
  • 57
  • 340
  • 684
  • Not familiar with MonetDB, but it looks like someone committed a [`generate_series` function](http://dev.monetdb.org/hg/MonetDB/rev/14b60d882564) to the source code two weeks ago. Can you build from source and use that? – Iain Samuel McLean Elder Jun 01 '14 at 08:47
  • @IainElder: Edited to address this. – einpoklum Jun 01 '14 at 08:57
  • What do you intend to do with the values? If you need 10 billion sequential values to use as a primary key, there are better ways to generate the values. Give a little more context so we can help you better. – Iain Samuel McLean Elder Jun 01 '14 at 12:48

2 Answers2

1

Try with:

SELECT value
FROM sys.generate_series(initial_value, end_value, offset);

I have to report that the function is quite unstable on Jul2015 release as is causing the server process to crash. Hope you have better luck.

If you wants to generate an arbitrary numeric value you can use:

SELECT rand();
0

Forgive me; I've never worked with MonetDB before. But the documentation leads me to believe you can solve this with the ROW_NUMBER function and a pre-populated table like SYS.COLUMNS.

SELECT ROW_NUMBER() OVER () AS rownum
FROM SYS.COLUMNS;

This falls into jooq.org's category of just taking random records from a “large enough” table.

PostgreSQL's generate_series function is elegant, but non-standard. It's absent in other mainstream engines like SQL Server, Oracle, and MySQL. Your version of MonetDB doesn't have it either.

MonetDB does have the ROW_NUMBER function, a close equivalent in standard SQL. It assigns a sequential integer to rows in a result set. It will output the correct values, but it needs some rows in your database already. A chicken and egg problem!

SYS.COLUMNS is a system metadata table that contains one row for every column in your database. Most "empty" relational databases still have hundreds of system columns that appear in tables like these.

If the first query produces more rows than you need, you can push it into a subquery and filter the intermediate result.

SELECT rownum
FROM (
  SELECT ROW_NUMBER() OVER () AS rownum
  FROM SYS.COLUMNS
) AS tally
WHERE rownum >= 1 AND rownum <= 10;

But what if you need to generate more rows than you have in SYS.COLUMNS? Unfortunately, the shape of the query does depend on how many rows you want to generate.

A common workaround in the Microsoft SQL Server community would be to join SYS.COLUMNS to itself. This will produce an intermediate table containing the square of the number of rows in the table. In practice, it's probably more rows than you'll ever need.

With a self-join, the solution looks like this:

SELECT rownum
FROM (
  SELECT ROW_NUMBER() OVER () AS rownum
  FROM SYS.COLUMNS AS a, SYS.COLUMNS AS b
) AS tally
WHERE rownum >= 1 AND rownum <= 100000;

Hopefully these queries are also relevant in MonetDB world!

Community
  • 1
  • 1
Iain Samuel McLean Elder
  • 19,791
  • 12
  • 64
  • 80
  • Yes, they are indeed... however, I want the _same_ query to work regardless of whether I need to generate 10 records or 10,000,000,000. – einpoklum Jun 01 '14 at 11:47
  • To use just one query, know in advance the most rows you will ever need, and row count in the source table. Write a query with the minimum number of self-joins to generate at least your maximum number. If you never need more than 10 billion rows, and the source table contains 1000 rows, you need a query that refers to the source four times. In numbers, `1000 ^ 3 < 10 billion < 1000 ^ 4`. If you decide you want yet more, add another join. Eventually you'll run out of time, space, money, or patience to process it, by which point it's probably big enough :-) – Iain Samuel McLean Elder Jun 01 '14 at 12:45