0

I was trying to generate random string in BigQuery when I encountered this answer.

SELECT
    word
FROM
    `publicdata.samples.shakespeare`
WHERE
    RAND() < 10/(
    SELECT
        COUNT(*)
    FROM
        `publicdata.samples.shakespeare`)

It worked but I need to create a function out of that answer. This is my attempt to convert it but no luck.

CREATE TEMP FUNCTION
    random_word() AS ( (
        SELECT
            STRING_AGG(word, "_") AS aggd_word
        FROM (
            SELECT
                LOWER(REPLACE(word, "'", "")) AS word
            FROM
                `publicdata.samples.shakespeare`
            WHERE
                RAND() < 10/(
                SELECT
                    COUNT(*)
                FROM
                    `publicdata.samples.shakespeare`)
            LIMIT
                3)) );
SELECT
    random_word();

I got this error

Table not found: `publicdata.samples.shakespeare`;
failed to parse CREATE [TEMP] FUNCTION statement at [25:9]
shankshera
  • 947
  • 3
  • 20
  • 45
  • Which characters can be in the string? Any UTF-8 code points, or are you specifically targeting `a-zA-Z0-9`, for example? What are the upper and lower bounds on the length of the string? – Elliott Brossard Feb 28 '19 at 07:38
  • @ElliottBrossard `a-zA-Z0-9` is enough. Length of the string is not my concern. But not too long, maybe 64 chars max. Basically just a random string (not UUID) generator. – shankshera Feb 28 '19 at 07:40
  • 1
    https://cloud.google.com/bigquery/docs/reference/standard-sql/user-defined-functions#limits - You cannot reference a table in a UDF – Bobbylank Feb 28 '19 at 08:25
  • @Bobbylank thank you, need to read the docs more thoroughly – shankshera Feb 28 '19 at 09:39
  • No problems, I hope you find another way around the problem – Bobbylank Feb 28 '19 at 10:15

1 Answers1

2

One approach is to convert a hash into characters in the desired range:

CREATE TEMP FUNCTION MapChar(c INT64) AS (
  CASE
    WHEN c BETWEEN 0 AND 9 THEN 48 + c -- 0 to 9
    WHEN c BETWEEN 10 AND 35 THEN 55 + c -- A to Z
    ELSE 61 + c -- a to z
  END
);

CREATE TEMP FUNCTION RandString() AS ((
  SELECT CODE_POINTS_TO_STRING(ARRAY_AGG(MapChar(MOD(c, 62))))
  FROM UNNEST(TO_CODE_POINTS(SHA256(CAST(RAND() AS STRING)))) AS c
));

SELECT RandString();

You can useSHA512 instead of SHA256 if you want a longer string.

Elliott Brossard
  • 32,095
  • 2
  • 67
  • 99