I can create a column of type Utf8 as follows:

    let schema = Arc::new(Schema::new(vec![
        Field::new("id", DataType::Int32, false),
        Field::new("payload", DataType::Utf8, false),
    ]));

    let vec_of_strings: Vec<String> = vec!["one".to_string(), "two".to_string()];
    
    let batch = RecordBatch::try_new(
        schema,
        vec![
            Arc::new(Int32Array::from_slice([1, 2])),
            Arc::new(StringArray::from(vec_of_strings)),
        ],
    )?;

    ctx.register_batch("demo", batch)?;

Executing a query against this, like so:

    let df = ctx.sql(r#"
       SELECT *
       from demo
    "#).await?;

gives the expected results:

+----+---------+
| id | payload |
+----+---------+
| 1  | one     |
| 2  | two     |
+----+---------+

Now I have a use case where the payload should be an array, something like this:

+----+------------------------+
| id | payload                |
+----+------------------------+
| 1  | [piano, guitar, drums] |
| 2  | [violin, piano]        |
+----+------------------------+

How may I go about this?

Changing vec_of_strings to a vec_of_vecs fails. I mean this:

    let vec_of_vecs: Vec<Vec<String>> = vec![
        vec!["piano".to_string(), "guitar".to_string(), "drums".to_string()],
        vec!["violin".to_string(), "piano".to_string()]
    ];

When it is used to create the batch like this:

    let batch = RecordBatch::try_new(
        schema,
        vec![
            Arc::new(Int32Array::from_slice([1, 2])),
            Arc::new(StringArray::from(vec_of_vecs)),
        ],
    )?;

it fails to compile with the error:

   |
80 |             Arc::new(StringArray::from(vec_of_vecs)),
   |                      ----------------- ^^^^^^^^^^^ the trait `From<Vec<Vec<std::string::String>>>` is not implemented for `GenericByteArray<GenericStringType<i32>>`
   |                      |
   |                      required by a bound introduced by this call
   |
  = help: the following other types implement trait `From<T>`:
            <GenericByteArray<GenericBinaryType<OffsetSize>> as From<GenericByteArray<GenericStringType<OffsetSize>>>>
            <GenericByteArray<GenericBinaryType<OffsetSize>> as From<Vec<&[u8]>>>
            <GenericByteArray<GenericBinaryType<OffsetSize>> as From<Vec<Option<&[u8]>>>>
            <GenericByteArray<GenericBinaryType<T>> as From<GenericListArray<T>>>
            <GenericByteArray<GenericStringType<OffsetSize>> as From<GenericByteArray<GenericBinaryType<OffsetSize>>>>
            <GenericByteArray<GenericStringType<OffsetSize>> as From<GenericListArray<OffsetSize>>>
            <GenericByteArray<GenericStringType<OffsetSize>> as From<Vec<&str>>>
            <GenericByteArray<GenericStringType<OffsetSize>> as From<Vec<Option<&str>>>>
          and 3 others

Any idea on how I may achieve the above?

Finlay Weber
Hmm, not sure how you'd go about implementing that, but any relational(-adjacent) table that contains a list in one of its columns will always sound like a headache waiting to happen to me. In my opinion such columns should be avoided like the plague, and it wouldn't surprise me if such a construct just isn't implemented. – cafce25 May 20 '23 at 14:32
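
For what it's worth, the compile error above only says that StringArray holds one flat string per row; it does not mean nested data is unrepresentable. The Arrow columnar format does define a variable-size list layout: one flat child array of values plus an offsets buffer marking where each row starts and ends. Below is a minimal, std-only sketch of that layout, not code from the arrow crate. The helper names `flatten_rows` and `row` are hypothetical and exist only for illustration. As far as I know, the arrow crate exposes this layout as `ListArray` (usually built with `ListBuilder<StringBuilder>`), with the schema field declared as `DataType::List` instead of `DataType::Utf8`, but the exact constructor signatures vary between versions, so treat that as an assumption to check against the docs for the version in use.

```rust
/// Flatten rows of strings into `(offsets, values)`.
/// This mirrors the shape of Arrow's variable-size list layout:
/// row `i` is `values[offsets[i]..offsets[i + 1]]`.
/// (Hypothetical helper, not part of the arrow crate.)
fn flatten_rows(rows: &[Vec<String>]) -> (Vec<i32>, Vec<String>) {
    let mut offsets = Vec::with_capacity(rows.len() + 1);
    let mut values = Vec::new();
    // The first offset is always 0.
    offsets.push(0);
    for r in rows {
        // Append this row's elements to the flat child buffer...
        values.extend(r.iter().cloned());
        // ...and record where the row ends (which is where the next one starts).
        offsets.push(values.len() as i32);
    }
    (offsets, values)
}

/// Read row `i` back out of the flattened representation.
/// (Hypothetical helper, not part of the arrow crate.)
fn row<'a>(offsets: &[i32], values: &'a [String], i: usize) -> &'a [String] {
    let start = offsets[i] as usize;
    let end = offsets[i + 1] as usize;
    &values[start..end]
}
```

Calling `flatten_rows` on the two rows `[piano, guitar, drums]` and `[violin, piano]` yields the offsets `[0, 3, 5]` and five flat values, and `row(&offsets, &values, 1)` returns the `[violin, piano]` slice. A list-typed column stores essentially this pair, which is why a `Vec<Vec<String>>` cannot be passed to `StringArray::from` directly.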

0 Answers