Schemas in Personalize are immutable. Therefore, if you want to add, change, or remove a field, you cannot edit the existing schema; you must create a new schema that defines the desired fields using the Apache Avro format.
You can create a schema in the AWS console when you create a dataset, or programmatically with the CreateSchema API. Here is an example of creating a schema for an interactions dataset in Python:
```python
import boto3
import json

personalize = boto3.client('personalize')

# Avro record definition for an interactions dataset
interactions_schema = {
    "type": "record",
    "name": "Interactions",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {
            "name": "ITEM_ID",
            "type": "string"
        },
        {
            "name": "USER_ID",
            "type": "string"
        },
        {
            "name": "EVENT_TYPE",
            "type": "string"
        },
        {
            "name": "TIMESTAMP",
            "type": "long"
        }
    ],
    "version": "1.0"
}

# The schema must be passed to CreateSchema as a JSON string
response = personalize.create_schema(
    name="your-schema-name-here",
    schema=json.dumps(interactions_schema)
)
print(response["schemaArn"])
```
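Because an existing schema cannot be edited, changing it amounts to reading the current definition, modifying a copy, and registering the copy under a new name. The sketch below assumes you pass in a boto3 Personalize client and uses only the DescribeSchema and CreateSchema calls; the helper names `with_added_field` and `create_derived_schema` are hypothetical, not part of boto3. It stops at creating the new schema; attaching it to a dataset is a separate step not shown here.

```python
import copy
import json


def with_added_field(schema: dict, name: str, avro_type) -> dict:
    """Return a copy of an Avro record schema with one extra field appended.

    Hypothetical helper: the input dict is left untouched, mirroring the
    fact that a Personalize schema cannot be modified in place.
    """
    if any(field["name"] == name for field in schema["fields"]):
        raise ValueError(f"field {name!r} already exists")
    new_schema = copy.deepcopy(schema)
    new_schema["fields"].append({"name": name, "type": avro_type})
    return new_schema


def create_derived_schema(personalize, source_schema_arn: str, new_name: str,
                          field_name: str, avro_type) -> str:
    """Read an existing schema, add a field, and register the result as a new schema.

    Hypothetical helper. `personalize` is expected to be a boto3 client such as
    boto3.client("personalize"). Returns the ARN of the newly created schema.
    """
    # DescribeSchema returns the Avro definition as a JSON string.
    current = personalize.describe_schema(schemaArn=source_schema_arn)["schema"]
    updated = with_added_field(json.loads(current["schema"]), field_name, avro_type)

    kwargs = {}
    # Keep the domain (e.g. for domain dataset groups) if the source schema had one.
    if current.get("domain"):
        kwargs["domain"] = current["domain"]

    response = personalize.create_schema(
        name=new_name, schema=json.dumps(updated), **kwargs
    )
    return response["schemaArn"]
```

For example, `create_derived_schema(boto3.client("personalize"), old_arn, "interactions-v2", "DEVICE", ["null", "string"])` would register a copy of the old schema with a hypothetical nullable `DEVICE` field; the `["null", "string"]` Avro union is what makes the field optional.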
Note that each dataset type has required fields/columns. For example, every schema for the interactions dataset type must have `USER_ID`, `ITEM_ID`, and `TIMESTAMP` fields/columns.

When naming fields/columns, the convention is to use constant case (i.e., uppercase with `_` separating words) for field names in the schema and column names in your CSVs. Personalize automatically maps camel case field names in PutEvents/PutItems/PutUsers API calls to their corresponding constant case names in the schema. For example, `eventType` is automatically mapped to `EVENT_TYPE`. See the docs on datasets and schemas for details. There are also several examples of different schemas in the aws-samples/amazon-personalize-samples GitHub repository and the Personalize blogs.
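Both conventions can be checked locally before calling CreateSchema. The sketch below uses hypothetical helper names (`missing_required_fields`, `to_constant_case`) that are not part of boto3 or Personalize: the first reports which of the interactions dataset's required fields a schema dict lacks, and the second converts a camelCase name to constant case following the naming convention described above. It approximates that convention; it is not the service's own mapping logic.

```python
import re

# Required fields for the interactions dataset type, as listed above.
REQUIRED_INTERACTIONS_FIELDS = {"USER_ID", "ITEM_ID", "TIMESTAMP"}


def missing_required_fields(schema: dict, required=REQUIRED_INTERACTIONS_FIELDS) -> set:
    """Hypothetical helper: return the required field names the Avro schema does not define."""
    defined = {field["name"] for field in schema.get("fields", [])}
    return set(required) - defined


def to_constant_case(name: str) -> str:
    """Hypothetical helper: convert camelCase (e.g. 'eventType') to CONSTANT_CASE ('EVENT_TYPE')."""
    if name.isupper():
        # Already constant case (e.g. 'EVENT_TYPE'); leave it unchanged.
        return name
    # Insert '_' before every uppercase letter that is not the first character, then uppercase.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).upper()
```

Calling `missing_required_fields(interactions_schema)` on the schema from the example above should return an empty set, since it defines `USER_ID`, `ITEM_ID`, and `TIMESTAMP`.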