I am learning Amazon Personalize, and instead of the fields in this schema:

{
    "type": "record",
    "name": "Interactions",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {
            "name": "USER_ID",
            "type": "string"
        },
        {
            "name": "ITEM_ID",
            "type": "string"
        },
        {
            "name": "TIMESTAMP",
            "type": "long"
        }
    ],
    "version": "1.0"
}

I want to remove a few fields and add new ones, but I don't know how to do that. Is there a way to do this?

John Rotenstein

2 Answers

From the Amazon Personalize documentation, which I suggest you take a look at:

Amazon Personalize recognizes three types of historical datasets. Each type has an associated schema with a name key whose value matches the dataset type. The three types are:

  • Users – This dataset is intended to provide metadata about your users. This might include information such as age, gender, or loyalty membership, which can be important signals in personalization systems.
  • Items – This dataset is intended to provide metadata about your items. This might include information such as price, SKU type, or availability.
  • Interactions – This dataset is intended to provide historical interaction data between users and items. It can also provide metadata on your user's browsing context, such as their location or device (mobile, tablet, desktop, and so on).

[...]

Before you add a dataset to Amazon Personalize, you must define a schema for that dataset. Each dataset type has specific requirements. Schemas in Amazon Personalize are defined in the Avro format. For more information, see Apache Avro.

Stefano
  • Thanks for the reply, I understood that. Is there a tutorial or similar resource that shows how to create a completely new schema with different fields for Amazon Personalize? – Abhishek Awasthi Jul 13 '20 at 05:44

Schemas in Personalize are immutable. Therefore, if you want to add, change, or remove a field in an existing schema, you must create a new schema that contains the desired fields, defined using the Apache Avro format.
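Because an existing schema cannot be edited, one way to approach this is to build the new schema definition locally from the old one before registering it. The following is only a minimal sketch in plain Python: `derive_schema` and `REQUIRED_INTERACTIONS_FIELDS` are hypothetical names introduced here for illustration and are not part of boto3 or Personalize. It assumes the Interactions dataset type, whose required fields are USER_ID, ITEM_ID, and TIMESTAMP.

```python
import copy
import json

# Assumption: required fields for the Interactions dataset type.
REQUIRED_INTERACTIONS_FIELDS = {"USER_ID", "ITEM_ID", "TIMESTAMP"}


def derive_schema(base_schema, remove=(), add=()):
    """Hypothetical helper: return a NEW schema dict, leaving base_schema untouched."""
    to_remove = set(remove)
    blocked = to_remove & REQUIRED_INTERACTIONS_FIELDS
    if blocked:
        raise ValueError(f"Cannot remove required fields: {sorted(blocked)}")

    new_schema = copy.deepcopy(base_schema)
    # Keep every field that was not asked to be removed.
    kept = [f for f in new_schema["fields"] if f["name"] not in to_remove]
    existing = {f["name"] for f in kept}

    # Append the new fields, rejecting duplicate names.
    for field in add:
        if field["name"] in existing:
            raise ValueError(f"Duplicate field name: {field['name']}")
        kept.append(dict(field))
        existing.add(field["name"])

    new_schema["fields"] = kept
    return new_schema


base = {
    "type": "record",
    "name": "Interactions",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {"name": "USER_ID", "type": "string"},
        {"name": "ITEM_ID", "type": "string"},
        {"name": "TIMESTAMP", "type": "long"},
    ],
    "version": "1.0",
}

new_schema = derive_schema(base, add=[{"name": "EVENT_TYPE", "type": "string"}])
schema_json = json.dumps(new_schema)  # JSON string form of the new definition
```

The resulting JSON string is what you would then register as a brand-new schema under a new name; the original schema stays as it was.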

You can create a schema in the AWS console when you create a dataset or by using the CreateSchema API. Here is an example of creating a schema for an interactions dataset in Python.

import boto3
import json

personalize = boto3.client('personalize')

interactions_schema = {
    "type": "record",
    "name": "Interactions",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {
            "name": "ITEM_ID",
            "type": "string"
        },
        {
            "name": "USER_ID",
            "type": "string"
        },
        {
            "name": "EVENT_TYPE",
            "type": "string"
        },
        {
            "name": "TIMESTAMP",
            "type": "long"
        }
    ],
    "version": "1.0"
}


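# Register the schema; Personalize expects the Avro definition as a JSON string.
response = personalize.create_schema(
    name="your-schema-name-here",
    schema=json.dumps(interactions_schema)
)

# The ARN of the new schema is needed later when creating the dataset.
print(response['schemaArn'])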

Note that there are required fields/columns for each dataset type. For example, every schema for the interactions dataset type must have USER_ID, ITEM_ID, and TIMESTAMP fields/columns.

When naming fields/columns, the convention is to use constant case (i.e. uppercase with "_" to separate words) for field names in the schema and column names in your CSVs. Personalize will automatically map camelCase field names in PutEvents/PutItems/PutUsers API calls to their corresponding constant-case names in the schema. For example, eventType is automatically mapped to EVENT_TYPE.

See the docs on datasets and schemas for details. There are also several examples of different schemas in the aws-samples/amazon-personalize-samples GitHub repository and the Personalize blogs.
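To make the two rules above concrete, here is a small local sketch that checks an Interactions schema for the required fields and illustrates the camelCase to constant-case naming convention. Both `missing_required_fields` and `to_constant_case` are hypothetical helpers written for this example. They only mimic the convention described above and are not the service's actual implementation.

```python
import re

# Assumption: required fields for the Interactions dataset type.
REQUIRED_INTERACTIONS_FIELDS = ("USER_ID", "ITEM_ID", "TIMESTAMP")


def missing_required_fields(schema):
    """Return the required Interactions field names absent from the schema."""
    present = {field["name"] for field in schema["fields"]}
    return [name for name in REQUIRED_INTERACTIONS_FIELDS if name not in present]


def to_constant_case(name):
    """Illustrative only: convert a camelCase name such as eventType to EVENT_TYPE."""
    # Insert "_" at each lowercase-or-digit to uppercase boundary, then uppercase.
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).upper()


incomplete = {"fields": [{"name": "USER_ID", "type": "string"}]}
print(missing_required_fields(incomplete))  # ['ITEM_ID', 'TIMESTAMP']
print(to_constant_case("eventType"))        # EVENT_TYPE
```

Running a check like this before creating the schema can catch a missing required field locally rather than waiting for the API call to fail.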

James J