115

Is there a way to add a new column to existing table in DynamoDB in Amazon's AWS?

Google didn't help,

UpdateTable Query in http://docs.aws.amazon.com/cli/latest/reference/dynamodb/update-table.html?highlight=update%20table doesn't have any information related to adding a new column.

Arun Selva Kumar
  • 2,593
  • 3
  • 19
  • 30

6 Answers6

223

DynamoDB does not require schema definition, and so there is no such thing as a "column". You can just add a new item with a new attribute.

Julio Faerman
  • 13,228
  • 9
  • 57
  • 75
  • 14
    So, a user can just upload their "profile info" with an extra field, and dynamo will just accept it on the fly? Or do I have to go in as an admin and add it? – Costa Michailidis Nov 25 '16 at 21:54
  • 27
    @Costa as long as your code includes the new attribute in its PutItem call, DynamoDB will just accept it. It doesn't matter that no other record has that attribute. – osullic Jul 06 '17 at 09:51
  • how to remove an attribute from all documents? – Prashant Tapase Apr 24 '18 at 10:30
  • @prashant-tapase https://stackoverflow.com/questions/43468257/updating-multiple-records-in-dynamodb – Julio Faerman Apr 24 '18 at 16:17
  • I mean to say how to remove columns from the table which already exists or mistakenly added – Prashant Tapase Apr 25 '18 at 05:06
  • 2
    Not in a single operation, i believe you'd need to iterate and call UpdateItem for each – Julio Faerman Apr 26 '18 at 18:23
  • I searched about this because mine wasn't working, then I realized it was ***my code*** that wasn't working. – Rigo Sarmiento Dec 30 '19 at 00:32
  • 2
    @HrushikeshDhumal you might not be seeing the new field in the UI because you aren't querying for it. Dynamodb will store data with new attributes in the item - there is no need of defining attributes other than the hash key (and sort key if present). – bschandramohan Feb 18 '20 at 19:15
  • 1
    Well technically you need to define the schema for the table. However the minimum number of fields( not columns) is for the keys for the table and the key fields you need for your global secondary indexes. You cannot unset these fields. Other than that you can apply any number of fields on any item as long as total size is in 400 KB – anand Aug 18 '21 at 15:42
18

Well, let's not get dragged away in the semantic discussion about the difference between "fields" and "columns". The word "column" does remind us of relational databases, which dynamodb is not. In essence that means that dynamodb does not have foreign keys.

Dynamodb does have "primary partition keys" and "index partition keys" though, just as with relational databases. (Although there are other strategies as well)

You do need to respect those keys when you add data. But aside from those requirements, you don't have to predefine your fields (except for those partition keys mentioned earlier).

Assuming that you are new to this, some additional good practices:

  • Add a numeric field to each record, to store the time of creation in seconds. Dynamodb has optional cleaning features, which require this type of field in your data.
  • You cannot use dates in dynamodb, so you have to store those as numeric fields or as strings. Given the previously mentioned remark, you may prefer a numeric type for them.
  • Don't store big documents in it, because there is a maximum fetch size of 16MB, and a maximum record size of 400KB. Fortunately, AWS has S3 storage and other kind of databases (e.g. DocumentDB).

There are many strategies for table keys:

  • If you only declare the partition-key, then it acts like a primary key (e.g. partition-key=invoiceId). That's fine.
  • If your object has a parent reference. (e.g. invoices have a customer), then you probably want to add a sort-key. (e.g. partition-key=customerId;sort-key=invoiceId) Together they behave like a composed key. The advantage is that you can do a lookup using both keys, or just using the partition-key. (e.g. request a specific invoice for a specific customer, or all invoices for that customer)
bvdb
  • 22,839
  • 10
  • 110
  • 123
  • Any link to read more about store the time of creation in seconds? thanks. – Juanito Fatas Feb 27 '21 at 02:42
  • @JuanitoFatas https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/time-to-live-ttl-before-you-start.html ; in javascript you could do `const ttl = Math.floor((new Date().getTime()) / 1000)` for instance. – bvdb Feb 27 '21 at 22:14
  • @JuanitoFatas how about `Date.now()` – Subash Jun 30 '21 at 10:25
  • According to the documentation, it should be in seconds though. So you would have to `Math.floor(Date.now()/1000)` – bvdb Jun 30 '21 at 13:15
  • Added some more info about table keys, which is something most beginners aren't ready for. But sooner or later, you'll NEED it. :-) – bvdb Aug 05 '22 at 09:48
3

You can achieve the same by doing the following,

  1. Open Dynamodb and click on the tables option in the left sidebar menu.
  2. Search your table by name and click on your table
  3. Now select the orange button named Explore table items
  4. Scroll down and Click on Create item
  5. Now you will see an editor with JSON Value, Click on Form button on the right side to add a new column and its type.

Note: this will insert 1 new record and you can see now the new column as well.

Manoj Selvin
  • 2,247
  • 24
  • 20
2

I installed NoSQL Workbench then connected to existing DynamoDB Table and tried to update existing Item by adding a new attribute. I figured out that we can only add a new attribute with one of these types - "SS", "NS", "BS" (String Set, Number Set, Binary Set"). In Workbench, we can generate code for the chosen operation.

enter image description here

I scanned my dynamodb Table and for each item added new attribute with type "SS" then I scanned again and updated recently added new attribute to type - "S" in order create a global secondary index (GSI) with a primary key - "pk2NewAttr".

NoSQL Workbench related video - https://www.youtube.com/watch?v=Xn12QSNa4RE&feature=youtu.be&t=3666

Example in Python "how to scan all dynamodb Table" - https://gist.github.com/pgolding

SAndriy
  • 670
  • 2
  • 15
  • 25
0

A way to add a new column to existing table in DynamoDB in Amazon's AWS:

We can store the values in DynamoDb in 2 ways, (i) In an RDBMS Type of Structure for the DynamoDB, we can add a new Coulmn by executing the same command keeping the "new Column" entry within which the Records in the Existing Table has been created. we can use DynamoDb with the Records/ Rows having Values for certain Columns while other columns does not have Values.

(ii) In a NoSQL kind of Structure; where we store a Json String within a Column to keep all the attributes as per the Requirement. Here we are generating a json string and we have to add the new Attribute into the json String which can then be inserted into the same Column but with the new Attribute.

user11677199
  • 119
  • 3
0

This script will either delete a record, or add a ttl field. You might want to tailor it to your column name and remove the delete stuff.

Usage:

usage: add_ttl.py [-h] --profile PROFILE [-d] [--force_delete_all] [-v] [-q]

Procedurally modify DynamoDB

optional arguments:
  -h, --help          show this help message and exit
  --profile PROFILE   AWS profile name
  -d, --dryrun        Dry run, take no action
  --force_delete_all  Delete all records, including valid, unexpired
  -v, --verbose       set loglevel to DEBUG
  -q, --quiet         set loglevel to ERROR

Script:

#!/usr/bin/env python3
# pylint:disable=duplicate-code

import argparse
import logging
import sys
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from functools import cached_property
from typing import Dict, Optional

import boto3
from dateutil.parser import isoparse
from tqdm import tqdm

LOGGER = logging.getLogger(__name__)

DATE_FORMAT = "%Y-%m-%d %H:%M:%S"
LOG_FORMAT = "[%(asctime)s] %(levelname)s:%(name)s:%(message)s"


def setup_logging(loglevel=None, date_format=None, log_format=None):
    """Setup basic logging.

    Args:
      loglevel (int): minimum loglevel for emitting messages
    """

    logging.basicConfig(
        level=loglevel or logging.INFO,
        stream=sys.stdout,
        format=log_format or LOG_FORMAT,
        datefmt=date_format or DATE_FORMAT,
    )


def parse_args():
    """
    Extract the CLI arguments from argparse
    """
    parser = argparse.ArgumentParser(description="Procedurally modify DynamoDB")

    parser.add_argument(
        "--profile",
        help="AWS profile name",
        required=True,
    )

    parser.add_argument(
        "-d",
        "--dryrun",
        action="store_true",
        default=False,
        help="Dry run, take no action",
    )

    parser.add_argument(
        "--force_delete_all",
        action="store_true",
        default=False,
        help="Delete all records, including valid, unexpired",
    )

    parser.add_argument(
        "-v",
        "--verbose",
        dest="loglevel",
        help="set loglevel to DEBUG",
        action="store_const",
        const=logging.DEBUG,
    )

    parser.add_argument(
        "-q",
        "--quiet",
        dest="loglevel",
        help="set loglevel to ERROR",
        action="store_const",
        const=logging.ERROR,
    )

    return parser.parse_args()


def query_yes_no(question, default="yes"):
    """Ask a yes/no question via input() and return their answer.

    "question" is a string that is presented to the user.
    "default" is the presumed answer if the user just hits <Enter>.
        It must be "yes" (the default), "no" or None (meaning
        an answer is required of the user).

    The "answer" return value is True for "yes" or False for "no".
    """
    valid = {"yes": True, "y": True, "ye": True, "no": False, "n": False}
    if default is None:
        prompt = " [y/n] "
    elif default == "yes":
        prompt = " [Y/n] "
    elif default == "no":
        prompt = " [y/N] "
    else:
        raise ValueError("invalid default answer: '%s'" % default)

    while True:
        sys.stdout.write(question + prompt)
        choice = input().lower()
        if default is not None and choice == "":
            return valid[default]

        if choice in valid:
            return valid[choice]

        sys.stdout.write("Please respond with 'yes' or 'no' " "(or 'y' or 'n').\n")


@dataclass
class Table:
    """Class that wraps dynamodb and simplifies pagination as well as counting."""

    region_name: str
    table_name: str
    _counter: Optional[Counter] = None

    def __str__(self):
        out = "\n" + ("=" * 80) + "\n"
        for key, value in self.counter.items():
            out += "{:<20} {:<2}\n".format(key, value)
        return out

    def str_table(self):
        keys = list(self.counter.keys())
        # Set the names of the columns.
        fmt = "{:<20} " * len(keys)

        return f"\n\n{fmt}\n".format(*keys) + f"{fmt}\n".format(
            *list(self.counter.values())
        )

    @cached_property
    def counter(self):
        if not self._counter:
            self._counter = Counter()
        return self._counter

    @cached_property
    def client(self):
        return boto3.client("dynamodb", region_name=self.region_name)

    @cached_property
    def table(self):
        dynamodb = boto3.resource("dynamodb", region_name=self.region_name)
        return dynamodb.Table(self.table_name)

    @property
    def items(self):
        response = self.table.scan()
        self.counter["Fetched Pages"] += 1
        data = response["Items"]

        with tqdm(desc="Fetching pages") as pbar:
            while "LastEvaluatedKey" in response:
                response = self.table.scan(
                    ExclusiveStartKey=response["LastEvaluatedKey"]
                )
                self.counter["Fetched Pages"] += 1
                data.extend(response["Items"])
                pbar.update(500)

        self.counter["Fetched Items"] = len(data)
        return data

    @cached_property
    def item_count(self):
        response = self.client.describe_table(TableName=self.table_name)
        breakpoint()
        count = int(response["Table"]["ItemCount"])
        self.counter["Total Rows"] = count
        return count


def delete_item(table, item):
    return table.table.delete_item(
        Key={
            "tim_id": item["tim_id"],
        }
    )


def update_item(table: Table, item: Dict, ttl: int):
    return table.table.update_item(
        Key={"tim_id": item["tim_id"]},
        UpdateExpression="set #t=:t",
        ExpressionAttributeNames={
            "#t": "ttl",
        },
        ExpressionAttributeValues={
            ":t": ttl,
        },
        ReturnValues="UPDATED_NEW",
    )


def main():
    setup_logging()
    args = parse_args()

    if not query_yes_no(
        f"Performing batch operations with {args.profile}; is this correct?"
    ):
        sys.exit(1)

    sys.stdout.write(f"Setting up connection with {args.profile}\n")

    boto3.setup_default_session(profile_name=args.profile)

    table = Table(region_name="us-west-2", table_name="TimManager")

    now = datetime.utcnow().replace(microsecond=0).astimezone(timezone.utc)
    buffer = timedelta(days=7)

    # @TODO list comprehension
    to_update = []
    to_delete = []

    for item in tqdm(table.items, desc="Inspecting items"):
        ttl_dt = isoparse(item["delivery_stop_time"])
        if ttl_dt > now - buffer and not args.force_delete_all:
            to_update.append(item)
        else:
            to_delete.append(item)

    table.counter["Identified for update"] = len(to_update)
    table.counter["Identified for delete"] = len(to_delete)
    table.counter["Performed Update"] = 0
    table.counter["Performed Delete"] = 0

    if to_update and query_yes_no(
        f"Located {len(to_update)} records to update with {args.profile}"
    ):
        for item in tqdm(to_update, desc="Updating items"):
            if not args.dryrun:
                ttl_dt = isoparse(item["delivery_stop_time"])
                response = update_item(table, item, int((ttl_dt + buffer).timestamp()))
                if response.get("ResponseMetadata", {}).get("HTTPStatusCode") == 200:
                    table.counter["Updated"] += 1

    if to_delete and query_yes_no(
        f"Located {len(to_delete)} records to delete with {args.profile}"
    ):
        for item in tqdm(to_delete, desc="Deleting items"):
            if not args.dryrun:
                table.counter["Deleted"] += 1
                response = delete_item(table, item)
                if response.get("ResponseMetadata", {}).get("HTTPStatusCode") == 200:
                    table.counter["Deleted"] += 1

    sys.stdout.write(str(table))


if __name__ == "__main__":
    main()
Ashton Honnecke
  • 688
  • 10
  • 16