
We are upgrading the EMR release used by our Data Pipeline from 3.3.2 to 5.8. The bootstrap actions we used on the old AMI release now have to be set up as configuration objects, with the settings specified under a classification and property definitions.

My JSON looks like this:

  {
            "enableDebugging": "true",
            "taskInstanceBidPrice": "1",
            "terminateAfter": "2 Hours",
            "name": "ExportCluster",
            "taskInstanceType": "m1.xlarge",
            "schedule": {
                "ref": "Default"
            },
            "emrLogUri": "s3://emr-script-logs/",
            "coreInstanceType": "m1.xlarge",
            "coreInstanceCount": "1",
            "taskInstanceCount": "4",
            "masterInstanceType": "m3.xlarge",
            "keyPair": "XXXX",
            "applications": ["hadoop","hive", "tez"],
            "subnetId": "XXXXX",
            "logUri": "s3://pipelinedata/XXX",
            "releaseLabel": "emr-5.8.0",
            "type": "EmrCluster",
            "id": "EmrClusterWithNewEMRVersion",
            "configuration": [
                { "ref": "configureEmrHiveSite" }
            ]
        },
        {
            "myComment": "This object configures hive-site xml.",
            "name": "HiveSite Configuration",
            "type": "HiveSiteConfiguration",
            "id": "configureEmrHiveSite",
            "classification": "hive-site",
            "property": [
                {"ref": "hive-exec-compress-output" }
            ]
        },
        {
            "myComment": "This object sets a hive-site configuration property value.",
            "name":"hive-exec-compress-output",
            "type": "Property",
            "id": "hive-exec-compress-output",
            "key": "hive.exec.compress.output",
            "value": "true"
        }
    ],
    "parameters": []

This JSON file loads into Data Pipeline, but validation then fails with the following errors:

Object:HiveSite Configuration
ERROR: 'HiveSiteConfiguration'
Object:ExportCluster
ERROR: 'configuration' values must be of type 'null'. Found values of type 'null'

I am not sure what this means. Could you let me know whether I am specifying this correctly? I believe I am, according to http://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-configure-apps.html

Roshan
Krish
  • Were you able to successfully upgrade to 5.x? I specifically have a question about this step, without changing the default configuration. https://stackoverflow.com/questions/47858108/how-to-upgrade-data-pipeline-definition-from-emr-3-x-to-4-x-5-x – user1322092 Dec 21 '17 at 16:42

1 Answer

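The block below should have the name "EMR Configuration"; only then is it recognized correctly by AWS Data Pipeline and hive-site.xml set accordingly. Note that the type in this block is also "EmrConfiguration" rather than the "HiveSiteConfiguration" used in the question, which is the value the first error message points at.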

   {
        "myComment": "This object configures hive-site xml.",
        "name": "EMR Configuration",
        "type": "EmrConfiguration",
        "id": "configureEmrHiveSite",
        "classification": "hive-site",
        "property": [
            {"ref": "hive-exec-compress-output" }
        ]
    },
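
A quick local check can catch this kind of mismatch before the definition is uploaded. The following is only a minimal sketch, not part of AWS Data Pipeline: the function name `check_emr_configuration_refs` is hypothetical, and the expected types ("EmrConfiguration" for `configuration` references, "Property" for `property` references) are assumptions taken from the objects shown in this post.

```python
def check_emr_configuration_refs(objects):
    """Return a list of problems found in 'configuration' and 'property' refs.

    Assumption (from this post): objects referenced through 'configuration'
    should have type 'EmrConfiguration', and objects referenced through
    'property' should have type 'Property'.
    """
    expected = {"configuration": "EmrConfiguration", "property": "Property"}
    by_id = {obj["id"]: obj for obj in objects}
    problems = []
    for obj in objects:
        for field, expected_type in expected.items():
            refs = obj.get(field, [])
            if isinstance(refs, dict):  # tolerate a single {"ref": ...} value
                refs = [refs]
            for ref in refs:
                target = by_id.get(ref["ref"])
                if target is None:
                    problems.append(
                        f"{obj['id']}.{field}: no object with id '{ref['ref']}'"
                    )
                elif target.get("type") != expected_type:
                    problems.append(
                        f"{obj['id']}.{field}: '{ref['ref']}' has type "
                        f"'{target.get('type')}', expected '{expected_type}'"
                    )
    return problems


# Trimmed-down version of the objects from the question.
objects = [
    {"id": "EmrClusterWithNewEMRVersion", "type": "EmrCluster",
     "configuration": [{"ref": "configureEmrHiveSite"}]},
    {"id": "configureEmrHiveSite", "type": "HiveSiteConfiguration",
     "classification": "hive-site",
     "property": [{"ref": "hive-exec-compress-output"}]},
    {"id": "hive-exec-compress-output", "type": "Property",
     "key": "hive.exec.compress.output", "value": "true"},
]

for problem in check_emr_configuration_refs(objects):
    print(problem)
```

Run against the question's objects, the check flags the `configuration` reference on the cluster; changing the type of `configureEmrHiveSite` to "EmrConfiguration" makes the list of problems empty.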
Krish