6

I'm testing the Athena product of AWS, so far is working very good. But I want to know the list of SerDe properties. I've searched far and wide and couldn't find it. I'm using this one for example "ignore.malformed.json" = "true", but I'm pretty sure there are a ton of other options to tune the queries.

I couldn't find info for example, on what the "path" property does, so having the full list will be amazing.

I have looked at Apache Hive docs but couldn't find this, and neither on AWS docs/forums.

Thanks!

dtolnay
  • 9,621
  • 5
  • 41
  • 62
Laerion
  • 805
  • 2
  • 13
  • 18

2 Answers2

6

It seems you are using the Openx-JsonSerDe
http://docs.aws.amazon.com/athena/latest/ug/json.html

// properties used in configuration
public static final String PROP_IGNORE_MALFORMED_JSON = "ignore.malformed.json";
public static final String PROP_DOTS_IN_KEYS = "dots.in.keys";
public static final String PROP_CASE_INSENSITIVE ="case.insensitive" ;

https://github.com/rcongiu/Hive-JSON-Serde/blob/master/json-serde/src/main/java/org/openx/data/jsonserde/JsonSerDe.java

David דודו Markovitz
  • 42,900
  • 6
  • 64
  • 88
  • So there are only three properties? What about the "path" property that I saw in the Amazon docs, where is that? – Laerion May 24 '17 at 00:05
  • 1
  • Will be this one: http://docs.aws.amazon.com/athena/latest/ug/json.html – Laerion May 24 '17 at 16:24
  • Seems to me like a dummy properties that have no effect. – David דודו Markovitz May 24 '17 at 17:59
  • 3
    As of 9/2/18, the case insensitive property does not work as AWS does not have the version of Openx-JsonSerDe that implements this property. I can confirm that `dots.in.keys` works. Additionally, a missing feature not mentioned here is the ability to remap fields with name violations such as `foo-bar` by adding serde property `mapping.foo_bar = foo-bar` does work. The schema should then specify `foo_bar` as the column name and you should delete / re-add all partitions to pull the schema update. – waynethec Sep 03 '18 at 02:01
  • I couldn't work with caseInsensitive nor remapping in Athena + Glue... I agree with @DavidדודוMarkovitz that they seem dummy properties. Nevertheless, they seem to be in the AWS documentation... anybody could move on since 2018? – Juan Aguilar Guisado Aug 18 '22 at 11:45
3

As stated in release notes (see bullet #2 please), the JSON OpenX SerDe used in Athena has been improved. The improvements include, but are not limited to, the following:

  • Support for the ConvertDotsInJsonKeysToUnderscores property. When set to TRUE, it allows the SerDe to replace the dots in key names with underscores. For example, if the JSON dataset contains a key with the name "a.b", you can use this property to define the column name to be "a_b" in Athena. The default is FALSE. By default, Athena does not allow dots in column names.
  • Support for the case.insensitive property. By default, Athena requires that all keys in your JSON dataset use lowercase. Using WITH SERDE PROPERTIES ("case.insensitive"= FALSE;) allows you to use case-sensitive key names in your data. The default is TRUE. When set to TRUE, the SerDe converts all uppercase columns to lowercase.

For more information, see OpenX JSON SerDe in the Amazon Athena User Guide.