I am selecting column data in Hive using the substring function with a length of 3999. I store the selected columns in a blob file on Azure and then load the file into Azure SQL Data Warehouse using Azure Data Factory. A few rows now fail with an error saying the data length has exceeded 3999 (MaxLength).
To troubleshoot, I took a substring of length 2000 in Hive instead and saved the data to a file. This time I did not receive any errors. However, when I checked the column in the Data Warehouse, some values have a length greater than 2000. This mostly happens for rows containing Chinese characters.
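What I'm seeing looks consistent with Hive's substring() counting characters while the warehouse column length is enforced in bytes: Chinese characters take 3 bytes each in UTF-8, so a string trimmed to 2000 characters can still exceed 2000 bytes. A minimal Python sketch (purely illustrative, not part of my pipeline) of that mismatch:

```python
# A string of 2000 Chinese characters: passes a 2000-character cut
# (what Hive's substring enforces) but is far larger in UTF-8 bytes
# (what a byte-based column length would measure).
s = "数据" * 1000

print(len(s))                   # character count: 2000
print(len(s.encode("utf-8")))   # UTF-8 byte count: 6000 (3 bytes per character)
```

If this is the cause, trimming by characters in Hive can never guarantee a byte limit for multi-byte text; the cut would need to be made on the encoded byte length instead.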