18

I am getting the null values while loading the data from flat files into hive tables.
my tables structure is like this:

hive> create table test_hive (id int,value string);

and my flat file is like this: input.txt

1   a
2   b
3   c
4   d
5   e
6   F
7   G
8   j

when I am running the below commands I am getting null values:

hive> LOAD DATA LOCAL INPATH '/home/hduser/input.txt' OVERWRITE INTO TABLE test_hive;
hive> select * from test_hive;
OK<br>
NULL    NULL
NULL    NULL
NULL    NULL
NULL    NULL
NULL    NULL
NULL    NULL
NULL    NULL
NULL    NULL

screen shot:

hive> create table test_hive (id int,value string);
OK
Time taken: 4.97 seconds
hive> show tables;
OK
test_hive
Time taken: 0.124 seconds
hive> LOAD DATA LOCAL INPATH '/home/hduser/input2.txt' OVERWRITE INTO TABLE test_hive;
Copying data from file:/home/hduser/input2.txt
Copying file: file:/home/hduser/input2.txt
Loading data to table default.test_hive
Deleted hdfs://hydhtc227141d:54310/app/hive/warehouse/test_hive
OK
Time taken: 0.572 seconds
hive> select * from test_hive;
OK
NULL    NULL
NULL    NULL
NULL    NULL
NULL    NULL
NULL    NULL
NULL    NULL
NULL    NULL
NULL    NULL
Time taken: 0.182 seconds
Toon Krijthe
  • 52,876
  • 38
  • 145
  • 202
user1823697
  • 189
  • 1
  • 1
  • 3
  • 2
    Probably you need to specify how rows/columns are delimited in your input file when loading into a Hive table. You may try something like: `create table test_hive (id int, value string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' STORED AS TEXTFILE LOCATION '/user/hadoop/hive/input';` – Lorand Bendig Nov 14 '12 at 13:26
  • Problem you are facing is because in your data the fields are separated by ' ' and while creating table you did not mention the field delimiter. So if you don't mention the field delimiter while creating hive table, by default hive considers ^A as delimiter. So to resolve your problem, you can recreate the table mentioning the below syntax and it would work. CREATE TABLE test_hive(id INT, value STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '; – Mufaddal Kamdar Nov 25 '15 at 17:30

6 Answers6

23

The default field terminator in Hive is ^A. You need to explicitly mention in your create table statement that you are using a different field separator.

Similar to what Lorand Bending pointed in the comment, use:

CREATE TABLE test_hive(id INT, value STRING) 
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ';

You don't need to specify a location since you are creating a managed table (and not an external table).

Mark Grover
  • 4,070
  • 22
  • 21
6

Problem you are facing is because in your data the fields are separated by ' ' and while creating table you did not mention the field delimiter. So if you don't mention the field delimiter while creating hive table, by default hive considers ^A as delimiter.

So to resolve your problem, you can recreate the table mentioning the below syntax and it would work.

CREATE TABLE test_hive(id INT, value STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ';

Mufaddal Kamdar
  • 249
  • 4
  • 11
  • In the Apache HiveServer2 client example they use a "Ctrl-A" delimited file (a.txt) to load data into table. First of all they don't include this file and second I wish they where providing more info about that. – kostia Mar 24 '17 at 08:32
3

The solution is quite simple. The Table wan't created in the right way.

Simple solution for your problem or any further problems is knowing how to load the data.

CREATE TABLE [IF NOT EXIST] mytableName(id int,value string)

ROW FORMAT DELIMITED

FIELDS TERMINATED BY '/t'

STORED AS TEXTFILE ;

Now lemme explain the code :

  1. First Line Creating your table. The [IF NOT EXIST] is optional that tells if the table exist don't overwrite it. Its more of safety measure.

  2. Second line Specifies a delimiter at the table level for structured fields.

  3. Third Item You can include any single character, but the default is '\001'. '/t' is for a tab space : in your case '|' is for data which are beside each other and separated by | ' ' for one char space. And so on...

  4. Forth Line : Specifies the type of file in which data is to be stored. The file can be a TEXTFILE, SEQUENCEFILE, RCFILE, or BINARY SEQUENCEFILE. Or, how the data is stored can be specified as Java input and output classes.

when loading Locally :

LOCD DATA LOCAL INPATH '/your/data/path.csv' [OVERWRITE] INTO TABLE myTableName;

Always try checking your data by a simple select* statement.

Hope it helps.

Community
  • 1
  • 1
user 451
  • 462
  • 6
  • 8
1

Hive’s default record and field delimiters list:

  1. \n

  2. ^A

  3. ^B

  4. ^C

press ^V^A could insert a ^A in Vim.

songhir
  • 3,393
  • 3
  • 19
  • 27
1

The elements are separated by space or tab? Let it's tab follow these steps. If separated space use ' ' instead of '\t' Ok.

hive> CREATE TABLE test_hive(id INT, value STRING) row format
   delimited fields terminated by '\t' line formated by '\n' stored as filename;

Than you have to enter

hive> LOAD DATA LOCAL INPATH '/home/hduser/input.txt' OVERWRITE INTO TABLE test_hive;

hive> select * from test_hive;

Now you will get exact your expected output "filename".

Community
  • 1
  • 1
Venu A Positive
  • 2,992
  • 2
  • 28
  • 31
1

please check the dataset date column it should follow the date format yyyy-mm-dd If the string is in the form 'yyyy-mm-dd', then a date value corresponding to that year/month/day is returned. If the string value does not match this formate, then NULL is returned. Hive Official documentation

Shekh Firoz Alam
  • 192
  • 3
  • 15