I am taking a MOOC.
The course told us to upload a few files from our PC to HDFS using the command below:
azure storage blob upload local_path container data/logs/2008-01.txt.gz
I did the same. Later, when I ran the command below in a PuTTY secure shell, I could see the file:
hdfs dfs -ls /data/logs
Found 6 items
-rwxrwxrwx 1 331941 2016-03-03 15:56 /data/logs/2008-01.txt.gz
-rwxrwxrwx 1 331941 2016-03-03 15:58 /data/logs/2008-02.txt.gz
-rwxrwxrwx 1 331941 2016-03-03 15:58 /data/logs/2008-03.txt.gz
-rwxrwxrwx 1 331941 2016-03-03 15:58 /data/logs/2008-04.txt.gz
-rwxrwxrwx 1 331941 2016-03-03 15:58 /data/logs/2008-05.txt.gz
-rwxrwxrwx 1 331941 2016-03-03 15:58 /data/logs/2008-06.txt.gz
Then we started a Hive terminal, created a table, and loaded data into it using:
load data inpath '/data/logs' into TABLE rawlog;
Then we created an external table using the command below:
CREATE EXTERNAL TABLE cleanlog
(log_date DATE,
log_time STRING,
c_ip STRING,
cs_username STRING,
s_ip STRING,
s_port STRING,
cs_method STRING,
cs_uri_stem STRING,
cs_uri_query STRING,
sc_status STRING,
sc_bytes INT,
cs_bytes INT,
time_taken INT,
cs_user_agent STRING,
cs_referrer STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '
STORED AS TEXTFILE LOCATION '/data/cleanlog';
We inserted data into the table using:
INSERT INTO TABLE cleanlog
SELECT *
FROM rawlog
WHERE SUBSTR(log_date, 1, 1) <> '#';
I exited Hive and ran the command below:
hdfs dfs -ls /data/logs
- I don't see anything in that folder. Why? Where did the uploaded log files go?
- Where is the rawlog table? Does it exist in the same folder? Why don't I see it?
- Why do I see the file 000000_0 in my cleanlog folder? Is it the new table?
If I run the command
hdfs dfs -ls /data/cleanlog
the output I get is:
Found 1 items
-rwxr-xr-x 1 sshuser supergroup 71323206 2016-03-03 16:11 /data/cleanlog/000000_0
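To check where Hive keeps the files for each table, I think something like the sketch below should work. It assumes the hive CLI is on the PATH of the SSH session; table_location is just a helper name I made up, and I have not confirmed what it prints on this cluster.

```shell
# Print the storage location Hive reports for a table.
# Assumption: the `hive` CLI is available; `table_location` is a hypothetical helper.
table_location() {
  # DESCRIBE FORMATTED prints table metadata, including a "Location:" row;
  # awk keeps only the path from that row.
  hive -e "DESCRIBE FORMATTED $1;" 2>/dev/null | awk '/^Location:/ {print $2}'
}

# Example usage on the cluster (commented out so the file can be sourced safely):
# table_location rawlog
# table_location cleanlog
```

The path printed for each table could then be passed to hdfs dfs -ls to see which files sit under it.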
Update 1:
- What would happen if I load one more data file into /data/logs/ and then run select * from rawlog? Would it automatically pull data from the new file?