Short answer: Hive is not data storage itself; it queries data that lives in storage through tables (the schema definition, the SerDe used for serialization/deserialization, and the data location are all defined in the CREATE TABLE statement).
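For example, a minimal external table over CSV files might look like this (the table name, columns and path are made up for illustration):

```sql
-- Hypothetical example: an external table over CSV files already sitting in HDFS.
-- Hive stores only the metadata; the files themselves stay where they are.
CREATE EXTERNAL TABLE sales_raw (
  order_id   BIGINT,
  product    STRING,
  amount     DECIMAL(10,2)
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/landing/sales';   -- just a directory of files in HDFS
```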
Long answer:
Data is stored in HDFS or another Hadoop-compatible filesystem such as S3 (which can be completely separate from the Hadoop cluster).
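As a sketch (the bucket name is made up), the hypothetical table above could be repointed at object storage without touching the files:

```sql
-- Hive only updates the metadata pointer; it does not move or copy any files.
ALTER TABLE sales_raw SET LOCATION 's3a://my-bucket/landing/sales';
```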
Hive is a database: it has rich SQL (DDL and DML), metadata that includes statistics, table definitions and access grants, a cost-based optimizer, and it can use different query engines: MR (MapReduce) and Tez. The difference between Hive and a traditional RDBMS is that Hive uses the schema-on-read concept: how data is stored and how it is read are completely decoupled; the schema is applied when the data is read, and data files can be added to HDFS by some external process.
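For instance, the engine choice and the statistics used by the cost-based optimizer are controlled with ordinary Hive commands (shown here against the hypothetical sales_raw table):

```sql
-- Choose the execution engine for this session (mr or tez).
SET hive.execution.engine=tez;

-- Gather table- and column-level statistics for the cost-based optimizer.
ANALYZE TABLE sales_raw COMPUTE STATISTICS;
ANALYZE TABLE sales_raw COMPUTE STATISTICS FOR COLUMNS;
```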
Hive can read various structured file formats (JSON, Avro, CSV, Parquet, ORC, etc.) as well as semi-structured files (using RegexSerDe or any other SerDe, even a custom one). Hive can also connect to other JDBC sources for easy integration and read from/write to them.
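As an illustration of a non-default SerDe (the log layout and regex are made up), semi-structured text can be parsed at read time like this:

```sql
-- Hypothetical: parse log lines with a regex applied on read; RegexSerDe columns must be STRING.
CREATE EXTERNAL TABLE access_log (
  ip      STRING,
  ts      STRING,
  request STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "(\\S+) \\[([^\\]]+)\\] (.*)"  -- one capture group per column
)
STORED AS TEXTFILE
LOCATION '/data/logs/access';
```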
In Hive, a table or partition is a location in HDFS where data files are stored, plus metadata containing the schema definition, SerDe, statistics and access grants.
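You can inspect both parts for any table, e.g. the hypothetical sales_raw from above:

```sql
-- Shows the location, SerDe, storage format and statistics held in the metastore.
DESCRIBE FORMATTED sales_raw;
-- For a single partition of a partitioned table:
-- DESCRIBE FORMATTED some_table PARTITION (load_date='2020-01-01');
```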
You can create a table on top of an existing location, and even many tables (with different schemas) on top of the same location. Read this answer about multiple tables on top of the same location, and this answer about managed/external tables: https://stackoverflow.com/a/54242477/2700344.
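A sketch of two tables over the same directory, each applying its own schema on read (names and columns are hypothetical):

```sql
-- Same files, read as one raw STRING column per line (default SerDe).
CREATE EXTERNAL TABLE sales_raw_text (line STRING)
LOCATION '/data/landing/sales';

-- Same files again, this time parsed into typed CSV columns.
CREATE EXTERNAL TABLE sales_raw_csv (
  order_id BIGINT,
  product  STRING,
  amount   DECIMAL(10,2)
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/landing/sales';
```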
You can put files directly into the table location or remove them using HDFS commands, and this will be reflected in the dataset returned by Hive. The LOAD DATA ... INTO TABLE command is also supported; it puts the files into the table location for you, so you do not need to know the location path.
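A minimal sketch of LOAD DATA (the paths are made up); it simply moves or uploads the files into the table's directory:

```sql
-- Move a file that is already in HDFS into the table's location.
LOAD DATA INPATH '/staging/sales_2020-01-01.csv' INTO TABLE sales_raw;

-- Or upload a local file from the machine running the Hive client.
LOAD DATA LOCAL INPATH '/tmp/sales_2020-01-02.csv' INTO TABLE sales_raw;
```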