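A table is the basic unit for storing data. Hive tables fall into two main types: internal tables (also called managed tables) and external tables. Partitioned tables and bucketed tables can be built on either type, giving internal/external partitioned tables and internal/external bucketed tables. The rest of this section explains internal and external tables in detail.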
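By default, the data of both internal and external tables is stored under the path given by the hive.metastore.warehouse.dir parameter in the Hive configuration file. The two types differ in what happens when a table is dropped: dropping an internal table deletes both its metadata and its data, whereas dropping an external table deletes only its metadata and leaves the data in place. External tables are therefore comparatively safer, allow more flexible data organization, and make it easier to share the source data files.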
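As a minimal sketch of how to check which type a table has and where its data lives, the statements below can be run in the Hive CLI or Beeline. They assume the table hive_database.managed_table from this section already exists; substitute any existing table name.

```sql
-- Print the configured default warehouse directory.
SET hive.metastore.warehouse.dir;

-- Show the table's metadata. The "Table Type" row reads MANAGED_TABLE
-- or EXTERNAL_TABLE, and the "Location" row shows where the data files live.
DESCRIBE FORMATTED hive_database.managed_table;
```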
(1) The command to create the internal table managed_table is as follows.
CREATE TABLE IF NOT EXISTS
hive_database.managed_table(
staff_id INT COMMENT "This is staffid",
staff_name STRING COMMENT "This is staffname",
salary FLOAT COMMENT "This is staff salary",
hobby ARRAY<STRING> COMMENT "This is staff hobby",
deductions MAP<STRING, FLOAT> COMMENT "This is staff deduction",
address STRUCT<street:STRING,city:STRING> COMMENT "This is staff address"
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '_'
MAP KEYS TERMINATED BY ':'
LINES TERMINATED BY '\n'
STORED AS textfile
TBLPROPERTIES("comment"="This is a managed table");
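In the command above, the ROW FORMAT DELIMITED clause tells Hive to use its built-in SerDe. The field (FIELDS) delimiter is set to ","; the collection item (COLLECTION ITEMS) delimiter is set to "_"; the MAP key-value (MAP KEYS) delimiter is set to ":"; and the line (LINES) delimiter is set to \n.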
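To make the delimiters concrete, here is a sketch of one text row that matches this layout, followed by a query that reads the complex-typed columns. The sample values and the file path /tmp/staff.txt are made up for illustration, and the column name deductions is assumed for the MAP column.

```sql
-- One hypothetical line of /tmp/staff.txt (fields split by ',',
-- array/map/struct items by '_', map key and value by ':'):
--   1,Alice,5000.0,reading_music,pension:300.0_tax:200.0,Main Street_Springfield

-- Load the local file into the table (the path is an assumption).
LOAD DATA LOCAL INPATH '/tmp/staff.txt'
INTO TABLE hive_database.managed_table;

-- Read individual elements of the complex types.
SELECT
  staff_name,
  hobby[0]          AS first_hobby,    -- first ARRAY element
  deductions['tax'] AS tax_deduction,  -- MAP lookup by key
  address.city      AS city            -- STRUCT member access
FROM hive_database.managed_table;
```

For the sample row, hobby[0] picks the element before the first "_" in the fourth field, and address.city picks the second "_"-separated item of the last field.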
(2) The command to create the external table external_table is as follows.
CREATE EXTERNAL TABLE IF NOT EXISTS
hive_database.external_table(
staff_id INT COMMENT "This is staffid",
staff_name STRING COMMENT "This is staffname",
salary FLOAT COMMENT "This is staff salary",
hobby ARRAY<STRING> COMMENT "This is staff hobby",
deductions MAP<STRING, FLOAT> COMMENT "This is staff deduction",
address STRUCT<street:STRING,city:STRING> COMMENT "This is staff address"
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '_'
MAP KEYS TERMINATED BY ':'
LINES TERMINATED BY '\n'
STORED AS textfile
TBLPROPERTIES("comment"="This is an external table");
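In the command above, the EXTERNAL clause in the CREATE TABLE statement creates an external table. An external table is usually created together with a LOCATION clause that specifies where its data is stored, which makes the data easier to maintain and manage.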
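The following sketch shows one way the LOCATION clause might be combined with EXTERNAL, and what dropping such a table does. The table name external_table_demo and the HDFS directory /tmp/hive_external/external_table_demo are hypothetical; replace them with names that suit your cluster.

```sql
-- Create an external table whose data lives in an explicitly chosen
-- directory (hypothetical path) instead of the default warehouse path.
CREATE EXTERNAL TABLE IF NOT EXISTS hive_database.external_table_demo (
  staff_id   INT,
  staff_name STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/tmp/hive_external/external_table_demo';

-- Dropping the table removes only its metadata from the metastore.
DROP TABLE IF EXISTS hive_database.external_table_demo;

-- In the Hive CLI, the directory can then be listed to confirm that
-- any data files placed there are still present:
-- dfs -ls /tmp/hive_external/external_table_demo;
```

Because the files survive the DROP, the same directory can later be attached to a new external table or read by other tools.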