What is Hive?
- Hive is a data warehousing package built on top of Hadoop
- Mainly used for data analysis
- For managing and querying structured data
- Targeted at SQL developers, so there is no need to learn Java or the Hadoop APIs
- Its query language, HiveQL, is similar to SQL
- Allows programmers to plug in custom mappers and reducers
- Provides tools for easy data extract, transform, and load (ETL)
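The "custom mappers and reducers" point above can be sketched with HiveQL's TRANSFORM clause, which streams rows through an external script. This is only a minimal sketch: the script `extract_domain.py`, its location under `/tmp`, and the `page_views` table are hypothetical names chosen for illustration.

```sql
-- Hypothetical script and table names, for illustration only.
-- Ship the script to the cluster so that every task can run it.
ADD FILE /tmp/extract_domain.py;

-- Stream each (user_id, url) row through the script as tab-separated text
-- and read its tab-separated output back as two columns.
SELECT TRANSFORM (user_id, url)
       USING 'python extract_domain.py'
       AS (user_id, domain)
FROM page_views;
```

The script only has to read lines from standard input and write lines to standard output, so it can be written in any language available on the cluster nodes.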
Where to use Hive?
- Log Processing
- Data mining
- Document Indexing
- Client facing BI
- Hypothesis Testing
Why use Hive when Pig is there?
Hive | Pig |
Uses HiveQL, an SQL-like language | Uses Pig Latin |
Used by analysts to generate daily reports | Used by programmers and researchers |
Declarative language, like SQL | Procedural data flow language |
SELECT * FROM <table_name>; | X = LOAD 'mytestdata'; DUMP X; |
Supports explicit schemas | Supports implicit schemas |
Supports partitions | Does not support partitions |
Supports a web interface | Does not support a web interface |
Hive Architecture:
Hive Components:
- Shell
- Metastore
- Driver
- Compiler
- Execution Engine
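One way to see these components cooperate is to ask Hive for a query plan. As a rough sketch, assuming a hypothetical `web_logs` table with a `status` column: the statement is typed into the shell, the driver hands it to the compiler, the compiler looks up the table definition in the metastore, and the resulting plan is what the execution engine would run.

```sql
-- Hypothetical table name; run from the Hive shell.
-- EXPLAIN prints the plan produced by the compiler without executing the query.
EXPLAIN
SELECT status, COUNT(*) AS hits
FROM web_logs
GROUP BY status;
```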
Limitations of Hive:
- Hive is not designed for online transaction processing (OLTP)
- Hive does not offer row-level queries or row-level updates
- Latency for Hive queries is generally very high
Features of Hive Query Language:
- Filter rows using where clause
- Store results of a query in another table
- Able to manage tables and partitions
- Store the result of a query in Hadoop DFS
- Ability to do equi-joins
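The features above could look like this in practice. The tables `web_logs(user_id, url, status)` and `users(user_id, country)`, as well as the output path, are assumptions made up for this sketch.

```sql
-- Hypothetical tables: web_logs(user_id, url, status) and users(user_id, country).

-- Filter rows using a WHERE clause.
SELECT user_id, url
FROM web_logs
WHERE status = 404;

-- Store the result of a query in another table (create-table-as-select).
CREATE TABLE error_hits AS
SELECT user_id, url
FROM web_logs
WHERE status >= 500;

-- Store the result of a query in a Hadoop DFS directory (path is an assumption).
INSERT OVERWRITE DIRECTORY '/tmp/hive_out/error_hits'
SELECT user_id, url
FROM web_logs
WHERE status >= 500;

-- Equi-join: the join condition is an equality between columns.
SELECT l.url, u.country
FROM web_logs l
JOIN users u ON (l.user_id = u.user_id);
```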
Hive supports the following primitive and complex types:
- Primitive types
  - Boolean
  - Integer
  - Float/Double
  - String
- Complex (composite) types
  - Structs
  - Maps
  - Arrays
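A table definition that uses each of these type families might look like the sketch below. The `employees` table, its columns, and the delimiter characters are hypothetical choices for illustration.

```sql
-- Hypothetical table mixing primitive and complex column types.
CREATE TABLE employees (
  name    STRING,
  active  BOOLEAN,
  age     INT,
  salary  DOUBLE,
  skills  ARRAY<STRING>,
  phones  MAP<STRING, STRING>,
  address STRUCT<city:STRING, zip:STRING>
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  COLLECTION ITEMS TERMINATED BY '|'
  MAP KEYS TERMINATED BY ':'
STORED AS TEXTFILE;

-- Arrays are indexed by position, maps by key, and structs with dot notation.
SELECT name, skills[0], phones['home'], address.city
FROM employees;
```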
Hive data models:
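- Databases: namespaces that group tables
- Tables: schemas defined within a namespace
- Partitions: determine how a table's data is stored in HDFS; each partition value maps to its own subdirectory
- Buckets (or clusters): partitions are further divided into buckets, based on the hash of a column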
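The four levels of the data model can be seen together in one short sketch. The database `web`, the table `page_views`, its columns, and the bucket count are all assumptions chosen for illustration.

```sql
-- Database: a namespace for tables (hypothetical name).
CREATE DATABASE IF NOT EXISTS web;
USE web;

-- Table with one partition column and 8 buckets per partition.
CREATE TABLE page_views (
  user_id BIGINT,
  url     STRING
)
PARTITIONED BY (view_date STRING)
CLUSTERED BY (user_id) INTO 8 BUCKETS
STORED AS TEXTFILE;

-- Each partition is stored in its own subdirectory of the table's HDFS directory.
ALTER TABLE page_views ADD PARTITION (view_date = '2015-01-01');
SHOW PARTITIONS page_views;
```

Queries that filter on `view_date` can then read only the matching partition directories instead of scanning the whole table.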
Create and use a database:
> CREATE DATABASE <database_name>;
> USE <database_name>;
Create Table:
> CREATE TABLE <table_name> (col_name1 datatype, col_name2 datatype, ..) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;
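With the placeholders filled in, the statements above might look like the following sketch. The database `sales_db`, the table `orders`, its columns, and the CSV path are hypothetical.

```sql
-- Hypothetical database and table names.
CREATE DATABASE IF NOT EXISTS sales_db;
USE sales_db;

-- Comma-delimited text table, matching the syntax shown above.
CREATE TABLE IF NOT EXISTS orders (
  order_id INT,
  customer STRING,
  amount   DOUBLE
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Load a local CSV file into the table (the path is an assumption).
LOAD DATA LOCAL INPATH '/tmp/orders.csv' INTO TABLE orders;

-- Check the resulting schema.
DESCRIBE orders;
```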
External Table:
- Creates a table whose data lives in an HDFS location that you specify. When the table is dropped, Hive removes only the table's metadata and does not delete the underlying data.
Syntax:
CREATE EXTERNAL TABLE <TABLE_NAME>(<COL_NAME> STRING) LOCATION '/USER/ROOT/EXTERNAL_TABLE';
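The drop behaviour of an external table can be sketched as follows. The table `raw_events`, its single column, and the HDFS path `/data/raw_events` are hypothetical.

```sql
-- Hypothetical external table over files that already exist in HDFS.
CREATE EXTERNAL TABLE IF NOT EXISTS raw_events (
  event_line STRING
)
LOCATION '/data/raw_events';

-- Query the files in place.
SELECT COUNT(*) FROM raw_events;

-- Dropping the table removes only the metadata from the metastore;
-- the files under /data/raw_events stay in HDFS.
DROP TABLE raw_events;
```

This makes external tables a reasonable fit for data that other tools also read, since dropping the Hive table does not affect the files.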
© 2015, https:. All rights reserved. On republishing this post, you must provide link to original post