Spark Thrift Server instead of the SparkSQL CLI

Data Engineering 2024. 1. 15. 19:54

Until now I have been running SparkSQL through the CLI (${SPARK_HOME}/bin/spark-sql).

Alternatively, the Thrift Server exposes SparkSQL over JDBC/ODBC.

Why use it

1. SparkSQL can be used through JDBC/ODBC.

2. It can be connected to database analysis tools and IDEs.

3. Multiple users can use it at the same time.

Running it

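Prerequisite: the Hive Metastore (HMS) must already be configured with an RDB backend. The HMS server itself does not need to be running, but the database that HMS uses must exist.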
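Spark reads Hive settings from hive-site.xml in ${SPARK_HOME}/conf. A minimal sketch of such a file for an RDB-backed metastore, assuming MySQL; the JDBC URL, database name `metastore`, user and password are hypothetical placeholders, and the matching JDBC driver jar must also be on Spark's classpath (for example in ${SPARK_HOME}/jars). With no hive.metastore.uris entry, Spark's built-in Hive client talks to that database directly, which is why the HMS service itself does not have to be up.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical metastore DB settings: replace with your own values.
# Values are inserted into XML verbatim, so write '&' in a URL as '&amp;'.
METASTORE_JDBC_URL="${METASTORE_JDBC_URL:-jdbc:mysql://localhost:3306/metastore}"
METASTORE_DRIVER="${METASTORE_DRIVER:-com.mysql.cj.jdbc.Driver}"
METASTORE_USER="${METASTORE_USER:-hive}"
METASTORE_PASSWORD="${METASTORE_PASSWORD:-hive}"

# Written to the current directory; copy it into ${SPARK_HOME}/conf afterwards.
OUT="${HIVE_SITE_OUT:-./hive-site.xml}"

# No hive.metastore.uris entry: Spark opens the metastore DB directly
# instead of calling a running HMS service.
cat > "$OUT" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>${METASTORE_JDBC_URL}</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>${METASTORE_DRIVER}</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>${METASTORE_USER}</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>${METASTORE_PASSWORD}</value>
  </property>
</configuration>
EOF

echo "Wrote $OUT"
```

For a PostgreSQL-backed metastore the same sketch can be reused by overriding the variables, e.g. `METASTORE_JDBC_URL=jdbc:postgresql://localhost:5432/metastore METASTORE_DRIVER=org.postgresql.Driver`.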

See also: https://hoony-612.tistory.com/82

1. Start the server

# Default: jdbc:hive2://localhost:10000
${SPARK_HOME}/sbin/start-thriftserver.sh

# Custom port: jdbc:hive2://localhost:9999
${SPARK_HOME}/sbin/start-thriftserver.sh \
	--hiveconf hive.server2.thrift.port=9999
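start-thriftserver.sh returns right away and leaves the server running in the background, so it can be useful to wait until the port actually opens before connecting. A minimal lifecycle sketch, assuming bash with /dev/tcp support for the port check; the 60-second timeout and the helper names `jdbc_url` and `wait_for_port` are arbitrary, hypothetical choices. stop-thriftserver.sh is the companion script in ${SPARK_HOME}/sbin, and daemon logs go to ${SPARK_HOME}/logs by default. The Spark part only runs when SPARK_HOME points at an installation.

```shell
#!/usr/bin/env bash
set -euo pipefail

PORT="${THRIFT_PORT:-10000}"   # 10000 is the default hive.server2.thrift.port

# Build the JDBC URL that clients such as beeline or DataGrip connect to.
jdbc_url() {
  printf 'jdbc:hive2://%s:%s\n' "${1:-localhost}" "${2:-10000}"
}

# Poll once per second until host:port accepts a TCP connection,
# giving up after `timeout` seconds. Uses bash's /dev/tcp, no extra tools.
wait_for_port() {
  local host="$1" port="$2" timeout="${3:-60}" i
  for ((i = 0; i < timeout; i++)); do
    if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
      return 0
    fi
    sleep 1
  done
  return 1
}

# Only touch Spark when it is actually installed.
if [ -x "${SPARK_HOME:-}/sbin/start-thriftserver.sh" ]; then
  "${SPARK_HOME}/sbin/start-thriftserver.sh" --hiveconf hive.server2.thrift.port="$PORT"
  if wait_for_port localhost "$PORT" 60; then
    echo "Thrift Server is up: $(jdbc_url localhost "$PORT")"
  else
    echo "Port $PORT not open after 60s; check the logs in \${SPARK_HOME}/logs" >&2
  fi
  # When finished: "${SPARK_HOME}/sbin/stop-thriftserver.sh"
fi
```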

2. Verify with DataGrip: connect to the JDBC URL from step 1 and run

show tables;
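The same check can also be scripted without a GUI, using the beeline client bundled in ${SPARK_HOME}/bin. A sketch, assuming the server is listening on the default port 10000 in non-secure mode, where any username with an empty password is accepted; `-u` passes the JDBC URL, `-n` the username and `-e` runs a query and exits. The `run_query` helper is a hypothetical name.

```shell
#!/usr/bin/env bash
set -euo pipefail

BEELINE="${SPARK_HOME:-}/bin/beeline"
URL="${THRIFT_JDBC_URL:-jdbc:hive2://localhost:10000}"
DB_USER="${USER:-anonymous}"   # non-secure mode: any username, empty password

# Run one SQL statement through the Thrift Server and exit.
run_query() {
  "$BEELINE" -u "$URL" -n "$DB_USER" -e "$1"
}

if [ -x "$BEELINE" ]; then
  run_query "show tables;"
else
  echo "beeline not found at $BEELINE; set SPARK_HOME" >&2
fi
```

Because every client goes through the same JDBC endpoint, several beeline sessions or IDE connections can be open at once, which is the multi-user benefit listed above.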

References

https://cwiki.apache.org/confluence/display/hive/hiveserver2+overview


https://spark.apache.org/docs/latest/sql-distributed-sql-engine.html#running-the-spark-sql-cli
