Spark Thrift Server instead of the SparkSQL CLI



HOONY_612 2024. 1. 15. 19:54

Until now I have mostly run SparkSQL through the CLI (${SPARK_HOME}/bin/spark-sql).

Alternatively, the Spark Thrift Server lets clients access SparkSQL over JDBC/ODBC.

Why use it

1. SparkSQL can be used through JDBC/ODBC.

2. It can be used together with DB analysis tools or IDEs.

3. Multiple users can use it at the same time.

Running it

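The Hive Metastore (HMS) must already be configured on an RDB. The HMS server itself does not have to be running, but the database that the HMS uses must exist.

Reference: https://hoony-612.tistory.com/82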
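For the Thrift Server to find that metastore database, Spark needs a hive-site.xml in its conf directory. Below is a minimal sketch that writes one, assuming a MySQL-backed metastore; the URL, the database name "metastore", the driver class, user and password are placeholders to replace with whatever your HMS already uses. It refuses to overwrite an existing hive-site.xml.

```shell
#!/bin/sh
# Sketch: point Spark at the RDB an existing Hive Metastore already uses.
# Spark reads hive-site.xml from its conf directory. The MySQL URL, database
# name "metastore", driver class, user and password are placeholders.
CONF_DIR="${SPARK_CONF_DIR:-${SPARK_HOME:-.}/conf}"
TARGET="$CONF_DIR/hive-site.xml"
mkdir -p "$CONF_DIR"

# Never clobber an existing hive-site.xml; edit that one by hand instead.
if [ -e "$TARGET" ]; then
  echo "$TARGET already exists, leaving it untouched" >&2
else
  cat > "$TARGET" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/metastore</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.cj.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive-password</value>
  </property>
</configuration>
EOF
fi
```

The JDBC driver for that database (here assumed to be the MySQL connector jar) also has to be on Spark's classpath, for example by copying it into ${SPARK_HOME}/jars.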

1. Start the server

# Default: jdbc:hive2://localhost:10000
${SPARK_HOME}/sbin/start-thriftserver.sh

# Custom port: jdbc:hive2://localhost:9999
${SPARK_HOME}/sbin/start-thriftserver.sh \
  --hiveconf hive.server2.thrift.port=9999
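start-thriftserver.sh accepts all bin/spark-submit command-line options (such as --master), and the Spark docs also list the environment variables HIVE_SERVER2_THRIFT_PORT and HIVE_SERVER2_THRIFT_BIND_HOST as an alternative to --hiveconf. The server is stopped with sbin/stop-thriftserver.sh. A small sketch wrapping both; the function names and the local[*] default master are illustrative assumptions, not part of Spark.

```shell
#!/bin/sh
# Sketch: start/stop helpers around the scripts in ${SPARK_HOME}/sbin.
# HIVE_SERVER2_THRIFT_PORT / HIVE_SERVER2_THRIFT_BIND_HOST are passed as
# environment variables instead of --hiveconf.
# Function names and the local[*] default master are illustrative only.

# $1 = port (default 10000), $2 = bind host (default localhost), $3 = Spark master
start_thrift() {
  HIVE_SERVER2_THRIFT_PORT="${1:-10000}" \
  HIVE_SERVER2_THRIFT_BIND_HOST="${2:-localhost}" \
    "${SPARK_HOME}/sbin/start-thriftserver.sh" --master "${3:-local[*]}"
}

# Stops the server started by start-thriftserver.sh.
stop_thrift() {
  "${SPARK_HOME}/sbin/stop-thriftserver.sh"
}

# Example: start_thrift 9999   (then connect to jdbc:hive2://localhost:9999)
```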

2. Check with DataGrip: connect to the JDBC URL above and run

show tables;
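Without DataGrip, the same check can be done from a terminal with Beeline, the JDBC client shipped in ${SPARK_HOME}/bin. A minimal sketch, assuming the server is already running; THRIFT_HOST and THRIFT_PORT are hypothetical overrides (not Spark settings), and the defaults match the localhost:10000 server started above.

```shell
#!/bin/sh
# Sketch: query the Thrift Server with Beeline (assumes it is already running).
# THRIFT_HOST / THRIFT_PORT are hypothetical overrides, not Spark settings.

# Build the JDBC URL; defaults match the default server (localhost:10000).
thrift_jdbc_url() {
  _host="${1:-localhost}"
  _port="${2:-10000}"
  printf 'jdbc:hive2://%s:%s\n' "$_host" "$_port"
}

# -u: JDBC URL, -n: user name (non-secure mode accepts the local user), -e: SQL to run.
run_sql() {
  "${SPARK_HOME}/bin/beeline" \
    -u "$(thrift_jdbc_url "$THRIFT_HOST" "$THRIFT_PORT")" \
    -n "${USER:-anonymous}" \
    -e "$1"
}

# Example (requires a running Thrift Server):
# run_sql "show tables;"
```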

References

https://cwiki.apache.org/confluence/display/hive/hiveserver2+overview

https://spark.apache.org/docs/latest/sql-distributed-sql-engine.html#running-the-spark-sql-cli