Spark Hive Table Not Found

A note up front: after studying Hive for a long time I found that there is no complete Chinese-language introduction to Hive, and the material online is scattered, so I decided to write a series of "Hive, those things" articles and share them. Pre-process the data to cleanse commas from it, or disable Native Spark Modeling for such data sets. Hive is case insensitive while Parquet is not, and Hive considers all columns nullable while nullability in Parquet is significant; for this reason we must reconcile the Hive metastore schema with the Parquet schema when converting a Hive metastore Parquet table to a Spark SQL Parquet table. Let's check that you have access to the contents of the access log table, though I would suggest connecting Spark to HDFS and performing the analytics over the stored data. Knowledge point 3: accessing HBase-backed Hive tables from Spark. It seems that with this new version the Spark SQL Thrift server is no longer able to connect to the Hive metastore; we see strange messages in the log file. These are the default (internal) tables of Hive; the metastore itself is implemented using tables in a relational database. In the last few weeks I had been working on a Cloudera Manager issue where the Cloudera Manager server was very slow to respond to normal API calls, such as CM API logins and BDR (Backup and Disaster Recovery) jobs.

When not configured by hive-site.xml, the context automatically creates `metastore_db` in the current directory and creates a warehouse directory configured by `spark.sql.warehouse.dir`. What should I do to run my program smoothly? I am using Eclipse Neon and Spark 2.x. In this blog post I'll demonstrate how we can access an HBase table through Hive from a PySpark script/job on an AWS EMR cluster; the Hive installation must be completed successfully first. Hive's limitations: Hive is a pure data-warehousing database that stores data in the form of tables. Similarly, let's create a table structure in Oracle in the meantime. How is hive-site.xml loaded? When possible, Spark transparently converts such Hive tables to the DataSource format in order to take advantage of I/O performance improvements in Spark SQL. Run the following script to generate a text file from the all_objects view. For all other Hive versions, Azure Databricks recommends that you download the metastore JARs and set the configuration spark.sql.hive.metastore.jars to point to the downloaded JARs. That means some very common syntax is not supported by SQLContext. Thanks for the referenced bug; I started realizing the SerDe registration issues with the Hive metastore. It can be used like %hive(${prefix}). I'm running a prebuilt version of Spark 1.x. Find out the results, and discover which option might be best for your enterprise. I did a bit of Googling and found this: "The YARN container memory was smaller than the Spark executor requirement."
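As a rough illustration of that metastore behaviour, here is a minimal PySpark sketch; the application name and warehouse path are placeholders rather than anything from these notes. With Hive support enabled but no hive-site.xml on the classpath, Spark falls back to a local, Derby-backed `metastore_db` in the working directory.

```python
from os.path import abspath

from pyspark.sql import SparkSession

# Placeholder warehouse location; adjust for your cluster.
warehouse_location = abspath("spark-warehouse")

spark = (
    SparkSession.builder
    .appName("hive-example")                                 # hypothetical app name
    .config("spark.sql.warehouse.dir", warehouse_location)
    .enableHiveSupport()                                     # required to talk to the Hive metastore
    .getOrCreate()
)

# If the session is wired to the right metastore, the Hive databases show up here.
spark.sql("SHOW DATABASES").show()
```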
Also, make sure you have hive-site.xml loaded. Even if I create the table using spark-shell, it does not exist anywhere when I try to access it through the Hive editor. Applies to: Big Data Appliance Integrated Software version 4.x and later, Linux x86-64. Hive-on-MR is deprecated in Hive 2 and may not be available in future versions. Not being able to find a suitable tutorial, I decided to write one. SchemaRDDs provide a single interface for efficiently working with structured data, including Apache Hive tables, Parquet files and JSON files. Note that each internal key starts with the prefix rapidminer. First I created an EMR cluster (EMR 5.x). What is bucketing and clustering in Hive? Bucketing and clustering is the process in Hive of decomposing table data sets into more manageable parts; physically, each bucket is just a file in the table directory, and the sketch below shows the DDL. AnalysisException: Table or view not found: `traintext`. DROP TABLE [IF EXISTS] table_name drops a table and deletes the directory associated with the table from the file system if it is not an EXTERNAL table; yet with hql("drop table if exists abc"), where abc is a non-existent table, I still received an error. In order to schedule the process to run, we use spark-submit. Unit testing Apache Spark applications that use Hive tables is also worth covering. An alternative to unloading the entire DynamoDB table into S3 is to run queries over it directly, using DynamoDB's Hive integration. For the Spark app, remove unnecessary variables and tune the GC. Click through for a tutorial on using the new MongoDB Connector for Apache Spark. In our previous blog we discussed the introduction to Apache Hive and Hive data types in detail. Since all my clusters have Hive services running on separate nodes from the Spark services, I would not know whether this is needed when Hive and Spark services are on the same node, as with Kylin 2.x for example. Handling of updated Hive tables is a separate topic. The data flow can be seen as follows: Docker. So I did a test changing the location to '/tmp/mapr'. I have copied hive-site.xml to Spark's conf/ directory; when it is not configured there, Spark falls back to the local metastore described above. Execute this command in the Hive console to populate the table. Hopefully you already have some idea about Hive ORC tables. There is also related Spark work (SPARK-19120, refreshing the metadata cache after loading Hive tables). The Spark interpreter can be configured with properties provided by Zeppelin. Hive metastore: in Hive, the metastore is the central repository that stores metadata for Hive tables and partitions. If you have both PolyBase and Hadoop/Spark, it is not an either/or question.
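To make the bucketing and DROP TABLE notes concrete, here is a hedged PySpark sketch; the database, table, and column names are invented for illustration, the DDL itself is ordinary HiveQL that could equally be run from the Hive CLI or beeline, and (as noted later in these notes) older Spark releases did not fully honor Hive bucketing when writing.

```python
spark.sql("CREATE DATABASE IF NOT EXISTS mydb")

# Hypothetical bucketed table: rows are hashed on customer_id into 8 bucket files.
spark.sql("""
    CREATE TABLE IF NOT EXISTS mydb.orders_bucketed (
        order_id    BIGINT,
        customer_id BIGINT,
        amount      DOUBLE
    )
    CLUSTERED BY (customer_id) INTO 8 BUCKETS
    STORED AS ORC
""")

# IF EXISTS avoids an error when the table is already gone.
spark.sql("DROP TABLE IF EXISTS mydb.orders_bucketed")
```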
Will the insertion be faster into a data source table than into a Hive table, given that the data source table does not have to copy files from the staging directory to the final location? Has anyone managed to make queries that join different data sources work, especially joining a Hive table with other data sources? DSS does not have "Hive-only datasets", and accessing Hive tables as SQL datasets using the "Other SQL databases" option is not supported. Connecting to Oracle using Apache Spark; inserting Hive data into Oracle tables using Spark and Scala. DataStax 5.x comes pre-packaged and integrated with Cassandra 3.x. The Hive "create table" statement will include the header information as a data row, so be careful when creating a table in Hive that the data does not contain a header row with the column names. A typical PySpark job starts from the pyspark.sql imports and builds a SparkSession, as in the sketch below. Recently we found an issue with the use of ANALYZE TABLE queries inside Hive, where the analyze command was changing the 'LOCATION' property of random partitions in a table to point to another database/table. Spark, Scala and Hive SQL simple tests: we have a Hive table for the data in Cluster-1. Please use (b, c) as the partition columns. Collecting column-level statistics for non-compatible Hive tables is another topic. Did you use the Spark SQL node when you say that? The Spark SQL implementation is almost fully compatible, except for some tricky cases such as falling back to the 1.6 behavior for string-literal parsing. In Hive 1.x many of the classes' method interfaces have changed and it is hard to upgrade, so the only way around this problem is to modify the Hive-related classes inside Spark. Some questions I came up with while trying to spin up the cluster follow. Purpose: tHiveInput extracts data from Hive and sends the data to the component that follows. Tables created with the saveAsTable method are not compatible with Hive: writing a DataFrame directly to a Hive table this way creates a table whose metadata in the metastore can only be correctly interpreted by Spark. Since 2.0, the spark-shell creates a SparkSession (spark). If a table with the same name already exists in the database, an exception is thrown. All identifiers specified within backticks are treated literally when the configuration parameter hive.support.quoted.identifiers is set accordingly. The execution engine could be MapReduce, Spark or Pig, including Hive. Do not worry about using a different engine for historical data. This process needs to access Hive tables, but is unable to find them. The instructions here are for Spark 2.x. As a result, Hive can only process structured data read and written using SQL queries. I can easily reproduce the issue by opening a Spark shell and trying to query a table. The names of the arguments to the case class are read using reflection and become the names of the columns. DynamoDB plus Hive/Spark is another combination. Related questions from the forums: the partition mode was set to nonstrict in Hive but Spark is not seeing it; connecting R over JDBC to Hive on Databricks; being unable to add files to DBFS or create DBFS tables; and incremental delta loads into Hive using JDBC sources. You can find a description of the dataset below. For the Hive table case this fails with SparkException: Requested partitioning does not match the src table.
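As a hedged sketch of that compatibility point (the database, table, and column names below are made up), two common patterns are to write through Spark's Hive data source, which needs a reasonably recent Spark release, or to insert into a table that was created with plain Hive DDL so that Hive keeps ownership of the schema:

```python
from pyspark.sql import Row

spark.sql("CREATE DATABASE IF NOT EXISTS mydb")

# Toy DataFrame with hypothetical columns.
df = spark.createDataFrame([Row(id=1, name="a"), Row(id=2, name="b")])

# Option 1: write via the Hive data source (available on newer Spark releases).
df.write.format("hive").mode("overwrite").saveAsTable("mydb.people")

# Option 2: create the table with Hive DDL first, then insert into it.
spark.sql("""
    CREATE TABLE IF NOT EXISTS mydb.people_hive (id INT, name STRING)
    STORED AS PARQUET
""")
df.write.insertInto("mydb.people_hive")   # column order must match the table definition
```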
And what is more important, you can execute Spark SQL queries on Hive data once hive-site.xml is copied to the conf/ directory. With the pre-built 1.x version, and the Hive metastore configured on a Postgres database, we tried to upgrade Spark to the latest released 1.x version. You can also treat the table as an RDD. Early on, Shark had limited integration with Spark and the Hive optimizer was not designed for Spark; Spark SQL reuses parts of Shark (Hive data loading and the in-memory column store) and adds an RDD-aware optimizer and rich language interfaces. Note that Spark SQL did not support Hive buckets for a long time. pyspark will launch us into a SparkSession automatically, which we can access from the spark variable, and we can immediately start sending queries to our tables with spark.sql. Once Spark has parsed the Flume events, the data would be stored on HDFS, presumably in a Hive warehouse. The number of buckets is specified while creating a bucketed table. Hadoop does not need to be running to use Spark with Hive, however. There is also a project on solving the small-file problem in Hadoop. I can, however, see the same tables and query them through spark-shell; can anyone tell me how to create permanent tables in spark-sql which will be available for all sessions? Not sure what you're targeting at. Best, Björn. Is this true? To let the Spark-Hive functionality run in yarn-cluster mode, I spent some time looking carefully at its configuration. If the Hive dependencies can be found on the classpath, Spark will load them automatically. See also SPARK-21554 (Spark Hive reporting pyspark.sql.utils.AnalysisException: Table not found when run on a YARN cluster). Handling of updated Hive tables is covered elsewhere; in part one we introduced Hadoop. Another issue is Spark SQL queries not running with the correct user id (Hive DoAs). The failing query refers to a table `orders` (as in "Table or view not found: `traintext`.`orders`"), so it is accessing a Hive table called orders. I've found the solution (since the question is marked as on hold, I'm posting it here in case anyone has the same problem): the reason was a lack of Hive dependency JARs in the Spark classpath; it silently fails if they're not there. Create a table using a data source. I am using the 2.x VirtualBox sandbox and am trying to debug a Spark-Hive program using Eclipse. I recently benchmarked Spark 2.x and Presto 0.x, and I also assume Hive on Spark could get me to near-real-time capabilities for large queries. In this spark-shell you can see that spark already exists, and you can view all its attributes. This approach does not work for large tables, because it downloads all the data into KNIME and then uploads it again. Cloudera Manager automatically sets this to the configured MapReduce or YARN service and the configured Spark service. Finally, you can analyse tweets using Flume, Hadoop and Hive; there is also another entry on getting some interesting facts from Twitter using R. The catalog checks in the sketch below are a quick way to confirm which databases and tables the session actually sees.
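When a query fails with "table or view not found", a few catalog calls usually narrow the problem down; this is only a sketch, and `default` and `my_table` are placeholders:

```python
# Which databases and tables does this session actually see?
print(spark.catalog.listDatabases())
print(spark.catalog.listTables("default"))

spark.catalog.setCurrentDatabase("default")
spark.sql("SHOW TABLES").show()

# If files were rewritten outside Spark, the cached metadata may be stale.
# (Placeholder table name; assumes the table exists.)
spark.catalog.refreshTable("default.my_table")
```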
To serialize and deserialize data from the tables defined in the Glue Data Catalog, Spark SQL needs the Hive SerDe class for the format defined in the Glue Data Catalog on the classpath of the Spark job. To migrate metadata between catalogs you can use beeline and schematool, for example `beeline -e "create database if not exists newdb"`, then `schematool -moveDatabase newdb -fromCatalog hive -toCatalog spark`, and to move a table into the target database under the spark catalog, `schematool -moveTable table1 -fromCatalog hive -toCatalog spark -fromDatabase db1`. If you want to use both Spark's distributed processing and Hive's table bucketing, some extra work is needed. We have an Oracle table with 400 million records that we need to load into Hive on a daily basis. The SocketChannel (HDFS) timeout may be caused by a GC problem in the app running on the Hadoop cluster. Copy hive-site.xml into the Spark conf folder. The Spark Streaming app will parse the data as Flume events, separating the headers from the tweets in JSON format. This day is when the most orders were placed. The following features are not supported: Databricks Connect. (HiveContext is described as an instance of the Spark SQL execution engine that integrates with data stored in Hive.) That sounds like a problem. The spark_connection object implements a DBI interface for Spark, so you can use dbGetQuery to execute SQL and return the result as an R data.frame. Usage: the data is used outside of Hive (this proves your ability to use the metastore as a source). Since Spark is capable of fully supporting HDFS partitions via Hive, the earlier HDFS limitation has been surpassed and we can now access an HDFS-partitioned layout directly. It's also possible to execute SQL queries directly against tables within a Spark cluster. However, since Hive has a large number of dependencies, these dependencies are not included in the default Spark distribution. These files can be accessed by Hive tables using a SerDe that is part of Copy to Hadoop. There are basically two types of tables in Hive. A quick fix would be to start the Hive server from the same directory as spark-submit and re-create your tables. Is it possible to access the Hive tables created within Databricks from Databricks Connect? I'm currently using VS Code and have been able to successfully execute simple applications. When using the metastore provided by Databricks, you should use the default configurations on the cluster for the Hive metastore; if you are not using an external metastore, ensure that you do not have any custom Hive metastore configurations on your cluster. The table is never found. This gives you more flexibility in configuring the Thrift server and using different properties than those defined in spark-defaults.conf. Could you please let me know if it is possible? HBase is a column-family NoSQL database; we can query it by way of spark-sql. In streaming mode, Griffin leverages HDFS as the temporary storage solution, which could be replaced by some other storage. Create a Hive metastore database named problem6 and import all tables from the MySQL retail_db database into the Hive metastore. Hive external tables: we can also create an external table, whose data is left in place when the table is dropped; a sketch follows below. The issue is that when I try to use the Spark SQL shell I am not able to query this Hive external table, which was created on top of MapR-DB. Partitioning is a very useful feature of Hive; let's take the same Hive partitioned table as before. A new software component called SAP HANA Spark Controller is used to integrate HANA and HDP, allowing HANA to access and process data stored in the HDP Hadoop cluster.
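Here is a small sketch of the external-table point above; the HDFS path, table name, and columns are placeholders. Because the table is EXTERNAL, dropping it removes only the metastore entry and leaves the files where they are.

```python
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS logs_ext (
        ts      STRING,
        level   STRING,
        message STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'
    LOCATION 'hdfs:///data/raw/logs'
""")

spark.sql("SELECT count(*) FROM logs_ext").show()

# Drops the metadata only; the files under /data/raw/logs remain.
spark.sql("DROP TABLE logs_ext")
```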
You can also configure Hive to use Spark as its execution engine instead of MapReduce. I have written Java code for accessing Hive tables; can anyone guide me? Hive compatibility means you can run unmodified Hive queries on existing warehouses, and we can use Hive tables in any Spark-based application, including joining a Hive table with other data sources, as the sketch below shows. Hive on Spark is only tested with a specific version of Spark, so a given version of Hive is only guaranteed to work with a specific version of Spark. Databricks registers global tables either to the Databricks Hive metastore or to an external Hive metastore. Normally we either emit the result to some message queue or persist it to a data store like HBase, Hive or S3; the model I've just described is not a mandatory standard for Spark Streaming applications, just advice on how you can divide the logical content of the program module. I already use HWC (the Hive Warehouse Connector) to test; is there any other way? A related failure is AnalysisException: u"Hive support is required to CREATE Hive TABLE (AS SELECT);;\n'CreateTable `testdb`…". For additional documentation on using dplyr with Spark, see the dplyr section of the sparklyr website. Instead, we went a step further: remove the two temporary tables and combine all three Hive stages into a single Spark job that reads 60 TB of compressed data and performs a 90 TB shuffle and sort. You can check what exists with `hive> show databases;`, and make sure hive-site.xml and core-site.xml are available. This blog focuses on providing in-depth information about Avro in Hive. Functions in Hive can be categorized into the following types. There is also a Hive project on understanding the various types of SCDs and implementing these slowly changing dimensions in Hadoop Hive and Spark. Step 5: run the Hive metastore process so that when Spark SQL runs, it can connect to the metastore URIs and pick up hive-site.xml from it. Hive has non-deterministic behavior when reading and writing at the same time, or when two writers write at the same time, and the symptom shows up as an AnalysisException in the Cloudera VM.
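To illustrate joining a Hive table with another data source, here is a hedged PySpark sketch; the Hive table, JDBC URL, credentials, and column names are all assumptions for the example, not anything taken from these notes:

```python
# A Hive-backed DataFrame and a JDBC-backed DataFrame join like any other two DataFrames.
orders = spark.table("sales.orders")                          # assumed existing Hive table

customers = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://dbhost:5432/crm")       # placeholder connection string
    .option("dbtable", "public.customers")
    .option("user", "report")
    .option("password", "secret")
    .load()
)

# Assumes both sides carry a customer_id column and customers has a country column.
enriched = orders.join(customers, on="customer_id", how="left")
enriched.groupBy("country").count().show()
```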
Hive does not support concurrent writes to tables, so the answer to "can I write to a table from two clusters concurrently?" is no. A performance comparison of Hive, Impala and Spark SQL over the data in the Hive tables shows differences in both time and memory cost between Hadoop and Spark. Put the hive-site.xml file into the conf directory in Spark so that it can see the Hive metastore. If we drop a partition, will we still be able to access the data? Answer: if the Hive partition was created on a managed table, then after dropping the partition the data will also be removed from the path; the sketch below shows the DDL involved. A related symptom is a "60000 millis timeout while waiting for channel to be ready for read" on a SocketChannel. Looking into the source code, I found that besides HiveSessionState (in spark-hive), another class, HiveConf, is also needed to initiate the SparkSession. I have created a Hadoop cluster and loaded some tables into Hive. Beeline is a Hive client that is included on the head nodes of your HDInsight cluster. For example, to match "abc", a regular expression for regexp can be "^abc$". You can also save the DataFrame as a table in the Hive metastore. How do I read this table into an RDD, and more importantly how do I filter, map and so on over this nested collection in Spark? I could not find any references to this. On all of the worker nodes, the Hive dependencies must be installed on the classpath, otherwise you get AnalysisException: u'Table not found: XXX' when run on a YARN cluster. Making this Spark-Hive setup work can be confusing because not all versions of Spark are compatible with all versions of Hive, and hive-site.xml must match. Use an intermediate Hive staging table? By default the Hive source data is materialized in a Hive staging table prior to extraction by OLH. Note that Spark should have been built with Hive support; more details can be found in the SQL programming guide. The table below describes the properties that influence RapidMiner Radoop's operation. Spark SQL connects to Hive using a HiveContext and does not support any transactions. Prior to Hive 0.13.0, DESCRIBE did not accept backticks (`) surrounding table identifiers, so DESCRIBE could not be used for tables with names that matched reserved keywords (HIVE-2949 and HIVE-6187). Getting some CSV data to populate into Hive is a common first step, and Spark SQL supports automatically converting an RDD of JavaBeans into a DataFrame. So, I have fixed it. Fixed: an issue in Hive where CREATE TABLE AS SELECT made excessive list calls to Amazon S3.
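A short sketch of the partition behaviour described in that answer; the database, table, and partition values are invented. Because the table is managed (not EXTERNAL), dropping the partition removes the underlying directory as well as the metadata.

```python
spark.sql("CREATE DATABASE IF NOT EXISTS web")

spark.sql("""
    CREATE TABLE IF NOT EXISTS web.events (
        user_id BIGINT,
        action  STRING
    )
    PARTITIONED BY (dt STRING)
    STORED AS PARQUET
""")

spark.sql("ALTER TABLE web.events ADD IF NOT EXISTS PARTITION (dt='2019-10-01')")
spark.sql("SHOW PARTITIONS web.events").show(truncate=False)

# Managed table: this deletes the partition directory along with its metadata.
spark.sql("ALTER TABLE web.events DROP IF EXISTS PARTITION (dt='2019-10-01')")
```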
If that version is not included in your distribution, you can download pre-built Spark binaries for the relevant Hadoop version. Hi all, when I try hiveContext the table is not found; could you give a bit more context: is the table in the default database? You might have to add the schema to the table name if it is in another database, as the sketch below shows. Note: LLAP is much faster than the other execution engines. Using a Snappy session, you can read existing Hive tables that are defined in an external Hive catalog, use Hive tables as external tables from a SnappySession for queries, including joins with tables defined in the SnappyData catalog, and also define a new Hive table or view to be stored in the external Hive catalog. For example, assume that a data set has been created from a specific Hive table. Initially Facebook was using a traditional RDBMS; gradually the size of the data being generated increased and the RDBMS could not handle that volume, so Facebook first moved to MapReduce, but that programming model is very difficult, and later it found a solution called Apache Hive. Strict mode adds restrictions, for example it restricts queries of partitioned tables without a WHERE clause. I notice both in the create statement, but spark-shell still says '/user/mapr/tmp/hive' not found; can anyone help me out of this? Submitting PySpark jobs on YARN and accessing Hive tables from Spark is the next topic. I can see tables now! But when I go to the visual query and execute a select * statement, it fails. They are primarily a path on HDFS and may have an associated Hive table. One type is the managed table, managed by the Hive warehouse: whenever you create such a table, the data will be copied to the internal warehouse. For managed tables, renaming a table moves the table location; for unmanaged (external) tables, renaming a table does not move the table location. Env: the tests below were done on Spark 1.x. Spark SQL also supports reading and writing data stored in Apache Hive; for example, sql('select * from experian_int_openings_latest_orc'), which I tried and it worked well in spark-shell. I am on version 1.x and facing the issue mentioned in the subject. Miscellaneous: which is the best approach of the two? If you want to run Hive recipes using "Hive CLI" mode, you also need a properly configured "hive" command line client for the DSS user account.
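Building on that advice, a quick way to rule out a wrong-database problem is to qualify the table name or switch the current database; `analytics` and `orders` below are placeholders:

```python
spark.sql("SHOW DATABASES").show()

# Either qualify the table with its database...
spark.sql("SELECT * FROM analytics.orders LIMIT 10").show()

# ...or make that database current and use the bare name.
spark.sql("USE analytics")
spark.table("orders").show(10)
```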
AnalysisException: Table or view not found: `traintext`. Bucket the data with Hive. Hive supports different data types that are not found in other database systems. For the Hive data source, HDFS is the default storage for Hive tables, but Griffin doesn't directly access HDFS yet. Use a HiveContext and you can perform queries on Hive. Knowledge point 2: accessing Hive from Spark. What if you would like to include this data in a Spark ML (machine learning) pipeline? Solution: per the Spark SQL programming guide, HiveContext is a superset of SQLContext, and a minimal example is sketched below. However, if those dependencies can be found on the classpath, then Spark can load them automatically. I am using Spark 1.x. A data source can be defined explicitly (using the format method) or implicitly through Spark's default data source configuration. Spark SQL and Impala tend to be faster than Hive. Assume you have a set of tables (one of them is Customers) in Hive. This tutorial seems much clearer and more relevant to what I'm doing. Using CSV files to populate Hive tables works in the same way. You cannot dynamically switch between the Glue Catalog and a Hive metastore.
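For the pre-2.0 API mentioned there, a minimal HiveContext sketch looks roughly like this (the table name is a placeholder); on Spark 2.x and later you would use SparkSession.builder.enableHiveSupport() instead, as shown earlier:

```python
from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext(appName="hive-context-example")    # hypothetical app name
hive_ctx = HiveContext(sc)                            # superset of SQLContext, backed by the Hive metastore

hive_ctx.sql("USE default")
hive_ctx.sql("SELECT * FROM customers LIMIT 10").show()   # 'customers' is a placeholder table
```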