Most of the time, you can create datasets and build system prototypes with the Kite command-line interface (CLI). When you want to perform these tasks from a Java program, use the Kite API instead. With the Kite API, you can read a dataset, define and read views of a dataset, and process a dataset with MapReduce, among other tasks.

Dataset

A dataset is a collection of records, like a relational table. Records are similar to table rows, but their columns can contain strings, numbers, or nested data structures such as lists, maps, and other records.

The Dataset interface provides methods for working with the collection of records it represents.

Dataset URIs

Datasets are identified by URI. The dataset URI determines how Kite stores your dataset and its configuration metadata.

For example, to create the products dataset in Hive, you can use this URI:

dataset:hive:products

Common dataset URI patterns cover Hive, HDFS, the local file system, and HBase.

DatasetDescriptors

A DatasetDescriptor provides the structural definition of a dataset and encapsulates all of the configuration needed to read and write its data. When you create a Dataset, you supply a DatasetDescriptor; Kite saves that descriptor and uses it whenever you interact with the dataset.

At a minimum, a DatasetDescriptor requires the record schema, which describes the records. You create a DatasetDescriptor object with the fluent DatasetDescriptor.Builder, which sets the schema and any other configuration. See DatasetDescriptor Options for more configuration options.

The Datasets class is the starting point when working with the Kite Data API. It provides operations on datasets, such as creating or deleting a dataset.

Once you have defined a schema, you can use DatasetDescriptor.Builder to create a descriptor instance, and then create a dataset from that descriptor. With a storage URI and a DatasetDescriptor, you can call Datasets.create to create a dataset instance. For example, creating a dataset with the URI dataset:hive:products creates a dataset named products in the Hive metastore. After a dataset is created, its schema is loaded automatically from the saved descriptor.

DatasetDescriptor Options

A DatasetDescriptor encapsulates the configuration needed to read and write a dataset. It is immutable and is created by DatasetDescriptor.Builder.
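The way a dataset URI such as dataset:hive:products names its storage backend can be illustrated with the JDK's java.net.URI alone. The class below is a hypothetical helper, not part of Kite (Kite performs its own URI resolution), and the set of storage scheme names (hive, hdfs, file, hbase) is an assumption based on the backends listed above.

```java
import java.net.URI;
import java.util.Set;

/**
 * Hypothetical helper, NOT part of the Kite API. It only shows how a dataset
 * URI such as "dataset:hive:products" nests a storage URI inside "dataset:".
 */
final class DatasetUriSketch {
    // Assumed scheme names for the backends mentioned in the text:
    // Hive, HDFS, the local file system, and HBase.
    static final Set<String> KNOWN_SCHEMES = Set.of("hive", "hdfs", "file", "hbase");

    private DatasetUriSketch() {}

    /** Returns the storage scheme, e.g. "hive" for "dataset:hive:products". */
    static String storageScheme(String datasetUri) {
        URI outer = URI.create(datasetUri);
        if (!"dataset".equals(outer.getScheme())) {
            throw new IllegalArgumentException("Not a dataset URI: " + datasetUri);
        }
        // "dataset:hive:products" is an opaque URI, so its scheme-specific
        // part is everything after "dataset:", i.e. "hive:products".
        URI inner = URI.create(outer.getSchemeSpecificPart());
        String scheme = inner.getScheme();
        if (scheme == null || !KNOWN_SCHEMES.contains(scheme)) {
            throw new IllegalArgumentException("Unknown storage scheme in: " + datasetUri);
        }
        return scheme;
    }

    /** Returns what follows the storage scheme, e.g. "products". */
    static String location(String datasetUri) {
        storageScheme(datasetUri); // validate the overall shape first
        URI inner = URI.create(URI.create(datasetUri).getSchemeSpecificPart());
        return inner.getSchemeSpecificPart();
    }
}
```

For dataset:hive:products, storageScheme returns "hive" and location returns "products"; a URI without the leading dataset: scheme is rejected.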
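The descriptor contract described above is immutable, built with a fluent builder, and requires a record schema. Here is a minimal plain-Java sketch of that pattern. DescriptorSketch, its schema(String) and format(String) methods, and the "avro" default are hypothetical illustrations, not Kite's DatasetDescriptor API; Kite's real builder works with Avro schemas.

```java
import java.util.Objects;

/**
 * Hypothetical stand-in, NOT Kite's DatasetDescriptor: an immutable value
 * object that can only be produced by its builder, and only with a schema.
 */
final class DescriptorSketch {
    private final String schemaJson; // record schema, e.g. Avro schema JSON text
    private final String format;     // storage format name (default is assumed)

    private DescriptorSketch(Builder b) {
        this.schemaJson = b.schemaJson;
        this.format = b.format;
    }

    String schemaJson() { return schemaJson; }
    String format() { return format; }

    /** The descriptor cannot change, so "editing" means building a new one. */
    Builder toBuilder() {
        return new Builder().schema(schemaJson).format(format);
    }

    static final class Builder {
        private String schemaJson;
        private String format = "avro"; // hypothetical default for this sketch

        Builder schema(String schemaJson) {
            this.schemaJson = Objects.requireNonNull(schemaJson, "schema");
            return this; // returning this enables fluent chaining
        }

        Builder format(String format) {
            this.format = Objects.requireNonNull(format, "format");
            return this;
        }

        DescriptorSketch build() {
            // The record schema is the one mandatory setting.
            if (schemaJson == null) {
                throw new IllegalStateException("A descriptor requires a record schema");
            }
            return new DescriptorSketch(this);
        }
    }
}
```

Because the built object has only final fields and no setters, changing an option means starting from a new builder (here via toBuilder()), which leaves the original descriptor untouched.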
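The create-then-reuse workflow can also be modeled in memory. A dataset is created from a URI plus a descriptor. The descriptor is saved, and later loads recover the schema without the caller supplying it again. InMemoryDatasets and its nested types are hypothetical and do not reproduce Kite's Datasets class. In real code you would call Datasets.create with a storage URI and a DatasetDescriptor, and Kite would persist the metadata in the backend the URI names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

/**
 * Hypothetical in-memory model, NOT Kite's Datasets class: it saves each
 * dataset's descriptor under its URI so later loads get the schema back.
 */
final class InMemoryDatasets {
    /** Minimal descriptor holding only the required record schema. */
    static final class Descriptor {
        final String schema;
        Descriptor(String schema) { this.schema = schema; }
    }

    /** A dataset handle: its URI plus the descriptor it was created with. */
    static final class Handle {
        final String uri;
        final Descriptor descriptor;
        Handle(String uri, Descriptor descriptor) {
            this.uri = uri;
            this.descriptor = descriptor;
        }
    }

    // Saved descriptors, keyed by dataset URI.
    private final Map<String, Descriptor> saved = new HashMap<>();

    Handle create(String uri, Descriptor descriptor) {
        if (saved.containsKey(uri)) {
            throw new IllegalStateException("Dataset already exists: " + uri);
        }
        saved.put(uri, descriptor); // the descriptor is stored with the dataset
        return new Handle(uri, descriptor);
    }

    Handle load(String uri) {
        Descriptor d = saved.get(uri);
        if (d == null) {
            throw new NoSuchElementException("No dataset at: " + uri);
        }
        // The schema comes from the saved descriptor, not from the caller.
        return new Handle(uri, d);
    }

    boolean exists(String uri) { return saved.containsKey(uri); }

    boolean delete(String uri) { return saved.remove(uri) != null; }
}
```

Keying everything by URI mirrors the text's point that the URI alone identifies a dataset: once created, a caller needs only dataset:hive:products to get the dataset and its schema back.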