deephaven.parquet¶
This module supports reading external Parquet files into Deephaven tables and writing Deephaven tables out as Parquet files.
- class ColumnInstruction(column_name=None, parquet_column_name=None, codec_name=None, codec_args=None, use_dictionary=False)[source]¶
Bases:
object
This class specifies the instructions for reading/writing a Parquet column.
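As a hedged illustration (the column names below are assumptions, and constructing the object requires a running Deephaven session, so the import is deferred), a `ColumnInstruction` might map a table column onto a differently named Parquet column:

```python
def make_example_instruction():
    """Hypothetical sketch: build a ColumnInstruction that maps a Deephaven
    column onto a differently named Parquet column and dictionary-encodes it.
    The deephaven import is deferred because it needs a live session."""
    from deephaven.parquet import ColumnInstruction

    return ColumnInstruction(
        column_name="Ticker",              # name in the Deephaven table
        parquet_column_name="ticker_sym",  # name in the Parquet file
        use_dictionary=True,               # dictionary-encode this column
    )
```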
- batch_write(tables, paths, col_definitions, col_instructions=None, compression_codec_name=None, max_dictionary_keys=None, grouping_cols=None)[source]¶
Writes tables to disk in parquet format to a supplied set of paths.
If you specify grouping columns, there must already be grouping information for those columns in the sources. This can be accomplished with .groupBy(<grouping columns>).ungroup() or .sort(<grouping column>).
Note that either all of the tables are written out successfully or none are.
- Parameters
tables (List[Table]) – the source tables
paths (List[str]) – the destination paths. Any non-existent directories in the paths provided are created. If there is an error, any intermediate directories previously created are removed; note that this makes the method unsafe for concurrent use
col_definitions (List[Column]) – the column definitions to use
col_instructions (List[ColumnInstruction]) – instructions for customizations while writing
compression_codec_name (str) – the compression codec to use; if not specified, defaults to SNAPPY
max_dictionary_keys (int) – the maximum number of dictionary keys allowed; if not specified, defaults to 2^20 (1,048,576)
grouping_cols (List[str]) – the group column names
- Raises
DHError –
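The paths and column definitions in the following sketch are illustrative assumptions; taking the definitions from the first table presumes all sources share a schema, and the call itself requires a running Deephaven session:

```python
def batch_write_example(tables):
    """Hypothetical sketch: write several tables to Parquet in one call,
    which per the docs above succeeds or fails as a unit. Paths are
    illustrative; the deephaven import needs a live session."""
    from deephaven import parquet

    paths = [f"/data/out_{i}.parquet" for i in range(len(tables))]
    # Assumption for the sketch: all tables share the first table's schema.
    col_defs = tables[0].columns
    parquet.batch_write(tables, paths, col_defs,
                        compression_codec_name="SNAPPY")
```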
- delete(path)[source]¶
Deletes a Parquet table on disk.
- Parameters
path (str) – path to delete
- Raises
DHError –
- Return type
None
- read(path, col_instructions=None, is_legacy_parquet=False)[source]¶
Reads in a table from a single parquet file, a metadata file, or a directory with a recognized layout.
- Parameters
path (str) – the file or directory to examine
col_instructions (List[ColumnInstruction]) – instructions for customizations while reading
is_legacy_parquet (bool) – whether the parquet data is in the legacy format
- Return type
Table
- Returns
a table
- Raises
DHError –
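A minimal sketch of reading data back, assuming a running Deephaven session and an illustrative path:

```python
def read_example():
    """Hypothetical sketch: read Parquet data into a table. A single file,
    a metadata file, or a directory with a recognized (e.g. partitioned)
    layout are all accepted, per the docs above."""
    from deephaven import parquet  # deferred: needs a live session

    return parquet.read("/data/out.parquet")
```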
- write(table, path, col_definitions=None, col_instructions=None, compression_codec_name=None, max_dictionary_keys=None)[source]¶
Writes a table to a Parquet file.
- Parameters
table (Table) – the source table
path (str) – the destination file path; the file name should end with a “.parquet” extension. Any non-existent directories in the path are created. If there is an error, any intermediate directories previously created are removed; note that this makes the method unsafe for concurrent use
col_definitions (List[Column]) – the column definitions to use, default is None
col_instructions (List[ColumnInstruction]) – instructions for customizations while writing, default is None
compression_codec_name (str) – the compression codec to use; if not specified, defaults to SNAPPY
max_dictionary_keys (int) – the maximum number of dictionary keys allowed; if not specified, defaults to 2^20 (1,048,576)
- Raises
DHError –
- Return type
None
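Pulling the pieces together, a hedged sketch of a single-table write with a per-column instruction; the column name, path, and ZSTD codec are assumptions (codec availability depends on the server), and the call requires a running Deephaven session:

```python
def write_example(table):
    """Hypothetical sketch: write one table with a custom codec and one
    per-column customization. Names are illustrative; the deephaven
    imports are deferred because they need a live session."""
    from deephaven import parquet
    from deephaven.parquet import ColumnInstruction

    instr = ColumnInstruction(column_name="Sym", use_dictionary=True)
    parquet.write(
        table,
        "/data/table.parquet",          # file name should end in .parquet
        col_instructions=[instr],
        compression_codec_name="ZSTD",  # assumed available on the server
        max_dictionary_keys=1 << 20,    # the documented default, 1,048,576
    )
```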