deephaven.parquet

This module supports reading external Parquet files into Deephaven tables and writing Deephaven tables out as Parquet files.

class ColumnInstruction(column_name=None, parquet_column_name=None, codec_name=None, codec_args=None, use_dictionary=False)[source]

Bases: object

This class specifies the instructions for reading/writing a Parquet column.

batch_write(tables, paths, col_definitions, col_instructions=None, compression_codec_name=None, max_dictionary_keys=None, grouping_cols=None)[source]

Writes tables to disk in Parquet format at a supplied set of paths.

If you specify grouping columns, the source tables must already contain grouping information for those columns. This can be produced with .groupBy(<grouping columns>).ungroup() or .sort(<grouping column>).

Note that either all of the tables are written out successfully or none are.

Parameters
  • tables (List[Table]) – the source tables

  • paths (List[str]) – the destination paths. Any non-existent directories in the given paths are created. If an error occurs, any intermediate directories previously created are removed; note that this makes the method unsafe for concurrent use

  • col_definitions (List[Column]) – the column definitions to use

  • col_instructions (List[ColumnInstruction]) – instructions for customizations while writing

  • compression_codec_name (str) – the compression codec to use; if not specified, defaults to SNAPPY

  • max_dictionary_keys (int) – the maximum number of dictionary keys allowed; if not specified, defaults to 2^20 (1,048,576)

  • grouping_cols (List[str]) – the names of the grouping columns

Raises

DHError

delete(path)[source]

Deletes a Parquet table on disk.

Parameters

path (str) – the path of the Parquet table to delete

Raises

DHError

Return type

None

read(path, col_instructions=None, is_legacy_parquet=False)[source]

Reads in a table from a single Parquet file, a metadata file, or a directory with a recognized layout.

Parameters
  • path (str) – the file or directory to examine

  • col_instructions (List[ColumnInstruction]) – instructions for customizations while reading

  • is_legacy_parquet (bool) – whether the Parquet data is in legacy format

Return type

Table

Returns

the table read from the given path

Raises

DHError

write(table, path, col_definitions=None, col_instructions=None, compression_codec_name=None, max_dictionary_keys=None)[source]

Writes a table to a Parquet file.

Parameters
  • table (Table) – the source table

  • path (str) – the destination file path; the file name should end in a ".parquet" extension. Any non-existent directories in the path are created. If an error occurs, any intermediate directories previously created are removed; note that this makes the method unsafe for concurrent use

  • col_definitions (List[Column]) – the column definitions to use, default is None

  • col_instructions (List[ColumnInstruction]) – instructions for customizations while writing, default is None

  • compression_codec_name (str) – the compression codec to use; if not specified, defaults to SNAPPY

  • max_dictionary_keys (int) – the maximum number of dictionary keys allowed; if not specified, defaults to 2^20 (1,048,576)

Raises

DHError

Return type

None