dimcat.data.packages package#

Submodules#

dimcat.data.packages.base module#

class dimcat.data.packages.base.Package(package_name: str, resources: Optional[Iterable[Resource]] = None, basepath: Optional[str] = None, descriptor_filename: Optional[str] = None, auto_validate: bool = False, metadata: Optional[dict] = None)[source]#

Bases: Data

Wrapper for a frictionless.Package. The purpose of a Package is to create, load, and store a collection of Resource objects. The default way of storing a DimcatResource package is a [name.]datapackage.json descriptor and a .zip file containing one .tsv file per DimcatResource contained in the package.

  • package (frictionless.Package) – The frictionless Package object that is wrapped by this class.

  • package_name (str) – The name of the package that can be used to access it.

  • basepath (str) – The basepath where the package and its .json descriptor are stored.
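
The default storage layout described above (a JSON descriptor next to a ZIP file holding one .tsv per resource) can be pictured with the following stdlib-only sketch. It does not use dimcat or frictionless; the helper name `store_package_sketch` and the minimal descriptor keys are assumptions made for illustration, not the library's actual output.

```python
import json
import os
import tempfile
import zipfile


def store_package_sketch(package_name, tables, basepath):
    """Write <package_name>.zip and <package_name>.datapackage.json to basepath.

    `tables` maps a resource name to a list of rows (lists of strings).
    Hypothetical helper, not part of dimcat.
    """
    zip_path = os.path.join(basepath, f"{package_name}.zip")
    resources = []
    with zipfile.ZipFile(zip_path, "w") as zf:
        for name, rows in tables.items():
            # one tab-separated file per resource
            tsv = "\n".join("\t".join(row) for row in rows) + "\n"
            zf.writestr(f"{name}.tsv", tsv)
            # simplified descriptor entry (assumed keys)
            resources.append({"name": name, "path": f"{name}.tsv"})
    descriptor = {"name": package_name, "resources": resources}
    descriptor_path = os.path.join(basepath, f"{package_name}.datapackage.json")
    with open(descriptor_path, "w", encoding="utf-8") as f:
        json.dump(descriptor, f, indent=2)
    return descriptor_path, zip_path


with tempfile.TemporaryDirectory() as tmp:
    tables = {
        "notes": [["piece", "pitch"], ["a", "60"]],
        "measures": [["piece", "mn"], ["a", "1"]],
    }
    descriptor_path, zip_path = store_package_sketch("demo", tables, tmp)
    with zipfile.ZipFile(zip_path) as zf:
        stored_files = sorted(zf.namelist())
    with open(descriptor_path, encoding="utf-8") as f:
        stored_names = [r["name"] for r in json.load(f)["resources"]]
```
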
class PickleSchema(*, only: Optional[Union[Sequence[str], AbstractSet[str]]] = None, exclude: Union[Sequence[str], AbstractSet[str]] = (), many: Optional[bool] = None, load_only: Union[Sequence[str], AbstractSet[str]] = (), dump_only: Union[Sequence[str], AbstractSet[str]] = (), partial: Optional[Union[bool, Sequence[str], AbstractSet[str]]] = None, unknown: Optional[Literal['exclude', 'include', 'raise']] = None)[source]#

Bases: PackageSchema

dump_fields: dict[str, Field]#
exclude: set[Any] | MutableSet[Any]#
fields: dict[str, Field]#

Dictionary mapping field_names -> Field objects

load_fields: dict[str, Field]#
opts: Any = <marshmallow.schema.SchemaOpts object>#
unknown: types.UnknownOption#
class Schema(*, only: Optional[Union[Sequence[str], AbstractSet[str]]] = None, exclude: Union[Sequence[str], AbstractSet[str]] = (), many: Optional[bool] = None, load_only: Union[Sequence[str], AbstractSet[str]] = (), dump_only: Union[Sequence[str], AbstractSet[str]] = (), partial: Optional[Union[bool, Sequence[str], AbstractSet[str]]] = None, unknown: Optional[Literal['exclude', 'include', 'raise']] = None)[source]#

Bases: PackageSchema, Schema

dump_fields: dict[str, Field]#
exclude: set[Any] | MutableSet[Any]#
fields: dict[str, Field]#

Dictionary mapping field_names -> Field objects

load_fields: dict[str, Field]#
opts: Any = <marshmallow.schema.SchemaOpts object>#
unknown: types.UnknownOption#
add_resource(resource: Resource, update_descriptor: bool = False)[source]#

Adds a resource to the package.

property available_features: Set[FeatureName]#

The set of all available features defined as the union of contained_features and extractable_features.

property basepath: str#
check_if_homogeneous(resource_types: Optional[Type[Resource], Tuple[Type[Resource], ...]] = None, status_exactly=None, status_at_least=None, status_at_most=None) bool[source]#

Returns True if all resources in the package conform to the specified criteria.

Parameters:
  • resource_types – If not specified, all resources need to be of the same type.

  • status_exactly – If specified, all resources need to have exactly this status.

  • status_at_least – If specified, all resources need to have at least this status.

  • status_at_most – If specified, all resources need to have at most this status.

Returns:

property contained_features: Set[FeatureName]#

The dtypes of all feature resources included in the package.

copy() Self[source]#

Returns a copy of the package.

create_and_add_resource(resource: Optional[Union[Resource, Resource, str]] = None, resource_name: Optional[str] = None, basepath: Optional[str] = None, auto_validate: bool = False) None[source]#

Adds a resource to the package. Parameters are passed to DimcatResource.

property descriptor_exists: bool#
property descriptor_filename: str#

The path to the descriptor file on disk, relative to the basepath.

property descriptor_is_complete: bool#

Returns True when the package has a descriptor on disk that contains all resources.

extend(resources: Iterable[Resource]) None[source]#

Adds multiple resources to the package.

extract_feature(feature: Union[Feature, Type[Feature], DimcatConfig, MutableMapping, FeatureName, str]) F[source]#
property extractable_features: Set[FeatureName]#

The dtypes of all features that can be extracted from the facet resources included in the package.

property filepath: str#

The filename of the package’s ZIP file on disk, corresponding to <package_name>.zip

classmethod from_descriptor(descriptor: dict | frictionless.package.package.Package, descriptor_filename: Optional[str] = None, auto_validate: Optional[bool] = None, basepath: Optional[str] = None) Self[source]#

Create a new Package from a frictionless descriptor dictionary.

Parameters:
  • descriptor – Dictionary corresponding to a frictionless descriptor.

  • basepath – The basepath for all resources in the package.

  • auto_validate – Whether to automatically validate the package.

Returns:

The new Package.

classmethod from_descriptor_path(descriptor_path: str, basepath: Optional[str] = None, auto_validate: bool = False) Self[source]#

Create a new Package from a descriptor path.

Parameters:
  • descriptor_path – The path to the descriptor file.

  • basepath – The basepath for all resources in the package.

  • auto_validate – Whether to automatically validate the package.

Returns:

The new Package.

classmethod from_directory(directory: str, package_name: Optional[str] = None, extensions: Optional[Iterable[str]] = None, file_re: Optional[str] = None, exclude_re: Optional[str] = None, resource_names: Optional[Callable[[str], Optional[str]]] = None, corpus_names: Optional[Callable[[str], Optional[str]]] = None, auto_validate: bool = False) Self[source]#

Create a new Package by scanning a directory for files.

Parameters:
  • directory – The directory that is to be scanned for files with particular extensions.

  • package_name – The name of the new package. If None, the base of the directory is used.

  • extensions – The extensions of the files to be discovered under directory and which are to be turned into Resource objects via from_filepaths().

  • resource_names – Name factory for the resources created from the paths. Names also serve as piece identifiers. By default, the filename is used. To override this behaviour you can pass a callable that takes a filepath and returns a name. When the callable returns None, the default is used (i.e., the filename). Whatever the name turns out to be, it will always be turned into a valid frictionless name via make_valid_frictionless_name().

  • file_re – Pass a regular expression in order to select only files that (partially) match it.

  • corpus_names – Names of (or name factory for) the corpus that each resource (=piece) belongs to and that is used in the (‘corpus’, ‘piece’) ID. By default, the name of the package is used. To override this behaviour you can pass a callable that takes a path and returns a name. When the callable returns None, the default is used (i.e., the package_name). Whatever the name turns out to be, it will always be turned into a valid frictionless name via make_valid_frictionless_name().

  • auto_validate – Set True to validate the new package after copying it.
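
The file selection and naming rules described by these parameters can be sketched with the standard library alone. The helper `scan_directory_sketch` below is hypothetical and is not dimcat's implementation: it assumes the regular expressions are matched against the filename (not the full path), and it skips the conversion to a valid frictionless name.

```python
import os
import re
import tempfile


def scan_directory_sketch(directory, extensions=None, file_re=None,
                          exclude_re=None, resource_names=None):
    """Return sorted (name, path) pairs for the files selected under `directory`."""
    selected = []
    for root, _dirs, files in os.walk(directory):
        for filename in files:
            if extensions is not None and not filename.endswith(tuple(extensions)):
                continue
            # re.search accepts partial matches, unlike re.fullmatch
            if file_re is not None and not re.search(file_re, filename):
                continue
            if exclude_re is not None and re.search(exclude_re, filename):
                continue
            path = os.path.join(root, filename)
            name = resource_names(path) if resource_names is not None else None
            if name is None:
                # the name factory declined: fall back to the filename
                name = filename
            selected.append((name, path))
    return sorted(selected)


with tempfile.TemporaryDirectory() as tmp:
    for fname in ("a.mscx", "b.mscx", "b_reviewed.mscx", "notes.txt"):
        open(os.path.join(tmp, fname), "w").close()
    found = scan_directory_sketch(
        tmp,
        extensions=[".mscx"],
        exclude_re=r"_reviewed",
        resource_names=lambda p: "first" if p.endswith("a.mscx") else None,
    )
    found_names = [name for name, _ in found]
```

Here the name factory renames only one file and returns None for the other, which then keeps its default name.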

classmethod from_filepaths(filepaths: Iterable[str], package_name: str, resource_names: Optional[Union[Iterable[str], Callable[[str], Optional[str]]]] = None, corpus_names: Optional[Union[Iterable[str], Callable[[str], Optional[str]], str]] = None, auto_validate: bool = False, basepath: Optional[str] = None) Self[source]#

Create a new Package from an iterable of filepaths.

Parameters:
  • filepaths – The filepaths that are to be turned into Resource objects and packaged.

  • package_name – The name of the new package.

  • resource_names – Names of (or name factory for) the created resources serving as piece identifiers. By default, the filename is used. To override this behaviour you can pass an iterable of names corresponding to paths, or a callable that takes a path and returns a name. When the callable returns None, the default is used (i.e., the filename). Whatever the name turns out to be, it will always be turned into a valid frictionless name via make_valid_frictionless_name().

  • corpus_names – Names of (or name factory for) the corpus that each resource (=piece) belongs to and that is used in the (‘corpus’, ‘piece’) ID. By default, the name of the package is used. To override this behaviour you can pass an iterable of names corresponding to paths, or a callable that takes a path and returns a name. When the callable returns None, the default is used (i.e., the package_name). Whatever the name turns out to be, it will always be turned into a valid frictionless name via make_valid_frictionless_name().

  • auto_validate – Set True to validate the new package after copying it.

  • basepath – The basepath where the new package will be stored.
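
The three accepted forms of resource_names (nothing, an iterable of names, or a callable that may return None) can be resolved as in the following sketch. `resolve_names_sketch` is a hypothetical helper, not a dimcat function; it assumes the default name is the file's basename and leaves out the normalization to valid frictionless names.

```python
import os
from typing import Callable, Iterable, List, Optional, Union

NameSpec = Optional[Union[Iterable[str], Callable[[str], Optional[str]]]]


def resolve_names_sketch(filepaths: List[str], names: NameSpec = None) -> List[str]:
    """Return one name per filepath, falling back to the filename by default."""
    defaults = [os.path.basename(p) for p in filepaths]
    if names is None:
        return defaults
    if callable(names):
        resolved = []
        for path, default in zip(filepaths, defaults):
            name = names(path)
            # a factory returning None means "use the default"
            resolved.append(default if name is None else name)
        return resolved
    names = list(names)
    if len(names) != len(filepaths):
        raise ValueError("need exactly one name per filepath")
    return names


paths = ["corpus/op1.mscx", "corpus/op2.mscx"]
by_default = resolve_names_sketch(paths)
by_iterable = resolve_names_sketch(paths, ["x", "y"])
by_callable = resolve_names_sketch(paths, lambda p: "first" if "op1" in p else None)
```
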

classmethod from_package(package: Package, package_name: Optional[str] = None, descriptor_filename: Optional[str] = None, auto_validate: Optional[bool] = None, basepath: Optional[str] = None) Self[source]#

Create a new Package from an existing Package by copying all resources.

Parameters:
  • package – The Package to copy.

  • package_name – The name of the new package. If None, the name of the original package is used.

  • descriptor_filename – Pass a JSON or YAML filename or relative filepath to override the default (<package_name>.json). Following the frictionless specs, it should end in “.datapackage.[json|yaml]”.

  • auto_validate – Set a value to override the value set in package.

  • basepath – The basepath where the new package will be stored. If None, the basepath of the original package is used.

classmethod from_resources(resources: Iterable[Resource], package_name: str, descriptor_filename: Optional[str] = None, auto_validate: bool = False, basepath: Optional[str] = None) Self[source]#

Create a new Package from an iterable of Resource.

Parameters:
  • resources – The Resources to package.

  • package_name – The name of the new package.

  • descriptor_filename – Pass a JSON or YAML filename or relative filepath to override the default (<package_name>.json). Following the frictionless specs, it should end in “.datapackage.[json|yaml]”.

  • auto_validate – Set True to validate the new package after copying it.

  • basepath – The basepath where the new package will be stored.

get_descriptor_filename(set_default_if_missing: bool = False) str[source]#

Like descriptor_filename but returning a default value if None. If set_default_if_missing is set to True and no basepath has been set (e.g. during initialization), the basepath is permanently set to the default basepath.

get_descriptor_path(set_default_if_missing=False) Optional[str][source]#

Returns the path to the descriptor file. If basepath or descriptor_filename are not set, they are set permanently to their defaults. If create_if_missing is set to True, the descriptor file is created if it does not exist yet.

get_feature(feature: Union[Feature, Type[Feature], DimcatConfig, MutableMapping, FeatureName, str]) F[source]#

Checks if the package includes a feature matching the specs, and extracts it otherwise, if possible.

Raises:

NoMatchingResourceFoundError – If none of the previously extracted features matches the specs and none of the input resources allows for extracting a matching feature.
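
The look-up-then-extract behaviour described above follows a common caching pattern, sketched here without dimcat. `FeatureCacheSketch` and its extractor mapping are hypothetical, and the exception class is only a stand-in that reuses the documented name; whether dimcat stores an extracted feature in exactly this way is an assumption.

```python
class NoMatchingResourceFoundError(Exception):
    """Local stand-in for the dimcat exception of the same name."""


class FeatureCacheSketch:
    def __init__(self, extractors):
        # feature name -> zero-argument callable that extracts the feature
        self._extractors = dict(extractors)
        # features that have already been extracted
        self._features = {}

    def get_feature(self, name):
        if name in self._features:
            # a previously extracted feature matches: reuse it
            return self._features[name]
        if name not in self._extractors:
            # nothing stored and nothing to extract from
            raise NoMatchingResourceFoundError(name)
        feature = self._extractors[name]()
        self._features[name] = feature
        return feature


calls = []


def extract_notes():
    calls.append("notes")
    return ["c4", "e4"]


cache = FeatureCacheSketch({"notes": extract_notes})
first = cache.get_feature("notes")
second = cache.get_feature("notes")  # served from the cache, no second extraction
```
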

get_metadata() Metadata[source]#

Returns the metadata of the package.

get_piece_index() PieceIndex[source]#

Returns the piece index corresponding to all resources’ IDs, sorted.

get_resource(resource: Union[DimcatConfig, Type[Resource], str])[source]#

High-level method that calls one of the other get_resource_* methods depending on the type of the argument. A string is interpreted as resource name, not as type.
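
Dispatching on the argument type, with strings always treated as names, might look like the sketch below. All classes and functions here are hypothetical stand-ins rather than dimcat objects, and the DimcatConfig branch is left out.

```python
class ResourceSketch:
    def __init__(self, name):
        self.name = name


class FacetSketch(ResourceSketch):
    pass


def get_by_name(resources, name):
    for resource in resources:
        if resource.name == name:
            return resource
    raise KeyError(name)


def get_by_type(resources, resource_type):
    for resource in resources:
        if isinstance(resource, resource_type):
            return resource
    raise KeyError(resource_type.__name__)


def get_resource_sketch(resources, spec):
    if isinstance(spec, str):
        # a string is a resource name, never the name of a type
        return get_by_name(resources, spec)
    if isinstance(spec, type):
        return get_by_type(resources, spec)
    raise TypeError(f"unsupported argument type: {type(spec).__name__}")


resources = [ResourceSketch("FacetSketch"), FacetSketch("notes")]
found_by_name = get_resource_sketch(resources, "FacetSketch")
found_by_type = get_resource_sketch(resources, FacetSketch)
```

The first resource is deliberately named like a class to show that the string is still looked up as a name.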

get_resource_by_config(config: DimcatConfig) R[source]#

Returns the first resource that matches the given config.

Raises:
get_resource_by_name(name: Optional[str] = None) R[source]#

Returns the Resource with the given name. If no name is given, returns the last resource.

Raises:
get_resources_by_regex(regex: str) List[Resource][source]#

Returns the Resource objects whose names contain a match for the given regular expression.

get_resources_by_type(resource_type: Union[Type[Resource], str], include_subclasses: bool = False) List[Resource][source]#

Returns the Resource objects of the given type.

get_zip_filepath() str[source]#

Returns the path of the ZIP file that the resources of this package are serialized to.

get_zip_path() str[source]#

Returns the path of the ZIP file that the resources of this package are serialized to.

property is_aligned: bool#

Returns True when the basepaths, filepaths, and descriptor_filenames of all resources are aligned with the package.

property is_empty: bool#

Returns True when the package contains no resources.

property is_fully_serialized: bool#

Returns True when the package has been fully serialized.

property is_partially_serialized: bool#

Returns True when both the resource and descriptor exist on disk but raises if only one of them exists.

property is_paths_only: bool#

Returns True when the package has a basepath but no resources.

iter_facets() Iterator[Facet][source]#

Iterates over all facets in the package.

iter_features() Iterator[Feature][source]#

Iterates over all features in the package.

make_descriptor() dict[source]#
property n_resources: int#
property normpath: str#

Absolute path to the serialized or future tabular file. Raises if basepath is not set.

property package_exists: bool#

Returns True if the package’s normpath exists on disk.

property package_name: str#
replace_resource(resource: Resource, name_of_replaced_resource: Optional[str] = None) None[source]#

Replaces the resource that has the same name as the given resource with the given resource.

property resource_names: List[str]#
property resources: List[Resource]#

Returns a list of the resources in the package. Mutating the list will not affect the package but mutating one of the resources would.
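
This is the usual behaviour of a property that returns a shallow copy of an internal list. The classes in the following sketch are hypothetical stand-ins, not dimcat objects.

```python
class ResourceSketch:
    def __init__(self, name):
        self.name = name


class PackageSketch:
    def __init__(self, resources):
        self._resources = list(resources)

    @property
    def resources(self):
        # shallow copy: a new list holding the same resource objects
        return list(self._resources)


pkg = PackageSketch([ResourceSketch("notes")])
listing = pkg.resources
listing.append(ResourceSketch("measures"))  # changes only the copied list
listing[0].name = "renamed"  # changes the object shared with the package
```
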

property status: PackageStatus#
store_descriptor(descriptor_path: Optional[str] = None, overwrite=True, allow_partial=False) str[source]#

Stores the descriptor to disk based on the package’s configuration and returns its path.

summary_dict(verbose: bool = False) str[source]#

Returns a summary of the package.

validate(raise_exception: bool = False) Report[source]#
property zip_file_exists: bool#
class dimcat.data.packages.base.PackageMode(value)[source]#

Bases: FriendlyEnum

The behaviour of a Package when adding a resource with incompatible paths.

ALLOW_MISALIGNMENT = 'ALLOW_MISALIGNMENT'#

Reconcile the resource and add a physical copy to the package ZIP.

RAISE = 'RAISE'#

Raises an error when adding a resource with an incompatible path.

RECONCILE_EVERYTHING = 'RECONCILE_EVERYTHING'#

Copies newly added resources to the package’s basepath if necessary, overwriting existing files.

RECONCILE_SAFELY = 'RECONCILE_SAFELY'#

Copies newly added resources to the package’s basepath if necessary but without overwriting existing files.
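
The difference between RAISE, RECONCILE_SAFELY, and RECONCILE_EVERYTHING can be illustrated with a plain file copy. This is a sketch under stated assumptions, not dimcat's logic: `ModeSketch` and `add_file_sketch` are hypothetical, "incompatible path" is reduced to "the file lives outside the basepath", RECONCILE_SAFELY is assumed to keep an existing file silently, and ALLOW_MISALIGNMENT is omitted.

```python
import os
import shutil
import tempfile
from enum import Enum


class ModeSketch(str, Enum):
    RAISE = "RAISE"
    RECONCILE_SAFELY = "RECONCILE_SAFELY"
    RECONCILE_EVERYTHING = "RECONCILE_EVERYTHING"


def add_file_sketch(src, basepath, mode):
    """Return the location of `src` under `basepath`, copying it if the mode allows."""
    target = os.path.join(basepath, os.path.basename(src))
    if os.path.abspath(os.path.dirname(src)) == os.path.abspath(basepath):
        return target  # already stored under the basepath
    if mode is ModeSketch.RAISE:
        raise ValueError(f"{src} is not located in {basepath}")
    if os.path.exists(target) and mode is ModeSketch.RECONCILE_SAFELY:
        return target  # assumed behaviour: keep the existing file untouched
    shutil.copyfile(src, target)
    return target


with tempfile.TemporaryDirectory() as src_dir, tempfile.TemporaryDirectory() as base:
    src = os.path.join(src_dir, "notes.tsv")
    with open(src, "w") as f:
        f.write("new")
    existing = os.path.join(base, "notes.tsv")
    with open(existing, "w") as f:
        f.write("old")

    add_file_sketch(src, base, ModeSketch.RECONCILE_SAFELY)
    with open(existing) as f:
        after_safe = f.read()

    add_file_sketch(src, base, ModeSketch.RECONCILE_EVERYTHING)
    with open(existing) as f:
        after_everything = f.read()

    try:
        add_file_sketch(src, base, ModeSketch.RAISE)
        raised = False
    except ValueError:
        raised = True
```
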

class dimcat.data.packages.base.PackageSchema(*, only: Optional[Union[Sequence[str], AbstractSet[str]]] = None, exclude: Union[Sequence[str], AbstractSet[str]] = (), many: Optional[bool] = None, load_only: Union[Sequence[str], AbstractSet[str]] = (), dump_only: Union[Sequence[str], AbstractSet[str]] = (), partial: Optional[Union[bool, Sequence[str], AbstractSet[str]]] = None, unknown: Optional[Literal['exclude', 'include', 'raise']] = None)[source]#

Bases: Schema

catch_package_name_argument(data, **kwargs)[source]#
dump_fields: dict[str, Field]#
exclude: set[Any] | MutableSet[Any]#
fields: dict[str, Field]#

Dictionary mapping field_names -> Field objects

load_fields: dict[str, Field]#
opts: Any = <marshmallow.schema.SchemaOpts object>#
unknown: types.UnknownOption#
class dimcat.data.packages.base.PackageStatus(value)[source]#

Bases: IntEnum

Expresses the status of a Package with respect to the paths of the included resources being aligned with the package’s basepath and serialized to the package’s ZIP file or not. The enum members have increasing integer values starting with EMPTY == 0.

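====================  ==========  ==================================  =============  ==============
PackageStatus         is_aligned  package_exists & descriptor_exists  R.is_packaged  Resource types
====================  ==========  ==================================  =============  ==============
EMPTY                 True        ?                                   True           any
PATHS_ONLY            ?           ?                                   ?              PathResource
MISALIGNED            False       ?                                   False          any
ALIGNED               True        False                               True           any
PARTIALLY_SERIALIZED  True        True                                True           any
FULLY_SERIALIZED      True        True                                True           any
====================  ==========  ==================================  =============  ==============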

ALIGNED = 3#
EMPTY = 0#
FULLY_SERIALIZED = 5#
MISALIGNED = 2#
PARTIALLY_SERIALIZED = 4#
PATHS_ONLY = 1#
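
Because the members are integers, statuses can be compared with the ordinary comparison operators. The sketch below re-declares the values listed above in a local `StatusSketch` enum so that it runs without dimcat; the helper `status_in_range` is hypothetical.

```python
from enum import IntEnum


class StatusSketch(IntEnum):
    # values copied from the member list above
    EMPTY = 0
    PATHS_ONLY = 1
    MISALIGNED = 2
    ALIGNED = 3
    PARTIALLY_SERIALIZED = 4
    FULLY_SERIALIZED = 5


def status_in_range(status, at_least=None, at_most=None):
    """Return True if `status` lies within the optional inclusive bounds."""
    if at_least is not None and status < at_least:
        return False
    if at_most is not None and status > at_most:
        return False
    return True


is_ordered = StatusSketch.ALIGNED < StatusSketch.FULLY_SERIALIZED
aligned_ok = status_in_range(StatusSketch.ALIGNED, at_least=StatusSketch.ALIGNED)
misaligned_ok = status_in_range(StatusSketch.MISALIGNED, at_least=StatusSketch.ALIGNED)
```
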
class dimcat.data.packages.base.PathPackage(package_name: str, resources: Optional[Iterable[Resource]] = None, basepath: Optional[str] = None, descriptor_filename: Optional[str] = None, auto_validate: bool = False, metadata: Optional[dict] = None)[source]#

Bases: Package

Behaves like Package but with the important difference that it never interprets filepaths as frictionless resource descriptors (which Package loads as the appropriate Resource type).

dimcat.data.packages.dc module#

class dimcat.data.packages.dc.DimcatPackage(package_name: str, resources: Optional[Iterable[Resource]] = None, basepath: Optional[str] = None, descriptor_filename: Optional[str] = None, auto_validate: bool = False, metadata: Optional[dict] = None)[source]#

Bases: Package

create_and_add_resource(df: Optional[D] = None, resource: Optional[Union[Resource, Resource, str]] = None, resource_name: Optional[str] = None, basepath: Optional[str] = None, auto_validate: bool = False) None[source]#

Adds a resource to the package. Parameters are passed to DimcatResource.

get_boolean_resource_table() DataFrame[source]#

Returns a table with this package’s piece index and one boolean column per resource, indicating whether the resource is available for a given piece or not.

get_piece_index() PieceIndex[source]#

Returns the piece index corresponding to a sorted union of all included resources’ indices.

dimcat.data.packages.score module#

class dimcat.data.packages.score.MuseScorePackage(package_name: str, resources: Optional[Iterable[Resource]] = None, basepath: Optional[str] = None, descriptor_filename: Optional[str] = None, auto_validate: bool = False, metadata: Optional[dict] = None)[source]#

Bases: DimcatPackage

A datapackage as created by the ms3 MuseScore parsing library. Contains TSV facets with the naming format <name>.<facet>[.tsv].
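
A filename following the <name>.<facet>[.tsv] format can be split as in this sketch. `split_facet_name_sketch` is a hypothetical helper, not part of dimcat or ms3, and it assumes that the facet itself contains no dot.

```python
def split_facet_name_sketch(filename):
    """Split '<name>.<facet>[.tsv]' into (name, facet)."""
    stem = filename[:-4] if filename.endswith(".tsv") else filename
    # split at the last dot so that dots inside <name> are preserved
    name, separator, facet = stem.rpartition(".")
    if not separator or not name or not facet:
        raise ValueError(f"not of the form <name>.<facet>[.tsv]: {filename!r}")
    return name, facet


with_extension = split_facet_name_sketch("op01n01.notes.tsv")
without_extension = split_facet_name_sketch("op01n01.measures")
dotted_name = split_facet_name_sketch("K.279-1.harmonies.tsv")
```
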

class dimcat.data.packages.score.ScorePathPackage(package_name: str, resources: Optional[Iterable[Resource]] = None, basepath: Optional[str] = None, descriptor_filename: Optional[str] = None, auto_validate: bool = False, metadata: Optional[dict] = None)[source]#

Bases: PathPackage

A package containing resources that are (references to) scores.

Module contents#