dimcat.data.packages package#
Submodules#
dimcat.data.packages.base module#
- class dimcat.data.packages.base.Package(package_name: str, resources: Optional[Iterable[Resource]] = None, basepath: Optional[str] = None, descriptor_filename: Optional[str] = None, auto_validate: bool = False, metadata: Optional[dict] = None)[source]#
Bases: Data
Wrapper for a frictionless.Package. The purpose of a Package is to create, load, and store a collection of Resource objects. The default way of storing a DimcatResource package is a [name.]datapackage.json descriptor and a .zip file containing one .tsv file per DimcatResource contained in the package.
- package (frictionless.Package) - The frictionless Package object that is wrapped by this class.
- package_name (str) - The name of the package that can be used to access it.
- basepath (str) - The basepath where the package and its .json descriptor are stored.
- class PickleSchema(*, only: Optional[Union[Sequence[str], AbstractSet[str]]] = None, exclude: Union[Sequence[str], AbstractSet[str]] = (), many: Optional[bool] = None, load_only: Union[Sequence[str], AbstractSet[str]] = (), dump_only: Union[Sequence[str], AbstractSet[str]] = (), partial: Optional[Union[bool, Sequence[str], AbstractSet[str]]] = None, unknown: Optional[Literal['exclude', 'include', 'raise']] = None)[source]#
Bases: PackageSchema
- exclude: set[Any] | MutableSet[Any]#
- unknown: types.UnknownOption#
- class Schema(*, only: Optional[Union[Sequence[str], AbstractSet[str]]] = None, exclude: Union[Sequence[str], AbstractSet[str]] = (), many: Optional[bool] = None, load_only: Union[Sequence[str], AbstractSet[str]] = (), dump_only: Union[Sequence[str], AbstractSet[str]] = (), partial: Optional[Union[bool, Sequence[str], AbstractSet[str]]] = None, unknown: Optional[Literal['exclude', 'include', 'raise']] = None)[source]#
Bases: PackageSchema, Schema
- exclude: set[Any] | MutableSet[Any]#
- unknown: types.UnknownOption#
- add_resource(resource: Resource, update_descriptor: bool = False)[source]#
Adds a resource to the package.
- property available_features: Set[FeatureName]#
The set of all available features defined as the union of contained_features and extractable_features.
- check_if_homogeneous(resource_types: Optional[Type[Resource], Tuple[Type[Resource], ...]] = None, status_exactly=None, status_at_least=None, status_at_most=None) bool[source]#
Returns True if all resources in the package conform to the specified criteria.
- Parameters:
resource_types – If not specified, all resources need to be of the same type.
status_exactly – If specified, all resources need to have exactly this status.
status_at_least – If specified, all resources need to have at least this status.
status_at_most – If specified, all resources need to have at most this status.
Returns:
True if all resources conform to the specified criteria, False otherwise.
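The criteria above can be illustrated with a minimal sketch. It is not DiMCAT's implementation: `FakeResource`, `FakePathResource`, and `is_homogeneous` are hypothetical names, the integer statuses are placeholders for an ordered status enum, and matching given types with `isinstance()` is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple, Type, Union


@dataclass
class FakeResource:
    """Hypothetical stand-in for a resource that carries an ordered status."""
    name: str
    status: int


class FakePathResource(FakeResource):
    """Hypothetical subclass, used here to make a collection heterogeneous."""


def is_homogeneous(
    resources: Iterable[FakeResource],
    resource_types: Optional[Union[Type, Tuple[Type, ...]]] = None,
    status_exactly: Optional[int] = None,
    status_at_least: Optional[int] = None,
    status_at_most: Optional[int] = None,
) -> bool:
    """Return True if every resource meets all criteria that were specified."""
    resources = list(resources)
    if resource_types is None:
        # No types given: all resources must share one and the same type.
        if len({type(r) for r in resources}) > 1:
            return False
    elif not all(isinstance(r, resource_types) for r in resources):
        return False
    for r in resources:
        if status_exactly is not None and r.status != status_exactly:
            return False
        if status_at_least is not None and r.status < status_at_least:
            return False
        if status_at_most is not None and r.status > status_at_most:
            return False
    return True


same_type = [FakeResource("a", 2), FakeResource("b", 3)]
mixed = same_type + [FakePathResource("c", 3)]

print(is_homogeneous(same_type))                            # True
print(is_homogeneous(mixed))                                # False
print(is_homogeneous(mixed, resource_types=FakeResource))   # True
print(is_homogeneous(same_type, status_at_least=3))         # False
```

Each status criterion is checked independently, so they can be combined, for example to require a status within a range.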
- property contained_features: Set[FeatureName]#
The dtypes of all feature resources included in the package.
- create_and_add_resource(resource: Optional[Union[Resource, Resource, str]] = None, resource_name: Optional[str] = None, basepath: Optional[str] = None, auto_validate: bool = False) None[source]#
Adds a resource to the package. Parameters are passed to DimcatResource.
- property descriptor_filename: str#
The path to the descriptor file on disk, relative to the basepath.
- property descriptor_is_complete: bool#
Returns True when the package has a descriptor on disk that contains all resources.
- extract_feature(feature: Union[Feature, Type[Feature], DimcatConfig, MutableMapping, FeatureName, str]) F[source]#
- property extractable_features: Set[FeatureName]#
The dtypes of all features that can be extracted from the facet resources included in the package.
- property filepath: str#
The filename of the package’s ZIP file on disk, corresponding to <package_name>.zip.
- classmethod from_descriptor(descriptor: dict | frictionless.package.package.Package, descriptor_filename: Optional[str] = None, auto_validate: Optional[bool] = None, basepath: Optional[str] = None) Self[source]#
Create a new Package from a frictionless descriptor dictionary.
- Parameters:
descriptor – Dictionary corresponding to a frictionless descriptor.
basepath – The basepath for all resources in the package.
auto_validate – Whether to automatically validate the package.
- Returns:
The new Package.
- classmethod from_descriptor_path(descriptor_path: str, basepath: Optional[str] = None, auto_validate: bool = False) Self[source]#
Create a new Package from a descriptor path.
- Parameters:
descriptor_path – The path to the descriptor file.
basepath – The basepath for all resources in the package.
auto_validate – Whether to automatically validate the package.
- Returns:
The new Package.
- classmethod from_directory(directory: str, package_name: Optional[str] = None, extensions: Optional[Iterable[str]] = None, file_re: Optional[str] = None, exclude_re: Optional[str] = None, resource_names: Optional[Callable[[str], Optional[str]]] = None, corpus_names: Optional[Callable[[str], Optional[str]]] = None, auto_validate: bool = False) Self[source]#
Create a new Package from the files discovered in a directory.
- Parameters:
directory – The directory that is to be scanned for files with particular extensions.
package_name – The name of the new package. If None, the base of the directory is used.
extensions – The extensions of the files to be discovered under directory and which are to be turned into Resource objects via from_filepaths().
resource_names – Name factory for the resources created from the paths. Names also serve as piece identifiers. By default, the filename is used. To override this behaviour you can pass a callable that takes a filepath and returns a name. When the callable returns None, the default is used (i.e., the filename). Whatever the name turns out to be, it will always be turned into a valid frictionless name via make_valid_frictionless_name().
file_re – Pass a regular expression in order to select only files that (partially) match it.
corpus_names – Names of (or name factory for) the corpus that each resource (=piece) belongs to and that is used in the (‘corpus’, ‘piece’) ID. By default, the name of the package is used. To override this behaviour you can pass a callable that takes a path and returns a name. When the callable returns None, the default is used (i.e., the package_name). Whatever the name turns out to be, it will always be turned into a valid frictionless name via make_valid_frictionless_name().
auto_validate – Set True to validate the new package after copying it.
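The file discovery described by these parameters can be sketched with the standard library alone. `discover_files` is a hypothetical helper, not part of DiMCAT, and treating `exclude_re` as a pattern that drops matching filenames is an assumption, since the parameter is not documented above.

```python
import os
import re
import tempfile
from typing import Iterable, List, Optional


def discover_files(
    directory: str,
    extensions: Optional[Iterable[str]] = None,
    file_re: Optional[str] = None,
    exclude_re: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper: collect paths under ``directory`` that pass all filters."""
    exts = tuple(extensions) if extensions is not None else None
    selected = []
    for root, _dirs, files in os.walk(directory):
        for filename in files:
            if exts is not None and not filename.endswith(exts):
                continue
            # re.search() accepts partial matches anywhere in the filename.
            if file_re is not None and not re.search(file_re, filename):
                continue
            # Assumption: files matching exclude_re are skipped.
            if exclude_re is not None and re.search(exclude_re, filename):
                continue
            selected.append(os.path.join(root, filename))
    return sorted(selected)


with tempfile.TemporaryDirectory() as tmp:
    for name in ("op01.notes.tsv", "op01.measures.tsv", "op02.notes.tsv", "README.md"):
        open(os.path.join(tmp, name), "w").close()
    found = discover_files(tmp, extensions=[".tsv"], file_re=r"notes", exclude_re=r"^op02")
    found_names = [os.path.basename(path) for path in found]

print(found_names)  # ['op01.notes.tsv']
```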
- classmethod from_filepaths(filepaths: Iterable[str], package_name: str, resource_names: Optional[Union[Iterable[str], Callable[[str], Optional[str]]]] = None, corpus_names: Optional[Union[Iterable[str], Callable[[str], Optional[str]], str]] = None, auto_validate: bool = False, basepath: Optional[str] = None) Self[source]#
Create a new Package from an iterable of filepaths.
- Parameters:
filepaths – The filepaths that are to be turned into Resource objects and packaged.
package_name – The name of the new package. If None, the name of the original package is used.
resource_names – Names of (or name factory for) the created resources serving as piece identifiers. By default, the filename is used. To override this behaviour you can pass an iterable of names corresponding to paths, or a callable that takes a path and returns a name. When the callable returns None, the default is used (i.e., the filename). Whatever the name turns out to be, it will always be turned into a valid frictionless name via make_valid_frictionless_name().
corpus_names – Names of (or name factory for) the corpus that each resource (=piece) belongs to and that is used in the (‘corpus’, ‘piece’) ID. By default, the name of the package is used. To override this behaviour you can pass an iterable of names corresponding to paths, or a callable that takes a path and returns a name. When the callable returns None, the default is used (i.e., the package_name). Whatever the name turns out to be, it will always be turned into a valid frictionless name via make_valid_frictionless_name().
auto_validate – Set True to validate the new package after copying it.
basepath – The basepath where the new package will be stored. If None, the basepath of the original package
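The fallback logic described for resource_names (explicit names, a name factory, or the filename as default) might look like the following sketch. `resolve_names` and `sanitize` are hypothetical: `sanitize` only stands in for make_valid_frictionless_name(), whose actual rules may differ, and using the basename including its extension as the default is an assumption.

```python
import os
import re
from typing import Callable, Iterable, List, Optional, Union

NameSpec = Optional[Union[Iterable[str], Callable[[str], Optional[str]]]]


def sanitize(name: str) -> str:
    """Hypothetical stand-in for make_valid_frictionless_name()."""
    return re.sub(r"[^a-z0-9._-]+", "_", name.lower())


def resolve_names(filepaths: Iterable[str], names: NameSpec = None) -> List[str]:
    """Pick one name per path: explicit name, factory result, or filename default."""
    paths = list(filepaths)
    defaults = [os.path.basename(p) for p in paths]
    if names is None:
        chosen = defaults
    elif callable(names):
        chosen = []
        for path, default in zip(paths, defaults):
            name = names(path)
            # A factory returning None falls back to the default for that path.
            chosen.append(default if name is None else name)
    else:
        chosen = list(names)
        if len(chosen) != len(paths):
            raise ValueError("Need exactly one name per filepath.")
    # Whatever the name turns out to be, it is sanitized at the end.
    return [sanitize(name) for name in chosen]


def factory(path: str) -> Optional[str]:
    # Rename only the first file; returning None keeps the default.
    return "op1_no1" if "Op. 1" in path else None


paths = ["corpus/Op. 1 No. 1.tsv", "corpus/sonata02.tsv"]

print(resolve_names(paths))              # ['op._1_no._1.tsv', 'sonata02.tsv']
print(resolve_names(paths, factory))     # ['op1_no1', 'sonata02.tsv']
print(resolve_names(paths, ["A", "B"]))  # ['a', 'b']
```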
- classmethod from_package(package: Package, package_name: Optional[str] = None, descriptor_filename: Optional[str] = None, auto_validate: Optional[bool] = None, basepath: Optional[str] = None) Self[source]#
Create a new Package from an existing Package by copying all resources.
- Parameters:
package – The Package to copy.
package_name – The name of the new package. If None, the name of the original package is used.
descriptor_filename – Pass a JSON or YAML filename or relative filepath to override the default (<package_name>.json). Following frictionless specs it should end in “.datapackage.[json|yaml]”.
auto_validate – Set a value to override the value set in package.
basepath – The basepath where the new package will be stored. If None, the basepath of the original package is used.
- classmethod from_resources(resources: Iterable[Resource], package_name: str, descriptor_filename: Optional[str] = None, auto_validate: bool = False, basepath: Optional[str] = None) Self[source]#
Create a new Package from an iterable of Resource.
- Parameters:
resources – The Resources to package.
package_name – The name of the new package.
descriptor_filename – Pass a JSON or YAML filename or relative filepath to override the default (<package_name>.json). Following frictionless specs it should end on “.datapackage.[json|yaml]”.
auto_validate – Set True to validate the new package after copying it.
basepath – The basepath where the new package will be stored. If None, the basepath of the original package
- get_descriptor_filename(set_default_if_missing: bool = False) str[source]#
Like descriptor_filename but returning a default value if None. If set_default_if_missing is set to True and no basepath has been set (e.g. during initialization), the basepath is permanently set to the default basepath.
- get_descriptor_path(set_default_if_missing=False) Optional[str][source]#
Returns the path to the descriptor file. If basepath or descriptor_filename are not set, they are set permanently to their defaults. If create_if_missing is set to True, the descriptor file is created if it does not exist yet.
- get_feature(feature: Union[Feature, Type[Feature], DimcatConfig, MutableMapping, FeatureName, str]) F[source]#
Checks if the package includes a feature matching the specs, and extracts it otherwise, if possible.
- Raises:
NoMatchingResourceFoundError – If none of the previously extracted features matches the specs and none of the input resources allows for extracting a matching feature.
- get_piece_index() PieceIndex[source]#
Returns the piece index corresponding to all resources’ IDs, sorted.
- get_resource(resource: Union[DimcatConfig, Type[Resource], str])[source]#
High-level method that calls one of the other get_resource_* methods depending on the type of the argument. A string is interpreted as resource name, not as type.
- get_resource_by_config(config: DimcatConfig) R[source]#
Returns the first resource that matches the given config.
- Raises:
EmptyPackageError – If the package is empty.
NoMatchingResourceFoundError – If no resource matches the config.
- get_resource_by_name(name: Optional[str] = None) R[source]#
Returns the Resource with the given name. If no name is given, returns the last resource.
- Raises:
EmptyPackageError – If the package is empty.
ResourceNotFoundError – If the resource with the given name is not found.
- get_resources_by_regex(regex: str) List[Resource][source]#
Returns the Resource objects whose names contain a match for the given regex.
- get_resources_by_type(resource_type: Union[Type[Resource], str], include_subclasses: bool = False) List[Resource][source]#
Returns the Resource objects of the given type.
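The two lookups above, by regex and by type, can be sketched as follows. The classes and helper functions are hypothetical stand-ins rather than DiMCAT's own, and reading include_subclasses as the difference between an exact type match and an `isinstance()` check is an assumption.

```python
import re
from typing import Iterable, List, Type


class Resource:
    """Hypothetical stand-in, not dimcat's Resource class."""

    def __init__(self, name: str) -> None:
        self.name = name


class Feature(Resource):
    """Hypothetical subclass of the stand-in."""


def select_by_regex(resources: Iterable[Resource], regex: str) -> List[Resource]:
    # re.search() finds the pattern anywhere in the name, not only at the start.
    return [r for r in resources if re.search(regex, r.name)]


def select_by_type(
    resources: Iterable[Resource],
    resource_type: Type[Resource],
    include_subclasses: bool = False,
) -> List[Resource]:
    if include_subclasses:
        return [r for r in resources if isinstance(r, resource_type)]
    # Exact match only: instances of subclasses are left out.
    return [r for r in resources if type(r) is resource_type]


resources = [Resource("unfolded.notes"), Feature("bass_notes"), Feature("keys")]

by_regex = [r.name for r in select_by_regex(resources, r"notes$")]
exact = [r.name for r in select_by_type(resources, Resource)]
with_subclasses = [r.name for r in select_by_type(resources, Resource, include_subclasses=True)]

print(by_regex)         # ['unfolded.notes', 'bass_notes']
print(exact)            # ['unfolded.notes']
print(with_subclasses)  # ['unfolded.notes', 'bass_notes', 'keys']
```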
- get_zip_filepath() str[source]#
Returns the path of the ZIP file that the resources of this package are serialized to.
- get_zip_path() str[source]#
Returns the path of the ZIP file that the resources of this package are serialized to.
- property is_aligned: bool#
Returns True when the basepaths, filepaths, and descriptor_filenames of all resources are aligned with the package.
- property is_partially_serialized: bool#
Returns True when both the resource and descriptor exist on disk but raises if only one of them exists.
- property normpath: str#
Absolute path to the serialized or future tabular file. Raises if basepath is not set.
- replace_resource(resource: Resource, name_of_replaced_resource: Optional[str] = None) None[source]#
Replaces the resource that has the same name as the given resource with the given resource.
- property resources: List[Resource]#
Returns a list of the resources in the package. Mutating the list will not affect the package but mutating one of the resources would.
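The copy semantics described for the resources property correspond to returning a shallow copy of an internal list. The sketch below uses hypothetical stand-in classes (`FakeResource`, `FakePackage`) and is only meant to show the behaviour, not how DiMCAT stores its resources.

```python
class FakeResource:
    """Hypothetical stand-in for a resource."""

    def __init__(self, name: str) -> None:
        self.name = name


class FakePackage:
    """Hypothetical stand-in for a package holding resources."""

    def __init__(self, resources) -> None:
        self._resources = list(resources)

    @property
    def resources(self):
        # A new list on every access: callers cannot add or drop entries,
        # but the elements are the same objects the package holds.
        return list(self._resources)


pkg = FakePackage([FakeResource("notes"), FakeResource("measures")])

listing = pkg.resources
listing.pop()                 # mutates only the returned copy
print(len(pkg.resources))     # 2

listing[0].name = "renamed"   # mutates the shared resource object
print(pkg.resources[0].name)  # renamed
```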
- property status: PackageStatus#
- class dimcat.data.packages.base.PackageMode(value)[source]#
Bases: FriendlyEnum
The behaviour of a Package when adding a resource with incompatible paths.
- ALLOW_MISALIGNMENT = 'ALLOW_MISALIGNMENT'#
Reconcile the resource and add a physical copy to the package ZIP.
- RAISE = 'RAISE'#
Raises an error when adding a resource with an incompatible path.
- RECONCILE_EVERYTHING = 'RECONCILE_EVERYTHING'#
Copies newly added resources to the package’s basepath if necessary, overwriting existing files.
- RECONCILE_SAFELY = 'RECONCILE_SAFELY'#
Copies newly added resources to the package’s basepath if necessary but without overwriting existing files.
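To make the difference between RAISE, RECONCILE_SAFELY, and RECONCILE_EVERYTHING concrete, here is a rough sketch using plain file copies. `reconcile` is a hypothetical helper, the modes are passed as plain strings, and leaving an existing file untouched (rather than raising) under RECONCILE_SAFELY is an assumption. ALLOW_MISALIGNMENT is not covered.

```python
import os
import shutil
import tempfile


def reconcile(source_path: str, basepath: str, mode: str) -> str:
    """Hypothetical helper mimicking three of the modes described above."""
    target = os.path.join(basepath, os.path.basename(source_path))
    if os.path.abspath(os.path.dirname(source_path)) == os.path.abspath(basepath):
        return target  # already located in the basepath, nothing to do
    if mode == "RAISE":
        raise ValueError(f"{source_path!r} is not located in {basepath!r}")
    if mode == "RECONCILE_SAFELY" and os.path.exists(target):
        return target  # assumption: keep the existing file untouched
    shutil.copyfile(source_path, target)  # overwrites an existing target
    return target


with tempfile.TemporaryDirectory() as src_dir, tempfile.TemporaryDirectory() as pkg_dir:
    source = os.path.join(src_dir, "notes.tsv")
    existing = os.path.join(pkg_dir, "notes.tsv")
    with open(source, "w") as f:
        f.write("new")
    with open(existing, "w") as f:
        f.write("old")

    reconcile(source, pkg_dir, "RECONCILE_SAFELY")
    with open(existing) as f:
        after_safe = f.read()

    reconcile(source, pkg_dir, "RECONCILE_EVERYTHING")
    with open(existing) as f:
        after_everything = f.read()

    try:
        reconcile(source, pkg_dir, "RAISE")
        raised = False
    except ValueError:
        raised = True

print(after_safe, after_everything, raised)  # old new True
```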
- class dimcat.data.packages.base.PackageSchema(*, only: Optional[Union[Sequence[str], AbstractSet[str]]] = None, exclude: Union[Sequence[str], AbstractSet[str]] = (), many: Optional[bool] = None, load_only: Union[Sequence[str], AbstractSet[str]] = (), dump_only: Union[Sequence[str], AbstractSet[str]] = (), partial: Optional[Union[bool, Sequence[str], AbstractSet[str]]] = None, unknown: Optional[Literal['exclude', 'include', 'raise']] = None)[source]#
Bases: Schema
- exclude: set[Any] | MutableSet[Any]#
- unknown: types.UnknownOption#
- class dimcat.data.packages.base.PackageStatus(value)[source]#
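Bases: IntEnum
Expresses the status of a Package with respect to the paths of the included resources being aligned with the package’s basepath and serialized to the package’s ZIP file or not. The enum members have increasing integer values starting with EMPTY == 0.

====================  ==========  ==================================  =============  ==============
PackageStatus         is_aligned  package_exists & descriptor_exists  R.is_packaged  Resource types
====================  ==========  ==================================  =============  ==============
EMPTY                 True        ?                                   True           any
PATHS_ONLY            ?           ?                                   ?              PathResource
MISALIGNED            False       ?                                   False          any
ALIGNED               True        False                               True           any
PARTIALLY_SERIALIZED  True        True                                True           any
FULLY_SERIALIZED      True        True                                True           any
====================  ==========  ==================================  =============  ==============
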
- ALIGNED = 3#
- EMPTY = 0#
- FULLY_SERIALIZED = 5#
- MISALIGNED = 2#
- PARTIALLY_SERIALIZED = 4#
- PATHS_ONLY = 1#
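Because the members are integers in increasing order, statuses can be compared directly, which is what criteria such as "at least" and "at most" rely on. The enum is re-declared below with the values listed above so that the snippet runs without DiMCAT installed. It is an illustration, not an import of the real class.

```python
from enum import IntEnum


class PackageStatus(IntEnum):
    """Re-declared with the documented values, for illustration only."""
    EMPTY = 0
    PATHS_ONLY = 1
    MISALIGNED = 2
    ALIGNED = 3
    PARTIALLY_SERIALIZED = 4
    FULLY_SERIALIZED = 5


status = PackageStatus.ALIGNED

# Ordered comparisons work because IntEnum members behave like integers.
print(status >= PackageStatus.MISALIGNED)            # True
print(status < PackageStatus.PARTIALLY_SERIALIZED)   # True
print(max(PackageStatus).name)                       # FULLY_SERIALIZED
print(PackageStatus(0).name)                         # EMPTY
```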
- class dimcat.data.packages.base.PathPackage(package_name: str, resources: Optional[Iterable[Resource]] = None, basepath: Optional[str] = None, descriptor_filename: Optional[str] = None, auto_validate: bool = False, metadata: Optional[dict] = None)[source]#
Bases: Package
Behaves like Package but with the important difference that it never interprets filepaths as frictionless resource descriptors (which Package loads as the appropriate Resource type).
dimcat.data.packages.dc module#
- class dimcat.data.packages.dc.DimcatPackage(package_name: str, resources: Optional[Iterable[Resource]] = None, basepath: Optional[str] = None, descriptor_filename: Optional[str] = None, auto_validate: bool = False, metadata: Optional[dict] = None)[source]#
Bases: Package
- create_and_add_resource(df: Optional[D] = None, resource: Optional[Union[Resource, Resource, str]] = None, resource_name: Optional[str] = None, basepath: Optional[str] = None, auto_validate: bool = False) None[source]#
Adds a resource to the package. Parameters are passed to DimcatResource.
- get_boolean_resource_table() DataFrame[source]#
Returns a table with this package’s piece index and one boolean column per resource, indicating whether the resource is available for a given piece or not.
- get_piece_index() PieceIndex[source]#
Returns the piece index corresponding to a sorted union of all included resources’ indices.
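The piece index and the boolean resource table described above can be pictured with plain Python containers. This sketch assumes that each resource exposes a set of (‘corpus’, ‘piece’) IDs. The resource names and IDs are made up, and the real methods return a PieceIndex and a DataFrame rather than a list and a dict.

```python
from typing import Dict, List, Set, Tuple

PieceID = Tuple[str, str]  # ('corpus', 'piece')

# Hypothetical resources and the piece IDs each of them covers.
resource_indices: Dict[str, Set[PieceID]] = {
    "notes": {("corpus_a", "piece2"), ("corpus_a", "piece1")},
    "measures": {("corpus_a", "piece1"), ("corpus_b", "piece1")},
}

# Sorted union of all resources' indices.
piece_index: List[PieceID] = sorted(set().union(*resource_indices.values()))

# One row per piece, one boolean per resource: is the resource available?
availability = {
    piece: {name: piece in ids for name, ids in resource_indices.items()}
    for piece in piece_index
}

print(piece_index)
# [('corpus_a', 'piece1'), ('corpus_a', 'piece2'), ('corpus_b', 'piece1')]
print(availability[("corpus_b", "piece1")])
# {'notes': False, 'measures': True}
```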
dimcat.data.packages.score module#
- class dimcat.data.packages.score.MuseScorePackage(package_name: str, resources: Optional[Iterable[Resource]] = None, basepath: Optional[str] = None, descriptor_filename: Optional[str] = None, auto_validate: bool = False, metadata: Optional[dict] = None)[source]#
Bases: DimcatPackage
A datapackage as created by the ms3 MuseScore parsing library. Contains TSV facets with the naming format <name>.<facet>[.tsv].
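As an illustration of the naming format, the following sketch splits such a name into its two parts. `split_facet_name` is a hypothetical helper, not a function of DiMCAT or ms3, and taking the last dot-separated component as the facet is an assumption.

```python
from typing import Tuple


def split_facet_name(resource_name: str) -> Tuple[str, str]:
    """Hypothetical helper: split '<name>.<facet>[.tsv]' into (name, facet)."""
    # The '.tsv' extension is optional in the naming format.
    stem = resource_name[:-4] if resource_name.endswith(".tsv") else resource_name
    # Assumption: the facet is the last dot-separated component.
    name, separator, facet = stem.rpartition(".")
    if not separator or not name or not facet:
        raise ValueError(f"{resource_name!r} does not follow '<name>.<facet>[.tsv]'")
    return name, facet


print(split_facet_name("K279-1.notes.tsv"))    # ('K279-1', 'notes')
print(split_facet_name("corpus.measures"))     # ('corpus', 'measures')
print(split_facet_name("op.1.harmonies.tsv"))  # ('op.1', 'harmonies')
```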
- class dimcat.data.packages.score.ScorePathPackage(package_name: str, resources: Optional[Iterable[Resource]] = None, basepath: Optional[str] = None, descriptor_filename: Optional[str] = None, auto_validate: bool = False, metadata: Optional[dict] = None)[source]#
Bases: PathPackage
A package containing resources that are (references to) scores.