dimcat.steps.loaders package
Submodules
dimcat.steps.loaders.base module
A loader reads an existing datapackage or creates one by parsing data from a source.
- class dimcat.steps.loaders.base.FacetName(value)
Bases: FriendlyEnum
The names of the facets that can be extracted from scores.
- annotations = 'annotations'
- control = 'control'
- events = 'events'
- metadata = 'metadata'
- structure = 'structure'
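The members listed above can be pictured as string-valued enum members. The following is a minimal sketch, assuming a plain `enum.Enum` as a stand-in for `FriendlyEnum` (whose additional behaviour is not documented in this section); the class `FacetNameSketch` is hypothetical and not part of dimcat.

```python
from enum import Enum


class FacetNameSketch(str, Enum):
    """Hypothetical stand-in for FacetName; mirrors the five documented values."""

    annotations = "annotations"
    control = "control"
    events = "events"
    metadata = "metadata"
    structure = "structure"


# Look a member up by its string value, as the FacetName(value) signature suggests.
facet = FacetNameSketch("events")

# Iterating an Enum yields its members in definition order.
all_values = [member.value for member in FacetNameSketch]
```

Looking a member up by value raises `ValueError` for an unknown name, which makes typos in facet names fail early.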
- class dimcat.steps.loaders.base.LoadedFacets(events: Dict[tuple, pandas.core.frame.DataFrame] = <factory>, control: Dict[tuple, pandas.core.frame.DataFrame] = <factory>, structure: Dict[tuple, pandas.core.frame.DataFrame] = <factory>, annotations: Dict[tuple, pandas.core.frame.DataFrame] = <factory>, metadata: Dict[tuple, pandas.core.series.Series] = <factory>)
Bases: object
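The `<factory>` defaults in the signature above indicate that every instance receives its own empty dictionaries. The sketch below illustrates that pattern with a standard-library dataclass. It is an assumption-laden stand-in: `LoadedFacetsSketch` is hypothetical, plain lists of records replace the pandas objects, and the `('corpus', 'piece')` key layout is assumed from the ID described for `corpus_names`.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LoadedFacetsSketch:
    """Hypothetical stand-in: one dict per facet, keyed by an ID tuple."""

    events: Dict[tuple, List[dict]] = field(default_factory=dict)
    control: Dict[tuple, List[dict]] = field(default_factory=dict)
    structure: Dict[tuple, List[dict]] = field(default_factory=dict)
    annotations: Dict[tuple, List[dict]] = field(default_factory=dict)
    metadata: Dict[tuple, dict] = field(default_factory=dict)


first = LoadedFacetsSketch()
second = LoadedFacetsSketch()

# Assumption: IDs are ('corpus', 'piece') tuples.
first.events[("my_corpus", "piece_01")] = [{"pitch": 60, "duration": 1.0}]

# Each instance owns its dictionaries, so `second` is unaffected.
second_is_empty = second.events == {}
```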
- class dimcat.steps.loaders.base.Loader(basepath: Optional[str] = None, packages: Optional[DimcatCatalog] = None)
Bases: PipelineStep
Base class for all loaders.
- class Schema(*, only: Optional[Union[Sequence[str], AbstractSet[str]]] = None, exclude: Union[Sequence[str], AbstractSet[str]] = (), many: Optional[bool] = None, load_only: Union[Sequence[str], AbstractSet[str]] = (), dump_only: Union[Sequence[str], AbstractSet[str]] = (), partial: Optional[Union[bool, Sequence[str], AbstractSet[str]]] = None, unknown: Optional[Literal['exclude', 'include', 'raise']] = None)
Bases: Schema
- exclude: set[Any] | MutableSet[Any]
- unknown: types.UnknownOption
- add_package(package: DimcatPackage) → None
Add a package to the loader that contains resources to be processed.
- check_resource(resource: Resource) → None
Checks whether the resource at the given path exists.
- classmethod from_directory(directory: str, package_name: Optional[str] = None, extensions: Optional[Iterable[str]] = None, file_re: Optional[str] = None, exclude_re: Optional[str] = None, resource_names: Optional[Callable[[str], Optional[str]]] = None, corpus_names: Optional[Callable[[str], Optional[str]]] = None, auto_validate: bool = False, basepath: Optional[str] = None, **kwargs) → Self
Create a loader from a ScorePackage created on the fly from an iterable of filepaths.
- Parameters:
directory – The directory that is to be scanned for files with particular extensions.
package_name – The name of the new package. If None, the base of the directory is used.
extensions – The extensions of the files to be discovered under directory and which are to be turned into Resource objects. Defaults to this loader’s _accepted_file_extensions.
file_re – Pass a regular expression in order to select only files that (partially) match it.
resource_names – Name factory for the resources created from the paths. Names also serve as piece identifiers. By default, the filename is used. To override this behaviour you can pass a callable that takes a filepath and returns a name. When the callable returns None, the default is used (i.e., the filename). Whatever the name turns out to be, it will always be turned into a valid frictionless name via make_valid_frictionless_name().
corpus_names – Names of (or name factory for) the corpus that each resource (=piece) belongs to and that is used in the (‘corpus’, ‘piece’) ID. By default, the name of the package is used. To override this behaviour you can pass a callable that takes a path and returns a name. When the callable returns None, the default is used (i.e., the package_name). Whatever the name turns out to be, it will always be turned into a valid frictionless name via make_valid_frictionless_name().
auto_validate – Set True to validate the new package after copying it.
basepath – The basepath where the new package will be stored. If None, the basepath of the original package
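The interplay of extensions, file_re, an exclusion pattern, and a resource_names callable with its None fallback can be sketched with the standard library alone. This is an illustration under stated assumptions, not dimcat's implementation: `discover_files` is a hypothetical helper, the patterns are searched against the bare filename, and the default name is the filename including its extension.

```python
import os
import re
import tempfile
from typing import Callable, Iterable, List, Optional, Tuple


def discover_files(
    directory: str,
    extensions: Iterable[str],
    file_re: Optional[str] = None,
    exclude_re: Optional[str] = None,
    resource_names: Optional[Callable[[str], Optional[str]]] = None,
) -> List[Tuple[str, str]]:
    """Hypothetical helper returning sorted (resource_name, filepath) pairs."""
    suffixes = tuple(extensions)
    pairs = []
    for root, _dirs, files in os.walk(directory):
        for filename in files:
            if not filename.endswith(suffixes):
                continue  # wrong extension
            if file_re is not None and re.search(file_re, filename) is None:
                continue  # does not (partially) match file_re
            if exclude_re is not None and re.search(exclude_re, filename):
                continue  # explicitly excluded
            filepath = os.path.join(root, filename)
            name = resource_names(filepath) if resource_names else None
            if name is None:
                name = filename  # fallback: the filename is the default name
            pairs.append((name, filepath))
    return sorted(pairs)


with tempfile.TemporaryDirectory() as tmp:
    for fn in ("a.mxl", "b.mxl", "c.txt", "draft_d.mxl"):
        open(os.path.join(tmp, fn), "w").close()
    found = discover_files(
        tmp,
        extensions=(".mxl",),
        exclude_re=r"^draft",
        # Rename one file; returning None keeps the default for the others.
        resource_names=lambda p: "first" if os.path.basename(p) == "a.mxl" else None,
    )
    names = [name for name, _path in found]
```

In this sketch `c.txt` is dropped by its extension, `draft_d.mxl` by the exclusion pattern, and only `a.mxl` receives a custom name.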
- classmethod from_filepaths(filepaths: Iterable[str], basepath: Optional[str] = None) → Self
Create a loader from a DimcatPackage created on the fly from an iterable of filepaths.
- Parameters:
filepaths – The filepaths that are to be turned into Resource objects and packaged.
basepath – The basepath where the new package will be stored. If None, the basepath of the original package
- classmethod from_package(package: DimcatPackage, basepath: Optional[str] = None) → Self
Create a loader from a DimcatPackage.
- get_basepath() → str
Get the basepath of the resource. If not specified, the default basepath is returned.
- iter_package_descriptors() → Iterator[str]
Create datapackage(s) for the input catalog of a Dataset and iterate over their descriptor paths.
- class dimcat.steps.loaders.base.PackageLoader(basepath: Optional[str] = None, packages: Optional[DimcatCatalog] = None)
Bases: Loader
Simple loader that discovers and loads frictionless datapackages through their descriptors.
- default_loader_name = 'package_loader'
- classmethod from_directory(directory: str, package_name: Optional[str] = None, extensions: Optional[Iterable[str]] = None, file_re: Optional[str] = None, exclude_re: Optional[str] = None, resource_names: Optional[Callable[[str], Optional[str]]] = None, corpus_names: Optional[Callable[[str], Optional[str]]] = None, auto_validate: bool = False, basepath: Optional[str] = None, **kwargs) → Self
Create a loader from a ScorePackage created on the fly from an iterable of filepaths.
- Parameters:
directory – The directory that is to be scanned for files with particular extensions.
package_name – The name of the new package. If None, the base of the directory is used.
extensions – The extensions of the files to be discovered under directory and which are to be turned into Resource objects. Defaults to this loader’s _accepted_file_extensions.
file_re – Pass a regular expression in order to select only files that (partially) match it.
resource_names – Name factory for the resources created from the paths. Names also serve as piece identifiers. By default, the filename is used. To override this behaviour you can pass a callable that takes a filepath and returns a name. When the callable returns None, the default is used (i.e., the filename). Whatever the name turns out to be, it will always be turned into a valid frictionless name via make_valid_frictionless_name().
corpus_names – Names of (or name factory for) the corpus that each resource (=piece) belongs to and that is used in the (‘corpus’, ‘piece’) ID. By default, the name of the package is used. To override this behaviour you can pass a callable that takes a path and returns a name. When the callable returns None, the default is used (i.e., the package_name). Whatever the name turns out to be, it will always be turned into a valid frictionless name via make_valid_frictionless_name().
auto_validate – Set True to validate the new package after copying it.
basepath – The basepath where the new package will be stored. If None, the basepath of the original package
- class dimcat.steps.loaders.base.ScoreLoader(basepath: Optional[str] = None, loader_name: Optional[str] = None, overwrite: bool = False)
Bases: Loader
Base class for all loaders that parse scores and create a datapackage containing the extracted facets.
- class Schema(*, only: Optional[Union[Sequence[str], AbstractSet[str]]] = None, exclude: Union[Sequence[str], AbstractSet[str]] = (), many: Optional[bool] = None, load_only: Union[Sequence[str], AbstractSet[str]] = (), dump_only: Union[Sequence[str], AbstractSet[str]] = (), partial: Optional[Union[bool, Sequence[str], AbstractSet[str]]] = None, unknown: Optional[Literal['exclude', 'include', 'raise']] = None)
Bases: Schema
- exclude: set[Any] | MutableSet[Any]
- unknown: types.UnknownOption
- add_piece_facet_dataframe(facet_name: FacetName, ID: tuple, df: pandas.core.frame.DataFrame | pandas.core.series.Series) → None
- check_resource(resource: PathResource) → None
Checks whether the resource at the given path exists.
- classmethod from_directory(directory: str, package_name: Optional[str] = None, extensions: Optional[Iterable[str]] = None, file_re: Optional[str] = None, exclude_re: Optional[str] = None, resource_names: Optional[Callable[[str], Optional[str]]] = None, corpus_names: Optional[Callable[[str], Optional[str]]] = None, auto_validate: bool = False, basepath: Optional[str] = None, loader_name: Optional[str] = None, overwrite: bool = False) → Self
Create a loader from a ScorePackage created on the fly from an iterable of filepaths.
- Parameters:
directory – The directory that is to be scanned for files with particular extensions.
package_name – The name of the new package. If None, the base of the directory is used.
extensions – The extensions of the files to be discovered under directory and which are to be turned into Resource objects. Defaults to this loader’s _accepted_file_extensions.
file_re – Pass a regular expression in order to select only files that (partially) match it.
resource_names – Name factory for the resources created from the paths. Names also serve as piece identifiers. By default, the filename is used. To override this behaviour you can pass a callable that takes a filepath and returns a name. When the callable returns None, the default is used (i.e., the filename). Whatever the name turns out to be, it will always be turned into a valid frictionless name via make_valid_frictionless_name().
corpus_names – Names of (or name factory for) the corpus that each resource (=piece) belongs to and that is used in the (‘corpus’, ‘piece’) ID. By default, the name of the package is used. To override this behaviour you can pass a callable that takes a path and returns a name. When the callable returns None, the default is used (i.e., the package_name). Whatever the name turns out to be, it will always be turned into a valid frictionless name via make_valid_frictionless_name().
auto_validate – Set True to validate the new package after copying it.
basepath – The basepath where the new package will be stored. If None, the basepath of the original package
- classmethod from_filepaths(filepaths: Iterable[str], package_name: str, resource_names: Optional[Union[Iterable[str], Callable[[str], str]]] = None, corpus_names: Optional[Union[Iterable[str], Callable[[str], Optional[str]]]] = None, auto_validate: bool = False, basepath: Optional[str] = None, loader_name: Optional[str] = None, overwrite: bool = False) → Self
Create a loader from a ScorePackage created on the fly from an iterable of filepaths.
- Parameters:
filepaths – The filepaths that are to be turned into Resource objects and packaged.
package_name – The name of the new package.
resource_names – Names of (or name factory for) the created resources serving as piece identifiers. By default, the filename is used. To override this behaviour you can pass an iterable of names corresponding to paths, or a callable that takes a path and returns a name. When the callable returns None, the default is used (i.e., the filename). Whatever the name turns out to be, it will always be turned into a valid frictionless name via make_valid_frictionless_name().
corpus_names – Names of (or name factory for) the corpus that each resource (=piece) belongs to and that is used in the (‘corpus’, ‘piece’) ID. By default, the name of the package is used. To override this behaviour you can pass an iterable of names corresponding to paths, or a callable that takes a path and returns a name. When the callable returns None, the default is used (i.e., the package_name). Whatever the name turns out to be, it will always be turned into a valid frictionless name via make_valid_frictionless_name().
auto_validate – Set True to validate the new package after copying it.
basepath – The basepath where the new package will be stored. If None, the basepath of the original package
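The parameter descriptions above allow resource_names and corpus_names to be either an iterable of names or a callable with a None fallback. A minimal sketch of how such a specification could be resolved into (‘corpus’, ‘piece’) IDs follows. The helper `resolve_names` is hypothetical, and the final normalisation through make_valid_frictionless_name() is deliberately left out because its rules are not described in this section.

```python
import os
from typing import Callable, Iterable, List, Optional, Union

NameSpec = Optional[Union[Iterable[str], Callable[[str], Optional[str]]]]


def resolve_names(
    filepaths: List[str],
    names: NameSpec,
    default: Callable[[str], str],
) -> List[str]:
    """Hypothetical helper: one name per filepath, falling back to `default`."""
    if names is None:
        return [default(path) for path in filepaths]
    if callable(names):
        resolved = []
        for path in filepaths:
            name = names(path)
            # A callable returning None means "use the default for this path".
            resolved.append(default(path) if name is None else name)
        return resolved
    names = list(names)
    if len(names) != len(filepaths):
        raise ValueError("Need exactly one name per filepath.")
    return names


filepaths = ["/data/bach/prelude.mxl", "/data/mozart/sonata.mxl"]
package_name = "my_package"

# Pieces default to the filename, corpora default to the package name.
pieces = resolve_names(filepaths, None, os.path.basename)
corpora = resolve_names(
    filepaths,
    lambda path: "bach" if "bach" in path else None,
    lambda path: package_name,
)
ids = list(zip(corpora, pieces))
```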
- classmethod from_package(package: ScorePathPackage, basepath: Optional[str] = None, loader_name: Optional[str] = None, overwrite: bool = False) → Self
Create a loader from a DimcatPackage.
- classmethod from_resources(resources: Union[Iterable[PathResource], PathResource], package_name: str, auto_validate: bool = False, basepath: Optional[str] = None, loader_name: Optional[str] = None, overwrite: bool = False) → Self
Create a loader from a ScorePackage created on the fly from an iterable of PathResources.
- Parameters:
resources – The PathResource objects that will be turned into a package.
package_name – The name of the new package.
auto_validate – Set True to validate the new package after copying it.
basepath – The basepath where the new package will be stored. If None, the basepath of the original package
- get_loader_name() → str
Returns loader_name if set, otherwise default_loader_name.
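The fallback described for get_loader_name() can be illustrated as follows. This is a sketch only: `LoaderNameSketch` and its default value are hypothetical, and treating None as "not set" is an assumption.

```python
from typing import Optional


class LoaderNameSketch:
    """Hypothetical class showing an instance value with a class-level fallback."""

    default_loader_name = "sketch_loader"  # hypothetical default

    def __init__(self, loader_name: Optional[str] = None):
        self.loader_name = loader_name

    def get_loader_name(self) -> str:
        # Assumption: None means the name was not set.
        if self.loader_name is not None:
            return self.loader_name
        return self.default_loader_name


unnamed = LoaderNameSketch().get_loader_name()
named = LoaderNameSketch("custom").get_loader_name()
```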
- get_zip_filepath() → str
Returns the filename of the ZIP file that the resources of this package are serialized to.
- get_zip_path() → str
Returns the path of the ZIP file that the resources of this package are serialized to.
- iter_package_descriptors() → Iterator[str]
Create datapackage(s) and iterate over their descriptor paths.
- make_and_store_datapackage(overwrite: Optional[bool] = None) → str
- Parameters:
overwrite – Set to a boolean to set overwrite to a new value.
- Returns:
- Raises:
FileExistsError – If the zip file <basepath>/<package_name>.zip already exists.
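The overwrite guard behind the documented FileExistsError can be sketched with the standard library. The helper `store_zip` is hypothetical and only mirrors the documented path layout <basepath>/<package_name>.zip; that the error is raised only when overwrite is False is an assumption.

```python
import os
import tempfile
import zipfile
from typing import Dict


def store_zip(
    basepath: str,
    package_name: str,
    files: Dict[str, str],
    overwrite: bool = False,
) -> str:
    """Hypothetical helper: write `files` to <basepath>/<package_name>.zip."""
    zip_path = os.path.join(basepath, f"{package_name}.zip")
    # Assumption: an existing archive is only replaced when overwrite is True.
    if os.path.exists(zip_path) and not overwrite:
        raise FileExistsError(zip_path)
    with zipfile.ZipFile(zip_path, "w") as archive:
        for name, content in files.items():
            archive.writestr(name, content)
    return zip_path


with tempfile.TemporaryDirectory() as tmp:
    first_path = store_zip(tmp, "my_package", {"events.tsv": "pitch\n60\n"})
    try:
        store_zip(tmp, "my_package", {"events.tsv": "pitch\n61\n"})
        second_attempt = "stored"
    except FileExistsError:
        second_attempt = "refused"
    # With overwrite=True the existing archive is replaced.
    store_zip(tmp, "my_package", {"notes.tsv": "pitch\n62\n"}, overwrite=True)
    with zipfile.ZipFile(first_path) as archive:
        stored_names = archive.namelist()
```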
dimcat.steps.loaders.m21 module
- class dimcat.steps.loaders.m21.CollectedElements(events: List[dict] = <factory>, control: List[dict] = <factory>, structure: List[dict] = <factory>, annotations: List[str] = <factory>, metadata: DefaultDict = <factory>, part_ids: List[str] = <factory>, prelims: List[str] = <factory>)
Bases: object
- metadata: DefaultDict
- class dimcat.steps.loaders.m21.Music21Loader(basepath: Optional[str] = None, loader_name: Optional[str] = None, overwrite: bool = False)
Bases: ScoreLoader
Extracts information from scores using music21.
- class dimcat.steps.loaders.m21.Music21Score(source: str)
Bases: object
Auxiliary class for extracting facets from a score parsed with music21.
- dimcat.steps.loaders.m21.make_dataframe(records: List[dict], drop_empty_columns: bool = True)
- dimcat.steps.loaders.m21.parse_ConcreteScale(concrete_scale: ConcreteScale) → Tuple[str, ...]
- dimcat.steps.loaders.m21.parse_Measure(measure: Measure, **higher_level_info)
Inspired by MarkGotham/bar-measure
dimcat.steps.loaders.musescore module
- class dimcat.steps.loaders.musescore.MuseScoreLoader(basepath: Optional[str] = None, loader_name: Optional[str] = None, overwrite: bool = False, ms: Optional[str] = None)
Bases: ScoreLoader
Wrapper around the ms3 MuseScore parsing library.
- class Schema(*, only: Optional[Union[Sequence[str], AbstractSet[str]]] = None, exclude: Union[Sequence[str], AbstractSet[str]] = (), many: Optional[bool] = None, load_only: Union[Sequence[str], AbstractSet[str]] = (), dump_only: Union[Sequence[str], AbstractSet[str]] = (), partial: Optional[Union[bool, Sequence[str], AbstractSet[str]]] = None, unknown: Optional[Literal['exclude', 'include', 'raise']] = None)
Bases: Schema
- exclude: set[Any] | MutableSet[Any]
- unknown: types.UnknownOption
- check_resource(resource: str | pathlib.Path) → None
Checks whether the resource at the given path exists.
- classmethod from_ms3(directory: str, package_name: Optional[str] = None, as_corpus: bool = False, only_metadata_pieces: bool = True, include_convertible: bool = False, include_tsv: bool = True, exclude_review: bool = True, file_re: Optional[Union[Pattern, str]] = None, folder_re: Optional[Union[Pattern, str]] = None, exclude_re: Optional[Union[Pattern, str]] = None, paths: Optional[Collection[str]] = None, choose: Literal['auto', 'all', 'ask'] = 'auto', labels_cfg={}, ms=None, logger_cfg: Optional[dict] = None, basepath: Optional[str] = None, loader_name: Optional[str] = None, overwrite: bool = False, auto_validate: bool = True)
dimcat.steps.loaders.utils module
- class dimcat.steps.loaders.utils.PathFactory(directory: str, extensions: Optional[Union[str, Iterable[str]]] = None, file_re: Optional[str] = None, folder_re: Optional[str] = None, exclude_re: str = '^(\\.|_)', recursive: bool = True, progress: bool = False, exclude_files_only: bool = False)
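The default exclude_re shown in the PathFactory signature, '^(\\.|_)', is the escaped form of the pattern `^(\.|_)`, which matches names beginning with a dot or an underscore. The sketch below shows its effect on a few example names. How PathFactory applies the pattern (to file names, folder names, or both) is not documented here, so searching each bare name is an assumption.

```python
import re

# The documented default, written as a raw string.
DEFAULT_EXCLUDE_RE = r"^(\.|_)"

names = [".git", "_build", "sonata.mscx", "my_file.mscx", "a.b"]

# Assumption: the pattern is searched against each bare file or folder name.
kept = [name for name in names if re.search(DEFAULT_EXCLUDE_RE, name) is None]
excluded = [name for name in names if name not in kept]
```

Dots and underscores inside a name do not trigger the exclusion because the pattern is anchored to the start of the name.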
- dimcat.steps.loaders.utils.make_datapackage_descriptor(facet_df_pairs: Iterable[Tuple[str, DataFrame]], package_name: str) → dict
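To give an idea of what a descriptor built from (facet name, table) pairs might look like, here is a rough sketch shaped after the Frictionless Data Package layout (a package name plus a list of resources, each with a schema of fields). It is not dimcat's implementation: `make_descriptor_sketch` is hypothetical, lists of records replace DataFrames, and the resource naming and path scheme are assumptions.

```python
from typing import Dict, Iterable, List, Tuple


def make_descriptor_sketch(
    facet_table_pairs: Iterable[Tuple[str, List[dict]]],
    package_name: str,
) -> dict:
    """Hypothetical helper building a minimal package descriptor dict."""
    resources = []
    for facet_name, records in facet_table_pairs:
        # Collect column names in first-seen order.
        columns: Dict[str, None] = {}
        for record in records:
            for key in record:
                columns.setdefault(key, None)
        resources.append(
            {
                # Assumed naming scheme: <package_name>.<facet_name>
                "name": f"{package_name}.{facet_name}",
                "path": f"{package_name}.{facet_name}.tsv",
                "schema": {"fields": [{"name": column} for column in columns]},
            }
        )
    return {"name": package_name, "resources": resources}


descriptor = make_descriptor_sketch(
    [("events", [{"pitch": 60, "duration": 1.0}, {"pitch": 62, "duration": 0.5}])],
    package_name="my_package",
)
```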