dimcat package#
Subpackages#
- dimcat.data package
- dimcat.steps package
Submodules#
dimcat.base module#
The dimcat.base module defines the three principal classes: DimcatSchema, DimcatObject, and DimcatConfig as well as their interactions.
- dimcat.base.CONTROL_REGISTRY = False#
Raise an error if a subclass has the same name as another subclass. Set True for production.
- class dimcat.base.DimcatConfig(options: Union[Dict, DimcatConfig, str] = (), dtype: Optional[str] = None, **kwargs)[source]#
Bases:
MutableMapping, DimcatObject
Behaves like a dictionary but accepts only keys and values that are valid under the Schema of the DimcatObject specified under the key ‘dtype’. Every DimcatConfig needs to have a ‘dtype’ key that is the name of a DimcatObject and can specify zero or more additional key-value pairs that can be used to instantiate the described object.
When dealing with a DimcatConfig, be aware that ‘dtype’ and ‘options’ mean different things when used as keys as opposed to attributes (DC represents a DimcatConfig):
- DC['dtype'] is the name of the described DimcatObject (equivalent to DC.options_dtype)
- DC.dtype returns the class name “DimcatConfig”, according to all DimcatObjects’ default behaviour
- DC.options (equivalent to dict(DC)) returns the key-value pairs wrapped by this config, which include at least the ‘dtype’ key
- DC['options'] is the value of the ‘options’ option, which exists only if it is part of the described object’s schema, for example if the described object is a DimcatConfig itself.
Examples
>>> from dimcat.base import DimcatConfig, DimcatObject
>>> DC = DimcatConfig(dtype="DimcatObject")
>>> DC.dtype
'DimcatConfig'
>>> DC['dtype']
'DimcatObject'
>>> DC.options
{'dtype': 'DimcatObject'}
>>> DC['options']
KeyError: 'options'
>>> config_config = DC.to_config()
>>> config_config.options
{'dtype': 'DimcatConfig', 'options': {'dtype': 'DimcatObject'}}
>>> config_config['options']
{'dtype': 'DimcatObject'}
- class Schema(*, only: Optional[Union[Sequence[str], AbstractSet[str]]] = None, exclude: Union[Sequence[str], AbstractSet[str]] = (), many: Optional[bool] = None, load_only: Union[Sequence[str], AbstractSet[str]] = (), dump_only: Union[Sequence[str], AbstractSet[str]] = (), partial: Optional[Union[bool, Sequence[str], AbstractSet[str]]] = None, unknown: Optional[Literal['exclude', 'include', 'raise']] = None)[source]#
Bases:
DimcatSchema
- init_object(data, **kwargs) DimcatConfig[source]#
Once the data has been loaded, create the corresponding object.
- complete() Self[source]#
Returns a new Config with missing options filled in with the default values.
- create() DimcatObject[source]#
Creates the object that this DimcatConfig represents. The returned object corresponds to the type options_dtype and will be instantiated with options.
- classmethod from_dict(options, **kwargs) Self[source]#
Creates a new object from a config-like dict. Concretely, the received options will be updated with the **kwargs and enriched with a ‘dtype’ key corresponding to this object, before deserializing the dict using the corresponding marshmallow schema.
- classmethod from_object(obj: DimcatObject)[source]#
- matches(config: DimcatConfig, covariant: bool = False, contravariant: bool = False) bool[source]#
Returns True if both configs have the same options_dtype (optionally allowing subclasses) and the overlapping options are equal.
- Parameters:
config – The other config to compare against.
covariant – If True, the other config’s dtype matches even if it is a superclass of this config’s dtype. In other words, this config describes an object which is of type config.dtype or a subclass of it.
contravariant – If True, the other config’s dtype matches even if it is a subclass of this config’s dtype. In other words, this config describes an object which is of type config.dtype or a superclass of it.
- Returns:
Returns True if both configs have the same options_dtype (optionally allowing subclasses) and the overlapping options are equal.
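The matching rule described for `matches()` can be sketched with plain dictionaries. This is one reading of the description above, not the library's code: `Feature` and `Notes` are hypothetical stand-in classes, and the ‘dtype’ values are classes here rather than names.

```python
# Hypothetical stand-in classes; the real method operates on DimcatConfig objects.
class Feature:
    pass


class Notes(Feature):
    pass


def matches(mine: dict, other: dict, covariant=False, contravariant=False) -> bool:
    my_type, other_type = mine["dtype"], other["dtype"]
    type_ok = my_type is other_type
    if covariant:
        # The other dtype may be a superclass of this one.
        type_ok = type_ok or issubclass(my_type, other_type)
    if contravariant:
        # The other dtype may be a subclass of this one.
        type_ok = type_ok or issubclass(other_type, my_type)
    if not type_ok:
        return False
    # Only options present in both configs are compared.
    shared = (mine.keys() & other.keys()) - {"dtype"}
    return all(mine[key] == other[key] for key in shared)


notes = {"dtype": Notes, "format": "name"}
feature = {"dtype": Feature}
```

Here `matches(notes, feature)` is False because the types differ, whereas `matches(notes, feature, covariant=True)` is True because `Feature` is a superclass of `Notes` and no overlapping option disagrees.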
- property options: dict#
Returns the options dictionary wrapped and controlled by this DimcatConfig. Whenever a new value is set, it is validated against the Schema of the DimcatObject specified under the key ‘dtype’. Note that this property returns a copy of the dictionary including the ‘dtype’ key, so modifying it will not affect the DimcatConfig. Also note that the returned value is different from DimcatConfig[“options”].
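The copy semantics and the validate-on-set behaviour can be illustrated with a small mapping. This is a simplified sketch, not the real class: `ValidatedConfig` and its `allowed` parameter are hypothetical, the check only covers key names (the real config validates against a marshmallow schema), and raising `KeyError` is an assumption made for illustration.

```python
from collections.abc import MutableMapping


class ValidatedConfig(MutableMapping):
    """Hypothetical stand-in that accepts only keys listed in `allowed`."""

    def __init__(self, dtype, allowed, **options):
        self._allowed = set(allowed) | {"dtype"}
        self._options = {"dtype": dtype}
        for key, value in options.items():
            self[key] = value  # goes through the validation in __setitem__

    def __setitem__(self, key, value):
        if key not in self._allowed:
            raise KeyError(f"{key!r} is not a valid option for {self._options['dtype']}")
        self._options[key] = value

    def __getitem__(self, key):
        return self._options[key]

    def __delitem__(self, key):
        if key == "dtype":
            raise KeyError("'dtype' cannot be removed")
        del self._options[key]

    def __iter__(self):
        return iter(self._options)

    def __len__(self):
        return len(self._options)

    @property
    def options(self) -> dict:
        # A copy: changing the returned dict leaves the config untouched.
        return dict(self._options)


cfg = ValidatedConfig("Notes", allowed={"format"}, format="name")
snapshot = cfg.options
snapshot["format"] = "midi"  # modifies the copy only
```

After these statements, `cfg["format"]` is still `"name"`, which mirrors the note that modifying the returned dictionary does not affect the config.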
- property options_schema: DimcatSchema#
Returns the (instantiated) DimcatSchema singleton object for the class this Config describes.
- class dimcat.base.DimcatObject[source]#
Bases:
ABC
All DiMCAT classes derive from DimcatObject, except for the nested Schema(DimcatSchema) class that they define or inherit.
- class Schema(*, only: Optional[Union[Sequence[str], AbstractSet[str]]] = None, exclude: Union[Sequence[str], AbstractSet[str]] = (), many: Optional[bool] = None, load_only: Union[Sequence[str], AbstractSet[str]] = (), dump_only: Union[Sequence[str], AbstractSet[str]] = (), partial: Optional[Union[bool, Sequence[str], AbstractSet[str]]] = None, unknown: Optional[Literal['exclude', 'include', 'raise']] = None)[source]#
Bases:
DimcatSchema
- init_object(data, **kwargs) DimcatObject[source]#
Once the data has been loaded, create the corresponding object.
- class property dtype: str | enum.Enum#
Name of the class as enum member (if cls._enum_type is defined, string otherwise).
- classmethod from_config(config: DimcatConfig, **kwargs) Self[source]#
Creates a new object from a DimcatConfig. Concretely, the config’s options will be updated with the **kwargs and the ‘dtype’ key will be replaced according to this object, before deserializing the dict using the corresponding marshmallow schema.
- classmethod from_dict(options, **kwargs) Self[source]#
Creates a new object from a config-like dict. Concretely, the received options will be updated with the **kwargs and enriched with a ‘dtype’ key corresponding to this object, before deserializing the dict using the corresponding marshmallow schema.
- info(return_str: Literal[False]) None[source]#
- info(return_str: Literal[True]) str
Returns a summary of the dataset.
- class property logger: Logger#
Instances of the Logger class represent a single logging channel. A “logging channel” indicates an area of an application. Exactly how an “area” is defined is up to the application developer. Since an application can have any number of areas, logging channels are identified by a unique string. Application areas can be nested (e.g. an area of “input processing” might include sub-areas “read CSV files”, “read XLS files” and “read Gnumeric files”). To cater for this natural nesting, channel names are organized into a namespace hierarchy where levels are separated by periods, much like the Java or Python package namespace. So in the instance given above, channel names might be “input” for the upper level, and “input.csv”, “input.xls” and “input.gnu” for the sub-levels. There is no arbitrary limit to the depth of nesting.
- class property name: str#
- class property schema[source]#
Returns the (instantiated) DimcatSchema singleton object for this class.
- to_config() DimcatConfig[source]#
- class dimcat.base.DimcatObjectField(*, load_default: ~typing.Any = <marshmallow.missing>, dump_default: ~typing.Any = <marshmallow.missing>, data_key: ~typing.Optional[str] = None, attribute: ~typing.Optional[str] = None, validate: ~typing.Optional[~typing.Union[~typing.Callable[[~typing.Any], ~typing.Any], ~typing.Iterable[~typing.Callable[[~typing.Any], ~typing.Any]]]] = None, required: bool = False, allow_none: ~typing.Optional[bool] = None, load_only: bool = False, dump_only: bool = False, error_messages: ~typing.Optional[dict[str, str]] = None, metadata: ~typing.Optional[~typing.Mapping[str, ~typing.Any]] = None)[source]#
Bases:
Field
Used for (de)serializing attributes resolving to DimcatObjects.
- class dimcat.base.DimcatSchema(*, only: Optional[Union[Sequence[str], AbstractSet[str]]] = None, exclude: Union[Sequence[str], AbstractSet[str]] = (), many: Optional[bool] = None, load_only: Union[Sequence[str], AbstractSet[str]] = (), dump_only: Union[Sequence[str], AbstractSet[str]] = (), partial: Optional[Union[bool, Sequence[str], AbstractSet[str]]] = None, unknown: Optional[Literal['exclude', 'include', 'raise']] = None)[source]#
Bases:
Schema
The base class of all Schema() classes that are defined or inherited as nested classes for all DimcatObjects. This class holds the logic for serializing/deserializing DiMCAT objects. However, nested Schema() classes should generally not inherit directly from DimcatSchema but instead from DimcatObject.Schema, because the latter defines the post_load hook init_object() for deserializing Dimcat objects. The parent class DimcatSchema does not define it, allowing DimcatConfig to define it differently. In marshmallow, hooks are additive and do not replace hooks of parent schemas.
Overall, this requires careful planning of the point in the object hierarchy at which hooks are introduced in the corresponding nested Schema classes: hooks of parent Schemas can be called via super(), but omitting super() does not prevent them from being called. For example, the post_dump hook validate_dump() is introduced in PipelineStep.Schema to automatically validate any serialized object right away (frictionless does that basically by checking whether it can load the serialization data). For Data.Schema, however, this is not a safe default because most Data objects can be successfully validated only once their data has been stored to disk. Therefore, the post_dump hook is introduced in a second type of schema that all Data objects have, called PickleSchema.
The arbitrary metadata of the fields currently uses the keys:
- expose: Set False to mark fields that would normally not be exposed to the users in the context of a GUI. Defaults to True if missing.
- title: A human-readable title for the field.
- description: A human-readable description for the field.
- assert_type(obj, **kwargs)[source]#
If this fails, typically, a property of a DimcatObject does not have the type expected by the relevant field. To find out which one, you can use
    for attr_name, field_obj in schema.dump_fields.items():
        value = field_obj.serialize(attr_name, obj)
(copied from marshmallow.schema._serialize()) where obj is the object to be serialized by schema.
- dtype#
This field specifies the class of the serialized object. Every DimcatObject comes with the corresponding class property that returns its name as a string (or an Enum member that can function as a string). It is inherited by all objects’ schemas and enables their deserialization from a DimcatConfig.
- get_attribute(obj: Any, attr: str, default: Any)[source]#
Defines how to pull values from an object to serialize.
Changed in version 3.0.0a1: Changed position of obj and attr.
- class dimcat.base.DimcatSettings(auto_make_dirs: bool = True, context_columns: ~typing.List[str] = <factory>, default_basepath: str = '~/dimcat_data', default_figure_path: str = '~/dimcat_data', default_figure_format: str = '.png', default_figure_width: int = 2880, default_figure_height: int = 1620, default_resource_name: str = 'unnamed', default_terminal_symbol: str = '⋉', never_store_unvalidated_data: bool = True, recognized_piece_columns: ~typing.List[str] = <factory>, package_descriptor_endings: ~typing.List[str] = <factory>, resource_descriptor_endings: ~typing.List[str] = <factory>)[source]#
Bases:
DimcatObject
This is a dataclass that stores the default settings for the dimcat library. The current settings can be loaded anywhere in the code by calling get_setting(). Defining a new setting means adding it in three places:
- as a class attribute with type annotation (=dataclass field) and default (factory where needed)
- as a marshmallow field in the Schema with the same name and corresponding type, which gives access to marshmallow’s full serialization and validation functionality. By default, we add required=True to all settings.
- to the file settings.ini, using Python’s config file syntax
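Two of the three places listed above, the dataclass field and the settings.ini entry, can be sketched with the standard library alone. This is an illustration under stated assumptions: `SketchSettings`, `settings_from_ini`, and the `[settings]` section name are hypothetical, only three of the documented settings are included, and the marshmallow field is left out.

```python
from configparser import ConfigParser
from dataclasses import dataclass, fields


@dataclass
class SketchSettings:
    # Each setting is a dataclass field with a type annotation and a default.
    auto_make_dirs: bool = True
    default_figure_format: str = ".png"
    default_figure_width: int = 2880


# Text that a settings.ini might contain; the section name is made up.
INI_TEXT = """
[settings]
auto_make_dirs = false
default_figure_width = 1920
"""


def settings_from_ini(text: str) -> SketchSettings:
    parser = ConfigParser()
    parser.read_string(text)
    section = parser["settings"]
    # Choose the typed getter that matches the type of each field's default.
    getters = {bool: section.getboolean, int: section.getint, str: section.get}
    overrides = {
        f.name: getters[type(f.default)](f.name)
        for f in fields(SketchSettings)
        if f.name in section
    }
    return SketchSettings(**overrides)


settings = settings_from_ini(INI_TEXT)
```

Settings missing from the file keep their dataclass defaults, so `settings.default_figure_format` remains `".png"` while the two listed values are overridden.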
- class Schema(*, only: Optional[Union[Sequence[str], AbstractSet[str]]] = None, exclude: Union[Sequence[str], AbstractSet[str]] = (), many: Optional[bool] = None, load_only: Union[Sequence[str], AbstractSet[str]] = (), dump_only: Union[Sequence[str], AbstractSet[str]] = (), partial: Optional[Union[bool, Sequence[str], AbstractSet[str]]] = None, unknown: Optional[Literal['exclude', 'include', 'raise']] = None)[source]#
Bases:
Schema
- exclude: set[Any] | MutableSet[Any]#
- unknown: types.UnknownOption#
- context_columns: List[str]#
the columns that are considered essential for locating elements horizontally and vertically and which are therefore always copied, if present, and moved to the left of the new dataframe in the order given here
- never_store_unvalidated_data: bool = True#
setting this to False allows for skipping mandatory validations; set to True for production
- class dimcat.base.DtypeField(*, load_default: ~typing.Any = <marshmallow.missing>, dump_default: ~typing.Any = <marshmallow.missing>, data_key: ~typing.Optional[str] = None, attribute: ~typing.Optional[str] = None, validate: ~typing.Optional[~typing.Union[~typing.Callable[[~typing.Any], ~typing.Any], ~typing.Iterable[~typing.Callable[[~typing.Any], ~typing.Any]]]] = None, required: bool = False, allow_none: ~typing.Optional[bool] = None, load_only: bool = False, dump_only: bool = False, error_messages: ~typing.Optional[dict[str, str]] = None, metadata: ~typing.Optional[~typing.Mapping[str, ~typing.Any]] = None)[source]#
Bases:
Field
- class dimcat.base.FriendlyEnum(value)[source]#
Bases:
LowercaseEnum
Like LowercaseEnum, members of this Enum can be created from and compared to strings in a case-insensitive manner. In addition, this type of Enum is friendly enough to also allow for shortened values (i.e. having only the first few letters), as long as the abbreviation is unambiguous.
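Resolving unambiguous abbreviations can be sketched with the standard `Enum._missing_` hook. `FriendlyColor` is a hypothetical example class, and the lookup strategy is an assumption about how such behaviour could be achieved, not a copy of dimcat's code.

```python
from enum import Enum


class FriendlyColor(str, Enum):
    RED = "red"
    GREEN = "green"
    GREY = "grey"

    @classmethod
    def _missing_(cls, value):
        # Called only when the plain value lookup has failed.
        if not isinstance(value, str):
            return None
        needle = value.lower()
        for member in cls:
            if member.value == needle:  # case-insensitive exact match
                return member
        candidates = [m for m in cls if m.value.startswith(needle)]
        if len(candidates) == 1:  # accept only unambiguous abbreviations
            return candidates[0]
        return None  # Enum then raises ValueError
```

`FriendlyColor("r")` and `FriendlyColor("RED")` both resolve to `FriendlyColor.RED`, whereas `FriendlyColor("gre")` raises a `ValueError` because it could mean either GREEN or GREY.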
- class dimcat.base.FriendlyEnumField(enum: type[enum.Enum], *, by_value: bool = True, **kwargs)[source]#
Bases:
Enum
This field is identical to the standard Marshmallow Enum field except for the fact that enum members are created based on enum values instead of enum names. This incorporates the benefits of the FriendlyEnum, i.e. case-insensitive creation from partial strings.
- class dimcat.base.ListOfStringsField(cls_or_instance=<fields.String(dump_default=<marshmallow.missing>, attribute=None, validate=None, required=False, load_only=False, dump_only=False, load_default=<marshmallow.missing>, allow_none=False, error_messages={'required': 'Missing data for required field.', 'null': 'Field may not be null.', 'validator_failed': 'Invalid value.', 'invalid': 'Not a valid string.', 'invalid_utf8': 'Not a valid utf-8 string.'})>, **kwargs)[source]#
Bases:
List
- class dimcat.base.LowercaseEnum(value)[source]#
Members of this Enum can be created from and compared to strings in a case-insensitive manner.
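Case-insensitive creation and comparison can be approximated as follows. `LowercaseSketch` is a hypothetical class written for illustration, and its methods are an assumption about one possible approach rather than the library's implementation.

```python
from enum import Enum


class LowercaseSketch(str, Enum):
    NOTES = "notes"
    RESTS = "rests"

    @classmethod
    def _missing_(cls, value):
        # Called when the plain value lookup fails; retry with the lowercased string.
        if isinstance(value, str):
            for member in cls:
                if member.value == value.lower():
                    return member
        return None

    def __eq__(self, other):
        if isinstance(other, Enum):
            return self is other
        if isinstance(other, str):
            return self.value == other.lower()
        return NotImplemented

    def __ne__(self, other):
        # str brings its own __ne__, so it has to be overridden explicitly.
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __hash__(self):
        # Defining __eq__ disables the inherited hash, so restore it.
        return hash(self.value)
```

In this sketch `LowercaseSketch("NOTES")` returns `LowercaseSketch.NOTES`, and `LowercaseSketch.NOTES == "Notes"` evaluates to True. Dictionary lookups still depend on the exact lowercase value, because the hash is computed from it.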
- class dimcat.base.ObjectEnum(value)[source]#
Bases:
FriendlyEnum
An enumeration.
- get_class() Type[DimcatObject][source]#
- dimcat.base.deserialize_config(config: DimcatConfig) DimcatObject[source]#
Deserialize a config object into a DimcatObject.
- dimcat.base.deserialize_dict(obj_data: dict) DimcatObject[source]#
Deserialize a dict into a DimcatObject.
- dimcat.base.deserialize_json_file(json_file: pathlib.Path | str) DimcatObject[source]#
Deserialize a JSON file into a DimcatObject.
- dimcat.base.deserialize_json_str(json_data: str) DimcatObject[source]#
Deserialize a JSON string into a DimcatObject.
- dimcat.base.get_class(name: str) Type[DO][source]#
Resolve the given name to the class of the corresponding DimcatObject.
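The deserialization helpers and get_class() fit together through the ‘dtype’ key. The sketch below imitates that flow with the standard library only: `Point` and `REGISTRY` are hypothetical, and objects are built by a direct constructor call, whereas the library deserializes through marshmallow schemas.

```python
import json
from dataclasses import dataclass


@dataclass
class Point:
    x: int = 0
    y: int = 0


# Hypothetical name-to-class registry standing in for the library's lookup.
REGISTRY = {"Point": Point}


def get_class(name: str):
    try:
        return REGISTRY[name]
    except KeyError:
        raise KeyError(f"{name!r} is not the name of a registered class") from None


def deserialize_dict(obj_data: dict):
    options = dict(obj_data)  # leave the caller's dict untouched
    cls = get_class(options.pop("dtype"))
    return cls(**options)


def deserialize_json_str(json_data: str):
    return deserialize_dict(json.loads(json_data))


point = deserialize_json_str('{"dtype": "Point", "x": 1, "y": 2}')
```

The ‘dtype’ entry selects the class and the remaining key-value pairs become constructor arguments, so `point` equals `Point(x=1, y=2)`.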
- dimcat.base.get_pickle_schema(name, init=True)[source]#
Caches the initialized schema for each class. Pass init=False to retrieve the schema constructor.
- dimcat.base.get_schema(name, init=True)[source]#
Caches the initialized schema for each class. Pass init=False to retrieve the schema constructor.
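The caching behaviour can be pictured with `functools.lru_cache`. This is a sketch under assumptions: `PointSchema` and `SCHEMA_CLASSES` are hypothetical, and the library may implement its cache differently.

```python
from functools import lru_cache


class PointSchema:
    """Hypothetical schema class used only for this illustration."""


SCHEMA_CLASSES = {"Point": PointSchema}


@lru_cache(maxsize=None)
def _initialized_schema(name: str):
    # Instantiated once per name; later calls return the cached instance.
    return SCHEMA_CLASSES[name]()


def get_schema(name: str, init: bool = True):
    if init:
        return _initialized_schema(name)
    return SCHEMA_CLASSES[name]  # the constructor itself, not an instance
```

Repeated calls such as `get_schema("Point")` return the same instance, while `get_schema("Point", init=False)` hands back the class so that the caller can instantiate it with custom arguments.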
- dimcat.base.is_instance_of(obj, class_or_tuple: Union[Type, str, Tuple[Union[Type, str], ...]])[source]#
Returns True if the given object is an instance of the given class or one of the given classes.
- dimcat.base.is_name_of_dimcat_class(name: str) bool[source]#
Returns True if the given name can be resolved to the name of a DimcatObject.
- dimcat.base.is_subclass_of(name: str, parent: Union[str, Type[DimcatObject]]) bool[source]#
Returns True if the DimcatObject with the given name is a subclass of the given parent.
- dimcat.base.load_settings(config_filepath: Optional[str] = None, raise_exception: bool = False) DimcatConfig[source]#
Get the DimcatSettings object.
- dimcat.base.make_config_from_specs(specs: DO, instance_of: Optional[Union[Type[DO], str]] = None) DimcatConfig[source]#
Returns a DimcatConfig corresponding to the given specs. If a DimcatConfig with dtype ‘DimcatConfig’ is received, the described (inner) DimcatConfig is returned.
- Raises:
TypeError – If the feature cannot be converted to a dimcat configuration.
- dimcat.base.make_default_settings() DimcatConfig[source]#
Make a DimcatConfig object representing DimcatSettings with default values.
- dimcat.base.make_object_from_specs(specs: Union[DO, Type[DO], DimcatConfig, MutableMapping, ObjectEnum, str], instance_of: Optional[Union[Type[DO], str]] = None) DO[source]#
Returns the DimcatObject corresponding to the given specs.
- dimcat.base.make_settings_from_config_file(config_filepath: str, fallback_to_default: bool = True) DimcatConfig[source]#
Make a DimcatSettings object from a config file.
- dimcat.base.make_settings_from_config_parser(config: ConfigParser) DimcatConfig[source]#
Make a DimcatSettings object from a ConfigParser object.
- dimcat.base.parse_config_file(config_filepath: str) ConfigParser[source]#
Parse a config file and return a ConfigParser object.
dimcat.cli module#
dimcat.dc_exceptions module#
- exception dimcat.dc_exceptions.BaseFilePathMismatchError(*args, message: Optional[str] = None, **kwargs)[source]#
Bases:
DimcatError
Optional args: (basepath, filepath)
- nargs2message: ClassVar[Dict[str, Union[str, Callable]]] = {0: 'The (relative) filepath needs to be located beneath the basepath, not above or next to it.', 1: <function BaseFilePathMismatchError.<lambda>>, 2: <function BaseFilePathMismatchError.<lambda>>}#
Mapping the number of arguments passed to the Error to a lambda string constructor accepting that number of arguments and generates the error message.
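How such a mapping might drive message construction is sketched below. `SketchError` and `PathMismatchSketch` are hypothetical classes, the message texts for one and two arguments are invented, and falling back to entry 0 for an unexpected number of arguments is an assumption.

```python
from typing import Callable, ClassVar, Dict, Optional, Union


class SketchError(Exception):
    # Keys are argument counts, values are fixed messages or message builders.
    nargs2message: ClassVar[Dict[int, Union[str, Callable]]] = {
        0: "Something went wrong.",
    }

    def __init__(self, *args, message: Optional[str] = None):
        if message is None:
            # Assumption: fall back to the zero-argument message when no entry fits.
            template = self.nargs2message.get(len(args), self.nargs2message[0])
            message = template(*args) if callable(template) else template
        super().__init__(message)


class PathMismatchSketch(SketchError):
    nargs2message = {
        0: "The filepath needs to be located beneath the basepath.",
        1: lambda basepath: f"The filepath needs to be located beneath {basepath!r}.",
        2: lambda basepath, filepath: f"{filepath!r} is not located beneath {basepath!r}.",
    }
```

Raising `PathMismatchSketch("/data", "/tmp/x.tsv")` produces a message built by the two-argument lambda, and passing `message="..."` bypasses the mapping entirely.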
- exception dimcat.dc_exceptions.BasePathNotDefinedError(*args, message: Optional[str] = None, **kwargs)[source]#
Bases:
DimcatError
No optional args.
- exception dimcat.dc_exceptions.DataframeIncompatibleWithColumnSchemaError(*args, message: Optional[str] = None, **kwargs)[source]#
Bases:
DimcatError
Optional args: (resource_name, validation_error, schema_field_names, df_column_names)
- nargs2message: ClassVar[Dict[str, Union[str, Callable]]] = {0: 'The dataframe is incompatible with the column schema.', 1: <function DataframeIncompatibleWithColumnSchemaError.<lambda>>, 2: <function DataframeIncompatibleWithColumnSchemaError.<lambda>>, 3: <function DataframeIncompatibleWithColumnSchemaError.<lambda>>, 4: <function DataframeIncompatibleWithColumnSchemaError.<lambda>>}#
Mapping the number of arguments passed to the Error to a lambda string constructor accepting that number of arguments and generates the error message.
- exception dimcat.dc_exceptions.DataframeIsMissingExpectedColumnsError(*args, message: Optional[str] = None, **kwargs)[source]#
Bases:
DimcatError
Optional args: (missing_columns, present_columns)
Different from ResourceIsMissingFeatureColumnError in that it can be raised by a function that has access only to the dataframe, not the resource.
- nargs2message: ClassVar[Dict[str, Union[str, Callable]]] = {0: 'The dataframe is missing an expected column.', 1: <function DataframeIsMissingExpectedColumnsError.<lambda>>, 2: <function DataframeIsMissingExpectedColumnsError.<lambda>>}#
Mapping the number of arguments passed to the Error to a lambda string constructor accepting that number of arguments and generates the error message.
- exception dimcat.dc_exceptions.DatasetNotProcessableError(*args, message: Optional[str] = None, **kwargs)[source]#
Bases:
DimcatError
Optional args: (missing,)
- exception dimcat.dc_exceptions.DimcatError(*args, message: Optional[str] = None, **kwargs)[source]#
Bases:
Exception
- exception dimcat.dc_exceptions.DuplicateIDError(*args, message: Optional[str] = None, **kwargs)[source]#
Bases:
DimcatError
Optional args: (id, facet)
- nargs2message: ClassVar[Dict[str, Union[str, Callable]]] = {0: 'An ID was already in use.', 1: <function DuplicateIDError.<lambda>>, 2: <function DuplicateIDError.<lambda>>}#
Mapping the number of arguments passed to the Error to a lambda string constructor accepting that number of arguments and generates the error message.
- exception dimcat.dc_exceptions.DuplicatePackageNameError(*args, message: Optional[str] = None, **kwargs)[source]#
Bases:
DimcatError
Optional args: (package_name,)
- nargs2message: ClassVar[Dict[str, Union[str, Callable]]] = {0: 'A package with the same name already exists.', 1: <function DuplicatePackageNameError.<lambda>>}#
Mapping the number of arguments passed to the Error to a lambda string constructor accepting that number of arguments and generates the error message.
- exception dimcat.dc_exceptions.DuplicateResourceIDsError(*args, message: Optional[str] = None, **kwargs)[source]#
Bases:
DimcatError
Optional args: (id_counter,)
- exception dimcat.dc_exceptions.EmptyCatalogError(*args, message: Optional[str] = None, **kwargs)[source]#
Bases:
DimcatError
- exception dimcat.dc_exceptions.EmptyDatasetError(*args, message: Optional[str] = None, **kwargs)[source]#
Bases:
DimcatError
Optional args: (dataset_name,)
- exception dimcat.dc_exceptions.EmptyPackageError(*args, message: Optional[str] = None, **kwargs)[source]#
Bases:
DimcatError
Optional args: (package_name,)
- exception dimcat.dc_exceptions.EmptyResourceError(*args, message: Optional[str] = None, **kwargs)[source]#
Bases:
DimcatError
Optional args: (resource_name,)
- exception dimcat.dc_exceptions.ExcludedFileExtensionError(*args, message: Optional[str] = None, **kwargs)[source]#
Bases:
DimcatError
Optional args: (extension, permissible_extensions)
- nargs2message: ClassVar[Dict[str, Union[str, Callable]]] = {0: 'A file extension is excluded.', 1: <function ExcludedFileExtensionError.<lambda>>, 2: <function ExcludedFileExtensionError.<lambda>>}#
Mapping the number of arguments passed to the Error to a lambda string constructor accepting that number of arguments and generates the error message.
- exception dimcat.dc_exceptions.FeatureIsMissingFormatColumnError(*args, message: Optional[str] = None, **kwargs)[source]#
Bases:
DimcatError
Optional args: (feature_name, missing_column(s), format, feature_type)
- nargs2message: ClassVar[Dict[str, Union[str, Callable]]] = {0: 'The feature is missing the column corresponding to this format.', 1: <function FeatureIsMissingFormatColumnError.<lambda>>, 2: <function FeatureIsMissingFormatColumnError.<lambda>>, 3: <function FeatureIsMissingFormatColumnError.<lambda>>, 4: <function FeatureIsMissingFormatColumnError.<lambda>>}#
Mapping the number of arguments passed to the Error to a lambda string constructor accepting that number of arguments and generates the error message.
- exception dimcat.dc_exceptions.FeatureNotProcessableError(*args, message: Optional[str] = None, **kwargs)[source]#
Bases:
DimcatError
Optional args: (resource_name, pipeline_step, allowed)
- nargs2message: ClassVar[Dict[str, Union[str, Callable]]] = {0: 'Cannot process this Feature.', 1: <function FeatureNotProcessableError.<lambda>>, 2: <function FeatureNotProcessableError.<lambda>>, 3: <function FeatureNotProcessableError.<lambda>>}#
Mapping the number of arguments passed to the Error to a lambda string constructor accepting that number of arguments and generates the error message.
Bases:
DimcatError
Optional args: (feature_name, getting_from_name)
Mapping the number of arguments passed to the Error to a lambda string constructor accepting that number of arguments and generates the error message.
- exception dimcat.dc_exceptions.FeatureWithUndefinedValueColumnError(*args, message: Optional[str] = None, **kwargs)[source]#
Bases:
DimcatError
Optional args: (feature_name, feature_type)
- nargs2message: ClassVar[Dict[str, Union[str, Callable]]] = {0: 'No value_column is defined for this feature.', 1: <function FeatureWithUndefinedValueColumnError.<lambda>>, 2: <function FeatureWithUndefinedValueColumnError.<lambda>>}#
Mapping the number of arguments passed to the Error to a lambda string constructor accepting that number of arguments and generates the error message.
- exception dimcat.dc_exceptions.FilePathNotDefinedError(*args, message: Optional[str] = None, **kwargs)[source]#
Bases:
DimcatError
No optional args.
- exception dimcat.dc_exceptions.GrouperNotSetUpError(*args, message: Optional[str] = None, **kwargs)[source]#
Bases:
DimcatError
Optional args: (grouper_name,)
- nargs2message: ClassVar[Dict[str, Union[str, Callable]]] = {0: "The grouper has not been setup. Applying it would result in empty features. Set the attribute 'grouped_units'.", 1: <function GrouperNotSetUpError.<lambda>>}#
Mapping the number of arguments passed to the Error to a lambda string constructor accepting that number of arguments and generates the error message.
- exception dimcat.dc_exceptions.InvalidResourcePathError(*args, message: Optional[str] = None, **kwargs)[source]#
Bases:
DimcatError
Optional args: (filepath, basepath)
- nargs2message: ClassVar[Dict[str, Union[str, Callable]]] = {0: 'The resource path is invalid.', 1: <function InvalidResourcePathError.<lambda>>, 2: <function InvalidResourcePathError.<lambda>>}#
Mapping the number of arguments passed to the Error to a lambda string constructor accepting that number of arguments and generates the error message.
- exception dimcat.dc_exceptions.NoFeaturesActiveError(*args, message: Optional[str] = None, **kwargs)[source]#
Bases:
DimcatError
No optional args.
- nargs2message: ClassVar[Dict[str, Union[str, Callable]]] = {0: 'No features are currently active and none have been requested. Apply a FeatureExtractor first or pass specs for at least one feature to be extracted.'}#
Mapping the number of arguments passed to the Error to a lambda string constructor accepting that number of arguments and generates the error message.
- exception dimcat.dc_exceptions.NoMatchingPipelineStepFoundError(*args, message: Optional[str] = None, **kwargs)[source]#
Bases:
DimcatError
Optional args: (step_specs,). Pass no arguments if the pipeline is empty.
- nargs2message: ClassVar[Dict[str, Union[str, Callable]]] = {0: 'Pipeline does not include any steps.', 1: <function NoMatchingPipelineStepFoundError.<lambda>>}#
Mapping the number of arguments passed to the Error to a lambda string constructor accepting that number of arguments and generates the error message.
- exception dimcat.dc_exceptions.NoMatchingResourceFoundError(*args, message: Optional[str] = None, **kwargs)[source]#
Bases:
DimcatError
Optional args: (config, package_name)
- nargs2message: ClassVar[Dict[str, Union[str, Callable]]] = {0: 'No matching resource found.', 1: <function NoMatchingResourceFoundError.<lambda>>, 2: <function NoMatchingResourceFoundError.<lambda>>}#
Mapping the number of arguments passed to the Error to a lambda string constructor accepting that number of arguments and generates the error message.
- exception dimcat.dc_exceptions.NoMuseScoreExecutableSpecifiedError(*args, message: Optional[str] = None, **kwargs)[source]#
Bases:
DimcatError
No optional args.
- exception dimcat.dc_exceptions.NoPathsSpecifiedError(*args, message: Optional[str] = None, **kwargs)[source]#
Bases:
DimcatError
No optional args.
- exception dimcat.dc_exceptions.PackageDescriptorHasWrongTypeError(*args, message: Optional[str] = None, **kwargs)[source]#
Bases:
DimcatError
Optional args: (expected_type, actual_type, name)
- nargs2message: ClassVar[Dict[str, Union[str, Callable]]] = {0: 'The package descriptor resolves to the wrong type.', 1: <function PackageDescriptorHasWrongTypeError.<lambda>>, 2: <function PackageDescriptorHasWrongTypeError.<lambda>>, 3: <function PackageDescriptorHasWrongTypeError.<lambda>>}#
Mapping the number of arguments passed to the Error to a lambda string constructor accepting that number of arguments and generates the error message.
- exception dimcat.dc_exceptions.PackageInconsistentlySerializedError(*args, message: Optional[str] = None, **kwargs)[source]#
Bases:
DimcatError
Optional args: (package_name, existing_path, missing_path)
- nargs2message: ClassVar[Dict[str, Union[str, Callable]]] = {0: 'The package has been serialized in an inconsistent way, found only ZIP or descriptor, not both.', 1: <function PackageInconsistentlySerializedError.<lambda>>, 2: <function PackageInconsistentlySerializedError.<lambda>>, 3: <function PackageInconsistentlySerializedError.<lambda>>}#
Mapping the number of arguments passed to the Error to a lambda string constructor accepting that number of arguments and generates the error message.
- exception dimcat.dc_exceptions.PackageNotFoundError(*args, message: Optional[str] = None, **kwargs)[source]#
Bases:
DimcatError
Optional args: (package_name,)
- exception dimcat.dc_exceptions.PackageNotFullySerializedError(*args, message: Optional[str] = None, **kwargs)[source]#
Bases:
DimcatError
Optional args: (error_message,)
- nargs2message: ClassVar[Dict[str, Union[str, Callable]]] = {0: 'All resources contained in the package have not been serialized.', 1: <function PackageNotFullySerializedError.<lambda>>}#
Mapping the number of arguments passed to the Error to a lambda string constructor accepting that number of arguments and generates the error message.
- exception dimcat.dc_exceptions.PackagePathsNotAlignedError(*args, message: Optional[str] = None, **kwargs)[source]#
Bases:
DimcatError
Optional args: (error_message,)
- nargs2message: ClassVar[Dict[str, Union[str, Callable]]] = {0: 'Package paths are not aligned.', 1: <function PackagePathsNotAlignedError.<lambda>>}#
Mapping the number of arguments passed to the Error to a lambda string constructor accepting that number of arguments and generates the error message.
Bases:
DimcatError
Optional args: (name, path)
Mapping the number of arguments passed to the Error to a lambda string constructor accepting that number of arguments and generates the error message.
- exception dimcat.dc_exceptions.ResourceAlreadyTransformed(*args, message: Optional[str] = None, **kwargs)[source]#
Bases:
DimcatError
Optional args: (name, processor)
- nargs2message: ClassVar[Dict[str, Union[str, Callable]]] = {0: 'Resource has already been processed.', 1: <function ResourceAlreadyTransformed.<lambda>>, 2: <function ResourceAlreadyTransformed.<lambda>>}#
Mapping the number of arguments passed to the Error to a lambda string constructor accepting that number of arguments and generates the error message.
- exception dimcat.dc_exceptions.ResourceDescriptorHasWrongTypeError(*args, message: Optional[str] = None, **kwargs)[source]#
Bases:
DimcatError
Optional args: (expected_type, actual_type, name)
- nargs2message: ClassVar[Dict[str, Union[str, Callable]]] = {0: 'The resource descriptor resolves to the wrong type.', 1: <function ResourceDescriptorHasWrongTypeError.<lambda>>, 2: <function ResourceDescriptorHasWrongTypeError.<lambda>>, 3: <function ResourceDescriptorHasWrongTypeError.<lambda>>}#
Mapping the number of arguments passed to the Error to a lambda string constructor accepting that number of arguments and generates the error message.
- exception dimcat.dc_exceptions.ResourceIsFrozenError(*args, message: Optional[str] = None, **kwargs)[source]#
Bases: DimcatError
optional args: (resource_name, current_basepath, new_basepath)
- nargs2message: ClassVar[Dict[str, Union[str, Callable]]] = {0: 'Resource is frozen, i.e. tied to data stored on disk, so you would need to copy it for the relative paths ro remain valid.', 1: <function ResourceIsFrozenError.<lambda>>, 2: <function ResourceIsFrozenError.<lambda>>, 3: <function ResourceIsFrozenError.<lambda>>}#
Maps the number of arguments passed to the error to either a fixed message string or a lambda that accepts that number of arguments and generates the error message.
- exception dimcat.dc_exceptions.ResourceIsMisalignedError(*args, message: Optional[str] = None, **kwargs)[source]#
Bases: ResourceIsFrozenError
- nargs2message: ClassVar[Dict[str, Union[str, Callable]]] = {0: 'Package prevents adding resources that cannot be aligned with it without copying them. Consider using one of the subtypes such as PathPackage or DimcatPackage.', 1: <function ResourceIsMisalignedError.<lambda>>, 2: <function ResourceIsMisalignedError.<lambda>>, 3: <function ResourceIsMisalignedError.<lambda>>}#
Maps the number of arguments passed to the error to either a fixed message string or a lambda that accepts that number of arguments and generates the error message.
- exception dimcat.dc_exceptions.ResourceIsMissingCorpusIndexError(*args, message: Optional[str] = None, **kwargs)[source]#
Bases: DimcatError
optional args: (resource_name, name_of_missing)
- nargs2message: ClassVar[Dict[str, Union[str, Callable]]] = {0: 'The resource is missing a corpus index level.', 1: <function ResourceIsMissingCorpusIndexError.<lambda>>, 2: <function ResourceIsMissingCorpusIndexError.<lambda>>}#
Maps the number of arguments passed to the error to either a fixed message string or a lambda that accepts that number of arguments and generates the error message.
- exception dimcat.dc_exceptions.ResourceIsMissingFeatureColumnError(*args, message: Optional[str] = None, **kwargs)[source]#
Bases: DimcatError
optional args: (resource_name, missing_column(s), feature_name)
- nargs2message: ClassVar[Dict[str, Union[str, Callable]]] = {0: 'The resource is missing a feature column.', 1: <function ResourceIsMissingFeatureColumnError.<lambda>>, 2: <function ResourceIsMissingFeatureColumnError.<lambda>>, 3: <function ResourceIsMissingFeatureColumnError.<lambda>>}#
Maps the number of arguments passed to the error to either a fixed message string or a lambda that accepts that number of arguments and generates the error message.
- exception dimcat.dc_exceptions.ResourceIsMissingPieceIndexError(*args, message: Optional[str] = None, **kwargs)[source]#
Bases: DimcatError
optional args: (resource_name, name_of_missing)
- nargs2message: ClassVar[Dict[str, Union[str, Callable]]] = {0: 'The resource is missing a piece index level.', 1: <function ResourceIsMissingPieceIndexError.<lambda>>, 2: <function ResourceIsMissingPieceIndexError.<lambda>>}#
Maps the number of arguments passed to the error to either a fixed message string or a lambda that accepts that number of arguments and generates the error message.
- exception dimcat.dc_exceptions.ResourceIsPackagedError(*args, message: Optional[str] = None, **kwargs)[source]#
Bases: ResourceIsFrozenError
optional args: (name, new_path, path_type)
- nargs2message: ClassVar[Dict[str, Union[str, Callable]]] = {0: 'The resource is packaged can cannot store its own descriptor.', 1: <function ResourceIsPackagedError.<lambda>>, 2: <function ResourceIsPackagedError.<lambda>>, 3: <function ResourceIsPackagedError.<lambda>>}#
Maps the number of arguments passed to the error to either a fixed message string or a lambda that accepts that number of arguments and generates the error message.
- exception dimcat.dc_exceptions.ResourceNamesNonUniqueError(*args, message: Optional[str] = None, **kwargs)[source]#
Bases: DimcatError
optional args: (names_or_paths,)
- nargs2message: ClassVar[Dict[str, Union[str, Callable]]] = {0: 'The resulting resource names are not unique.', 1: <function ResourceNamesNonUniqueError.<lambda>>}#
Maps the number of arguments passed to the error to either a fixed message string or a lambda that accepts that number of arguments and generates the error message.
- exception dimcat.dc_exceptions.ResourceNeedsToBeCopiedError(*args, message: Optional[str] = None, **kwargs)[source]#
Bases: DimcatError
optional args: (path_type, new_path)
- nargs2message: ClassVar[Dict[str, Union[str, Callable]]] = {0: 'Resource would need copying.', 1: <function ResourceNeedsToBeCopiedError.<lambda>>, 2: <function ResourceNeedsToBeCopiedError.<lambda>>}#
Maps the number of arguments passed to the error to either a fixed message string or a lambda that accepts that number of arguments and generates the error message.
- exception dimcat.dc_exceptions.ResourceNotFoundError(*args, message: Optional[str] = None, **kwargs)[source]#
Bases: DimcatError
optional args: (resource_name, package_name)
- nargs2message: ClassVar[Dict[str, Union[str, Callable]]] = {0: 'Resource not found.', 1: <function ResourceNotFoundError.<lambda>>, 2: <function ResourceNotFoundError.<lambda>>}#
Maps the number of arguments passed to the error to either a fixed message string or a lambda that accepts that number of arguments and generates the error message.
- exception dimcat.dc_exceptions.ResourceNotProcessableError(*args, message: Optional[str] = None, **kwargs)[source]#
Bases: DimcatError
optional args: (resource_name, pipeline_step, resource_type)
- nargs2message: ClassVar[Dict[str, Union[str, Callable]]] = {0: 'Cannot process this Resource.', 1: <function ResourceNotProcessableError.<lambda>>, 2: <function ResourceNotProcessableError.<lambda>>, 3: <function ResourceNotProcessableError.<lambda>>, 4: <function ResourceNotProcessableError.<lambda>>}#
Maps the number of arguments passed to the error to either a fixed message string or a lambda that accepts that number of arguments and generates the error message.
- exception dimcat.dc_exceptions.SlicerNotSetUpError(*args, message: Optional[str] = None, **kwargs)[source]#
Bases: DimcatError
optional args: (slicer_name,)
- nargs2message: ClassVar[Dict[str, Union[str, Callable]]] = {0: "The slicer has not been setup. Applying it would result in empty features. Set the attribute 'slice_intervals'.", 1: <function SlicerNotSetUpError.<lambda>>}#
Maps the number of arguments passed to the error to either a fixed message string or a lambda that accepts that number of arguments and generates the error message.
- exception dimcat.dc_exceptions.UnknownFormat(*args, message: Optional[str] = None, **kwargs)[source]#
Bases: DimcatError
optional args: (format_value, format_enum, resource_type, resource_name)
- nargs2message: ClassVar[Dict[str, Union[str, Callable]]] = {0: 'Unknown format.', 1: <function UnknownFormat.<lambda>>, 2: <function UnknownFormat.<lambda>>, 3: <function UnknownFormat.<lambda>>, 4: <function UnknownFormat.<lambda>>}#
Maps the number of arguments passed to the error to either a fixed message string or a lambda that accepts that number of arguments and generates the error message.
dimcat.dc_warnings module#
- exception dimcat.dc_warnings.OrderOfPipelineStepsWarning[source]#
Bases: UserWarning
This warning is shown when the order of pipeline steps may lead to unexpected behaviour.
- exception dimcat.dc_warnings.PotentiallyMisalignedPackageUserWarning[source]#
Bases: UserWarning
This warning is shown when resources are added to a package whose status is not ALLOW_MISALIGNMENT but which has no defined basepath.
Bases: UserWarning
This warning is shown when, as a result of modifying a basepath, the descriptor_path points to a pre-existing file on disk which could potentially have nothing to do with the current resource.
- exception dimcat.dc_warnings.ResourceWithRangeIndexUserWarning[source]#
Bases: UserWarning
This warning is shown when a resource has a range index, which is typically the case for dataframes holding information for a single piece only.
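Because these warnings derive from UserWarning, the standard library's warnings module can record or silence them by category. The following is a minimal sketch under stated assumptions: the class below is a local stand-in that only shares its name with the documented warning (in real code it would presumably be imported from dimcat.dc_warnings), and `load_single_piece` is a hypothetical function that emits it.

```python
import warnings


class ResourceWithRangeIndexUserWarning(UserWarning):
    """Local stand-in for the documented warning class (assumption)."""


def load_single_piece():
    """Hypothetical function that emits the warning."""
    warnings.warn("resource has a range index", ResourceWithRangeIndexUserWarning)
    return "loaded"


# Record warnings instead of printing them.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = load_single_piece()

# Silence only this category; other warnings would still be recorded.
with warnings.catch_warnings(record=True) as caught_after_filter:
    warnings.simplefilter("always")
    warnings.filterwarnings("ignore", category=ResourceWithRangeIndexUserWarning)
    load_single_piece()

print(len(caught), len(caught_after_filter))
```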
dimcat.enums module#
dimcat.plotting module#
dimcat.utils module#
Utility functions that are or might be used by several modules or useful in external contexts.
- class dimcat.utils.SortOrder(value)[source]#
Bases: FriendlyEnum
An enumeration.
- ASCENDING = 'ASCENDING'#
- DESCENDING = 'DESCENDING'#
- NONE = 'NONE'#
- dimcat.utils.check_file_path(filepath: str, extensions: Optional[Union[str, Collection[str]]] = None, must_exist: bool = True) str[source]#
Checks that the filepath exists and raises an exception otherwise (or if it doesn’t have a valid extension).
- Parameters:
filepath –
extensions –
must_exist – If True (default), raises FileNotFoundError if the file does not exist.
- Returns:
The path turned into an absolute path.
- Raises:
FileNotFoundError – If the file does not exist and must_exist is True.
ValueError – If the file does not have one of the specified extensions, if any.
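The documented contract of check_file_path can be sketched with the standard library alone. This is an illustrative re-implementation, not dimcat's code: the function name `check_file_path_sketch`, the order of the two checks, the case-insensitive comparison, and the acceptance of extensions with or without a leading dot are assumptions.

```python
import os
from typing import Collection, Optional, Union


def check_file_path_sketch(
    filepath: str,
    extensions: Optional[Union[str, Collection[str]]] = None,
    must_exist: bool = True,
) -> str:
    """Return the absolute path or raise, following the documented contract."""
    path = os.path.abspath(os.path.expanduser(filepath))
    if must_exist and not os.path.isfile(path):
        raise FileNotFoundError(path)
    if extensions is not None:
        if isinstance(extensions, str):
            extensions = [extensions]
        # Assumption: 'tsv' and '.tsv' are treated alike, ignoring case.
        normalized = {"." + ext.lstrip(".").lower() for ext in extensions}
        _, ext = os.path.splitext(path)
        if ext.lower() not in normalized:
            raise ValueError(
                f"{path} does not have one of the extensions {sorted(normalized)}"
            )
    return path
```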
- dimcat.utils.check_name(name: str) None[source]#
Check if a name is valid according to frictionless.
- Raises:
ValueError – If the name is not valid.
- dimcat.utils.clean_index_levels(pandas_obj)[source]#
Remove index levels “IDs”, “corpus” and “fname”, if redundant.
- dimcat.utils.get_composition_year(metadata_dict)[source]#
The logic for getting a composition year out of the given metadata dictionary.
- dimcat.utils.get_middle_composition_year(metadata: DataFrame, composed_start_column: str = 'composed_start', composed_end_column: str = 'composed_end') Series[source]#
Returns the middle of the composition year range.
- dimcat.utils.get_object_value(obj, key, default)[source]#
Returns obj[key] if possible, obj.key otherwise. The code is copied from marshmallow.utils._get_value_for_key().
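The lookup order described here (item access first, attribute access second) can be sketched as follows. The name `get_object_value_sketch` and the set of caught exceptions are assumptions for illustration; the actual function may differ in detail.

```python
def get_object_value_sketch(obj, key, default=None):
    """Try obj[key]; fall back to the attribute obj.key, then to default."""
    try:
        return obj[key]
    except (KeyError, IndexError, TypeError, AttributeError):
        # obj is not subscriptable or does not contain the key.
        return getattr(obj, key, default)


class Piece:
    """Hypothetical object exposing a plain attribute."""

    name = "sonata"


print(get_object_value_sketch({"name": "fugue"}, "name"))
print(get_object_value_sketch(Piece(), "name"))
print(get_object_value_sketch({}, "missing", 0))
```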
- dimcat.utils.grams(lists_of_symbols, n=2, to_string: bool = False)[source]#
Returns a list of n-gram tuples for the given list, which can be nested.
Use nesting to exclude transitions between pieces or other units.
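The effect of nesting can be seen in a small sketch. It is not the library's implementation: `grams_sketch` is a hypothetical name, and it assumes that a nested input consists of lists only.

```python
def grams_sketch(lists_of_symbols, n=2):
    """Return n-gram tuples; n-grams never span two inner lists."""
    if any(isinstance(item, list) for item in lists_of_symbols):
        result = []
        for item in lists_of_symbols:
            # Assumption: in a nested input every element is itself a list.
            result.extend(grams_sketch(item, n))
        return result
    # Zip n shifted views of the flat list to obtain consecutive n-grams.
    return list(zip(*(lists_of_symbols[i:] for i in range(n))))


flat = grams_sketch(["a", "b", "c", "d"])
nested = grams_sketch([["a", "b"], ["c", "d"]])
print(flat)
print(nested)
```

In the nested call the transition from "b" to "c" is absent, because the two symbols belong to different inner lists.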
- dimcat.utils.interval_index2interval(ix)[source]#
Takes an interval index and returns the interval corresponding to [min(left), max(right)).
- dimcat.utils.is_uri(path: str) bool[source]#
Solution from https://stackoverflow.com/a/38020041
- dimcat.utils.make_extension_regex(extensions: Iterable[str], enforce_initial_dot: bool = False) Pattern[source]#
Turns file extensions into a regular expression.
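One way to turn extensions into a regular expression is sketched below. The pattern that dimcat actually builds is not documented here, so the anchoring at the end of the string, the case-insensitive flag, and the name `make_extension_regex_sketch` are assumptions.

```python
import re
from typing import Iterable, Pattern


def make_extension_regex_sketch(
    extensions: Iterable[str], enforce_initial_dot: bool = False
) -> Pattern:
    """Compile a pattern that matches strings ending in one of the extensions."""
    # Strip leading dots and escape, so characters such as '+' stay literal.
    alternatives = "|".join(re.escape(ext.lstrip(".")) for ext in extensions)
    dot = r"\." if enforce_initial_dot else r"\.?"
    return re.compile(f"{dot}(?:{alternatives})$", re.IGNORECASE)


regex = make_extension_regex_sketch(["tsv", ".csv"], enforce_initial_dot=True)
print(bool(regex.search("notes.tsv")), bool(regex.search("notes.txt")))
```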
- dimcat.utils.make_suffix(*params)[source]#
Turns the passed parameter values into a suffix string.
- Parameters:
params (str or Collection or number) – Parameters to turn into string components of the returned suffix. None values are ignored. Pairs of the form (str, <param>) are treated specially in that the initial str is treated as a prefix of the string component unless <param> is an empty/None/False value.
- Returns:
A suffix string where the passed values are joined together, separated by ‘-’.
- Return type:
str
Examples
>>> make_suffix('str', None, False, {0, 1.})
'-str-False-0|1.0'
>>> make_suffix(['collection', 0], ('zero', 0), ('prefix', 1), ('flag', True))
'-collection|0-prefix1-flag'
- dimcat.utils.make_transition_matrix(nested_sequences: Optional[list] = None, ngrams: Optional[List[tuple]] = None, n: int = 2, k: Optional[int] = None, smooth: int = 0, normalize: bool = False, IC: bool = False, excluded_grams: Optional[Any] = None, distinct_only: bool = False, sort: bool = False, percent: bool = False, decimals: Optional[int] = None)[source]#
Returns a transition table from a list of symbols or from a list of n-grams.
Column index is the last item of grams, row index the n-1 preceding items.
- Parameters:
nested_sequences – List of elements between which the transitions are calculated. If specified, ngrams must be None. List can be nested.
ngrams – List of tuples being n-grams. If specified, nested_sequences must be None.
n – Make n-grams. Only relevant if nested_sequences is specified.
k – Number of rows and columns that you want to keep. Defaults to all.
smooth – Initial count value of all transitions.
normalize – Set to True to divide every row by the sum of the row.
IC – Set True to calculate information content.
excluded_grams – Elements you want to exclude from the table. All ngrams containing at least one of the elements will be filtered out.
distinct_only – If True, n-grams consisting only of identical elements are filtered out.
sort – By default, the indices are ordered by gram frequency. Pass True to sort by bigram counts.
percent – Pass True to multiply the matrix by 100 before rounding to decimals.
decimals – To how many decimals you want to round the matrix.
- Returns:
For each (n-1) previous elements (index), the number or proportion of transitions to each possible following element (columns).
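The core idea (rows indexed by the n-1 preceding items, columns by the last item of each n-gram) can be sketched without pandas. This is a simplified illustration rather than the library's implementation: `transition_counts_sketch` is a hypothetical name, it returns a nested dict instead of a table, it expects a list of flat sequences, and it covers only the n, smooth, and normalize options.

```python
from collections import Counter, defaultdict


def transition_counts_sketch(sequences, n=2, smooth=0, normalize=False):
    """Return {context: {next_symbol: value}} for all n-grams in the sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:  # assumption: a list of flat sequences
        for i in range(len(seq) - n + 1):
            gram = tuple(seq[i : i + n])
            # Row key: the n-1 preceding items; column key: the last item.
            counts[gram[:-1]][gram[-1]] += 1
    columns = sorted({symbol for row in counts.values() for symbol in row})
    table = {}
    for context, row in counts.items():
        # smooth is added to every cell, including unseen transitions.
        values = {symbol: row[symbol] + smooth for symbol in columns}
        if normalize:
            total = sum(values.values())
            values = {symbol: value / total for symbol, value in values.items()}
        table[context] = values
    return table


sequences = [["a", "b", "a", "b"], ["b", "b"]]
counts_table = transition_counts_sketch(sequences)
proportions = transition_counts_sketch(sequences, normalize=True)
print(counts_table)
print(proportions)
```

Keeping the two pieces as separate inner lists prevents a spurious transition from the last "b" of the first sequence to the first "b" of the second.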
- dimcat.utils.make_valid_frictionless_name_from_filepath(path: str, include_extension=True, replace_char='_') str[source]#
- dimcat.utils.nest_level(obj, include_tuples=False)[source]#
Recursively calculate the depth of a nested list.
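A recursive depth calculation might look like the sketch below. The convention that a flat list has depth 1 and a non-list has depth 0 is an assumption, as is the name `nest_level_sketch`; the documented function may count differently.

```python
def nest_level_sketch(obj, include_tuples=False):
    """Return the nesting depth of obj; tuples count only if requested."""
    container_types = (list, tuple) if include_tuples else (list,)
    if not isinstance(obj, container_types):
        return 0
    # default=0 handles empty containers.
    return 1 + max(
        (nest_level_sketch(item, include_tuples) for item in obj), default=0
    )


print(nest_level_sketch([1, 2]))
print(nest_level_sketch([[1], [2, [3]]]))
```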
- dimcat.utils.replace_ext(filepath, new_ext)[source]#
Replace the extension of any given file path with a new one which can be given with or without a leading dot.
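The behaviour described above can be approximated with os.path.splitext. This is a sketch under the assumption that only the final extension is replaced; `replace_ext_sketch` is a hypothetical name.

```python
import os


def replace_ext_sketch(filepath, new_ext):
    """Swap the last extension, accepting 'tsv' and '.tsv' alike."""
    root, _ = os.path.splitext(filepath)
    if not new_ext.startswith("."):
        new_ext = "." + new_ext
    return root + new_ext


print(replace_ext_sketch("scores/piece.mscx", "tsv"))
```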
- dimcat.utils.resolve_path(path: str) AbsolutePathStr[source]#
- dimcat.utils.resolve_path(path: Literal[None]) None
Resolves ‘~’ to the HOME directory and turns path into an absolute path.
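Both overloads (a string returns an absolute path, None returns None) can be sketched with the standard library. The name `resolve_path_sketch` is hypothetical, and whether the real function also resolves symlinks or normalizes further is not stated here.

```python
import os
from typing import Optional


def resolve_path_sketch(path: Optional[str]) -> Optional[str]:
    """Expand '~' and return an absolute path; pass None through unchanged."""
    if path is None:
        return None
    return os.path.abspath(os.path.expanduser(path))


print(resolve_path_sketch("~/corpora"))
print(resolve_path_sketch(None))
```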
- dimcat.utils.scan_directory(directory, extensions, file_re, folder_re, exclude_re, recursive, return_tuples: Literal[False], progress, exclude_files_only) Iterator[str][source]#
- dimcat.utils.scan_directory(directory, extensions, file_re, folder_re, exclude_re, recursive, return_tuples: Literal[True], progress, exclude_files_only) Iterator[Tuple[str, str]]
Depth-first generator of filtered file paths in directory.
- Parameters:
directory – Directory to be scanned for files.
extensions – File extensions to be included (with or without leading dot). Defaults to all extensions.
file_re – Regular expressions for filtering certain file names or folder names. The regEx are checked with search(), not match(), allowing for fuzzy search.
folder_re – Regular expressions for filtering certain file names or folder names. The regEx are checked with search(), not match(), allowing for fuzzy search.
exclude_re – Exclude files and folders (unless exclude_files_only=True) containing this regular expression. Excludes files starting with a dot or underscore by default; prevent this by setting it to None or ‘’.
recursive – By default, subdirectories are recursively scanned. Pass False to scan only directory.
return_tuples – By default, full file paths are returned. Pass True to return (path, name) tuples instead.
progress – Pass True to display the progress (useful for large directories).
exclude_files_only – By default, exclude_re excludes files and folders. Pass True to exclude only files matching the regEx.
- Yields:
Full file path or, if return_tuples=True, (path, file_name) pairs in random order.
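A reduced version of such a scanner can be sketched with os.walk. It covers only a subset of the documented parameters and is not the library's code: the name `scan_directory_sketch` and the default exclusion pattern `^(\.|_)` are assumptions derived from the description "files starting with a dot or underscore".

```python
import os
import re
from typing import Iterator, Optional


def scan_directory_sketch(
    directory: str,
    extensions: Optional[list] = None,
    exclude_re: Optional[str] = r"^(\.|_)",
    recursive: bool = True,
) -> Iterator[str]:
    """Yield full paths of files under directory that pass the filters."""
    exclude = re.compile(exclude_re) if exclude_re else None
    suffixes = None
    if extensions is not None:
        suffixes = tuple("." + ext.lstrip(".").lower() for ext in extensions)
    for folder, subfolders, files in os.walk(directory):
        if not recursive:
            subfolders.clear()  # stops os.walk from descending
        elif exclude is not None:
            # Prune excluded folders in place so they are never entered.
            subfolders[:] = [s for s in subfolders if not exclude.search(s)]
        for name in files:
            if exclude is not None and exclude.search(name):
                continue
            if suffixes is not None and not name.lower().endswith(suffixes):
                continue
            yield os.path.join(folder, name)
```

Because the function is a generator, nothing is read from disk until it is iterated, for example with `list(scan_directory_sketch(".", extensions=["tsv"]))`.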
- dimcat.utils.treat_basepath_argument(path: str, other_logger) AbsolutePathStr[source]#
- dimcat.utils.treat_basepath_argument(path: Literal[None], other_logger) None
Turns basepath into an absolute path and checks that it exists.
- Raises:
NotADirectoryError – If basepath is not an existing directory.