---
jupytext:
  formats: ipynb,md:myst
  text_representation:
    extension: .md
    format_name: myst
    format_version: 0.13
    jupytext_version: 1.16.1
kernelspec:
  display_name: dimcat
  language: python
  name: dimcat
---

# Quick demo

## Import dimcat and load data

```{code-cell} ipython3
import dimcat as dc
from dimcat.data import resources
from dimcat.steps import analyzers, extractors, groupers

package_path = "dcml_corpora.datapackage.json"
dataset = dc.Dataset.from_package(package_path)
dataset
```

## Show metadata

```{code-cell} ipython3
dataset.get_metadata()
```

## Counting notes

### Variant 1: Extract feature, apply Counter

Here we pass the extracted notes to the counter.

```{code-cell} ipython3
notes = dataset.get_feature("notes")
result = analyzers.Counter().process(notes)
result.plot()
```

The `FeatureExtractor` is added to the dataset's pipeline implicitly, but the `Counter` is not because it's applied only to the extracted feature:

```{code-cell} ipython3
dataset
```

The pitch-class distributions shown by `.plot()` correspond to the current **unit of analysis**, which defaults to the piece level.
Results also come with a second plotting method, `.plot_grouped()`. Since no groupers have been applied, the entire dataset is treated as a single group:

```{code-cell} ipython3
result.plot_grouped()
```
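To make the difference between the two plotting methods concrete, here is a minimal sketch in plain Python that does not use dimcat at all. The piece IDs and pitch-class names are made up for illustration, and `collections.Counter` merely stands in for whatever the `Counter` analyzer does internally.

```python
from collections import Counter

# Hypothetical toy data: (corpus, piece) ID tuples mapped to pitch-class names.
notes_by_piece = {
    ("corpus_a", "piece_1"): ["C", "E", "G", "C"],
    ("corpus_a", "piece_2"): ["D", "F", "A"],
    ("corpus_b", "piece_3"): ["C", "D", "C"],
}

# Piece-level unit of analysis: one distribution per piece.
per_piece = {piece: Counter(pcs) for piece, pcs in notes_by_piece.items()}

# Without groupers, everything falls into a single group:
# summing the piece-level counts gives one distribution for the whole dataset.
whole_dataset = sum(per_piece.values(), Counter())

print(per_piece[("corpus_a", "piece_1")])
print(whole_dataset)
```

In this toy picture, `.plot()` corresponds to `per_piece` and `.plot_grouped()` to `whole_dataset`.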

### Variant 2: Imply feature extraction in the analyzer

Here we pass the dataset to the counter.

```{code-cell} ipython3
counter = analyzers.Counter(features="notes")
analyzed_dataset = counter.process(dataset)
analyzed_dataset.get_result().plot()
```

Applying an `Analyzer` to a `Dataset` yields an `AnalyzedDataset` that includes one `Result` resource per analyzed `Feature`.
The extracted features and the results are stored in their respective packages in the outputs catalog:

```{code-cell} ipython3
analyzed_dataset
```

### Variant 3: Define a Pipeline with FeatureExtractor and Counter

```{code-cell} ipython3
pipeline = dc.Pipeline([
    extractors.FeatureExtractor("notes"),
    analyzers.Counter()
])
analyzed_dataset = pipeline.process(dataset)
analyzed_dataset.get_result().plot()
```

## Grouped note counts

Let's define a `CustomPieceGrouper` from random piece groups:

* We create a `PieceIndex`, which is essentially a fancy list of piece ID tuples.
* From this, we sample `n_groups` groups of `n_members` piece tuples each. A `grouping` is a mapping of group names to piece IDs.
* Then, we set up a `CustomPieceGrouper` from the grouping. Inspecting it, we see that it stores a `PieceIndex` in which the first
  level corresponds to the three group names, `group_1`, `group_2`, and `group_3`. Whenever we apply this grouper, it will prepend
  this level to any processed Resource (provided it contains the grouped pieces). This changes the behaviour of the grouped resource,
  e.g. when plotting it.

```{code-cell} ipython3
n_groups = 3
n_members = 30

piece_index = resources.PieceIndex.from_resource(notes)
grouping = {f"group_{i}": piece_index.sample(n_members) for i in range(1, n_groups + 1)}
grouper = groupers.CustomPieceGrouper.from_grouping(grouping)
grouper
```

### Applying the grouper to the analysis result

```{code-cell} ipython3
grouped_result = grouper.process(result)
grouped_result
```
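What "prepending a level" means can be sketched with plain tuples. This is only an illustration under assumed names: the piece IDs are hypothetical, `random.sample` stands in for `PieceIndex.sample`, and the list of tuples stands in for the index of the grouped resource.

```python
import random

# Hypothetical piece IDs as (corpus, piece) tuples; not taken from the dataset above.
piece_ids = [("corpus_a", f"piece_{i}") for i in range(1, 11)]

rng = random.Random(0)  # seeded so that the sketch is reproducible
grouping = {f"group_{i}": rng.sample(piece_ids, 4) for i in range(1, 4)}

# Prepending the group name as first level yields (group, corpus, piece) tuples.
grouped_index = [
    (group, *piece_id)
    for group, members in grouping.items()
    for piece_id in members
]

# Each group is sampled independently, so a piece may occur in several groups.
print(len(grouped_index))
```

Any operation that groups by the first element of these tuples now aggregates per group rather than per piece.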

```{code-cell} ipython3
grouped_result.plot_grouped()
```

As promised, the grouped result plots differently: instead of showing pitch-class distributions for each of the grouped pieces
(which we can still obtain by calling `.plot()`), it shows the pitch-class distribution of each group.
For closer inspection, however, circle areas are hard to compare, so let's view the same result as a bar plot:

```{code-cell} ipython3
grouped_result.make_bar_plot()
```

### Step.process(Data) == Data.apply_step(Step)

Above, we have applied **Steps** (an analyzer, a grouper, and a pipeline) to **Data** objects, namely to
resources (the `Notes` feature and the `Counts` result) and to a dataset containing these resources, by calling `Step.process(Data)`.
Another way to achieve the same goal is to pass the steps to the data object's `.apply_step()` method. Let's start with a fresh dataset and
apply the grouper and the analyzer once more:

```{code-cell} ipython3
D = dc.Dataset.from_package(package_path)
analyzed_dataset = D.apply_step(grouper, counter)
analyzed_dataset
```

```{code-cell} ipython3
result = analyzed_dataset.get_result()
result
```

```{code-cell} ipython3
result.default_groupby
```

```{code-cell} ipython3
analyzed_dataset.get_result().make_bar_plot()
```
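The equivalence stated in this section's heading can be modelled with two toy classes. `ToyStep` and `ToyData` are hypothetical stand-ins and say nothing about how dimcat implements its classes; the sketch only assumes that applying a step returns a new data object and that `apply_step()` delegates to `process()`.

```python
class ToyStep:
    """Hypothetical stand-in for a step; not a dimcat class."""

    def __init__(self, name):
        self.name = name

    def process(self, data):
        # Return a new data object whose history records this step.
        return ToyData(data.history + [self.name])


class ToyData:
    """Hypothetical stand-in for a data object; not a dimcat class."""

    def __init__(self, history=None):
        self.history = list(history or [])

    def apply_step(self, *steps):
        # Delegate to each step in turn, so both call styles share one code path.
        data = self
        for step in steps:
            data = step.process(data)
        return data


grouper_step, counter_step = ToyStep("grouper"), ToyStep("counter")
via_process = counter_step.process(grouper_step.process(ToyData()))
via_apply = ToyData().apply_step(grouper_step, counter_step)
print(via_process.history == via_apply.history)  # prints True
```

Routing one call style through the other keeps the two from drifting apart, which is one plausible reason to offer both.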

## Assembling the Pipeline from DimcatConfig objects

Serialization of any DimcatObject uses the `DimcatConfig` object. Each config needs to have at least the key `dtype`,
specifying the name of a DimcatObject. Any other keys need to correspond to init arguments of that object. Wrong keys
or invalid values [are rejected](./errors.md#invalid-option).

Any DimcatObject can be expressed as a config by calling its `.to_config()` method:

```{code-cell} ipython3
config = counter.to_config()
config
```

Any config can be used to instantiate a DimcatObject:

```{code-cell} ipython3
counter_copy = config.create()
print(f"""The new object and the old object are
equal: {counter == counter_copy}
identical: {counter is counter_copy}""")
```
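The round trip between object and config can be pictured with a small registry. The following is a sketch under stated assumptions, not dimcat's implementation: `REGISTRY`, `register`, `create`, and `ToyCounter` are hypothetical names, and a plain dictionary plays the role of the config.

```python
from dataclasses import asdict, dataclass

REGISTRY = {}  # hypothetical mapping of dtype names to classes


def register(cls):
    """Make a class available under its own name."""
    REGISTRY[cls.__name__] = cls
    return cls


@register
@dataclass
class ToyCounter:
    features: tuple = ()

    def to_config(self):
        # The config stores the class name under "dtype" plus the init arguments.
        return {"dtype": self.__class__.__name__, **asdict(self)}


def create(config):
    options = dict(config)
    cls = REGISTRY[options.pop("dtype")]  # KeyError for a missing or unknown dtype
    return cls(**options)  # TypeError for keys that are not init arguments


original = ToyCounter(features=("notes",))
copy = create(original.to_config())
print(original == copy, original is copy)  # prints True False
```

As in the cell above, the recreated object compares equal to the original without being the same object.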

Wherever DiMCAT operates with configs, it also accepts dictionaries:

```{code-cell} ipython3
step_configs = [
    dict(dtype="FeatureExtractor", features=[dict(dtype="Notes", format="FIFTHS")]),
    dict(dtype='CustomPieceGrouper', grouped_units=grouping),
    dict(dtype="Counter")
]
pl = dc.Pipeline.from_step_configs(step_configs)
pl
```

```{code-cell} ipython3
resulting_dataset = pl.process(dataset)
resulting_dataset.get_result().make_bar_plot()
```