Quick demo#

Import dimcat and load data#

import dimcat as dc
from dimcat.data import resources
from dimcat.steps import analyzers, extractors, groupers

package_path = "dcml_corpora.datapackage.json"
dataset = dc.Dataset.from_package(package_path)
dataset
Dataset
=======
{'inputs': {'basepath': None,
            'packages': {'dcml_corpora': ["'dcml_corpora.measures' (MuseScoreMeasures)",
                                          "'dcml_corpora.notes' (MuseScoreNotes)",
                                          "'dcml_corpora.expanded' (MuseScoreHarmonies)",
                                          "'dcml_corpora.chords' (MuseScoreChords)",
                                          "'dcml_corpora.metadata' (Metadata)"]}},
 'outputs': {'basepath': None, 'packages': {}},
 'pipeline': []}

Show metadata#

dataset.get_metadata()
TimeSig KeySig last_mc last_mn length_qb last_mc_unfolded last_mn_unfolded length_qb_unfolded volta_mcs all_notes_qb ... imslp.1 key mode typesetter electronic editor electronic encoder text pdf score integrity PDF
corpus piece
ABC n01op18-1_01 {1: '3/4'} {1: -1} 313 313 939.0 427 427 1281.0 () 3132.75 ... <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
n01op18-1_02 {1: '9/8'} {1: -1} 110 110 495.0 110 110 495.0 () 1647.75 ... <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
n01op18-1_03 {1: '3/4'} {1: -1} 145 145 435.0 246 246 738.0 () 1536.00 ... <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
n01op18-1_04 {1: '2/4'} {1: -1} 381 381 762.0 381 381 762.0 () 2424.50 ... <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
n02op18-2_01 {1: '2/4'} {1: 1} 249 249 498.0 330 330 660.0 () 1504.38 ... <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
tchaikovsky_seasons op37a08 {1: '6/8'} {1: 2} 199 198 595.5 199 198 595.5 () 1994.00 ... <NA> <NA> <NA> <NA> <NA> <NA> <NA> https://imslp.org/wiki/Special:ReverseLookup/1... Tom Schreyer <NA>
op37a09 {1: '4/4'} {1: 1} 90 90 360.0 90 90 360.0 () 1430.33 ... <NA> <NA> <NA> <NA> <NA> <NA> <NA> https://imslp.org/wiki/Special:ReverseLookup/1... Tom Schreyer <NA>
op37a10 {1: '4/4'} {1: -1} 56 56 224.0 56 56 224.0 () 808.00 ... <NA> <NA> <NA> <NA> <NA> <NA> <NA> https://imslp.org/wiki/Special:ReverseLookup/1... Tom Schreyer <NA>
op37a11 {1: '4/4'} {1: 4, 28: 1, 51: 4} 83 83 332.0 83 83 332.0 () 967.92 ... <NA> <NA> <NA> <NA> <NA> <NA> <NA> https://imslp.org/wiki/Special:ReverseLookup/1... Tom Schreyer <NA>
op37a12 {1: '3/4'} {1: -4, 88: 4, 149: -4} 176 176 528.0 263 263 789.0 () 1856.50 ... <NA> <NA> <NA> <NA> <NA> <NA> <NA> https://imslp.org/wiki/Special:ReverseLookup/1... Tom Schreyer <NA>

560 rows × 72 columns

Counting notes#

Variant 1: Extract feature, apply Counter#

Here we pass the extracted notes to the counter.

notes = dataset.get_feature("notes")
result = analyzers.Counter().process(notes)
result.plot()

The FeatureExtractor is added to the dataset’s pipeline implicitly, but the Counter is not because it’s applied only to the extracted feature:

dataset
Dataset
=======
{'inputs': {'basepath': None,
            'packages': {'dcml_corpora': ["'dcml_corpora.measures' (MuseScoreMeasures)",
                                          "'dcml_corpora.notes' (MuseScoreNotes)",
                                          "'dcml_corpora.expanded' (MuseScoreHarmonies)",
                                          "'dcml_corpora.chords' (MuseScoreChords)",
                                          "'dcml_corpora.notes.notes.metadata' (Metadata)"]}},
 'outputs': {'basepath': None,
             'packages': {'features': ["'dcml_corpora.notes.notes.metadata' (Metadata)",
                                       "'dcml_corpora.notes.notes' (Notes)"]}},
 'pipeline': ['FeatureExtractor', 'FeatureExtractor']}

The pitch-class distributions shown by .plot() correspond to the current unit of analysis, which defaults to the piece level. Results also come with a second plotting method, .plot_grouped(). Since no groupers have been applied, it treats the entire dataset as a single group:

result.plot_grouped()
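
To make the difference between the two units of analysis concrete, here is a minimal sketch in plain Python. It does not use DiMCAT; the toy data and the names notes_by_piece, per_piece, and whole_dataset are hypothetical and only illustrate the idea of counting per piece versus counting over one all-encompassing group.

```python
from collections import Counter

# Hypothetical toy data: (corpus, piece) -> pitch-class names of all notes.
notes_by_piece = {
    ("corpus_a", "piece_1"): ["C", "E", "G", "C"],
    ("corpus_a", "piece_2"): ["D", "F#", "A"],
    ("corpus_b", "piece_1"): ["C", "G", "G"],
}

# Piece-level unit of analysis: one distribution per piece.
per_piece = {piece_id: Counter(names) for piece_id, names in notes_by_piece.items()}

# No grouping applied: every piece belongs to the same single group,
# so the per-piece counts are simply summed.
whole_dataset = sum(per_piece.values(), Counter())

for piece_id, counts in per_piece.items():
    print(piece_id, dict(counts))
print("all pieces:", dict(whole_dataset))
```

The per-piece dictionaries correspond conceptually to what .plot() displays, and the summed counter to what .plot_grouped() displays when no grouper has been applied.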

Variant 2: Imply feature extraction in the analyzer#

Here we pass the dataset to the counter.

counter = analyzers.Counter(features="notes")
analyzed_dataset = counter.process(dataset)
analyzed_dataset.get_result().plot()

Applying an Analyzer to a Dataset yields an AnalyzedDataset that includes one Result resource per analyzed Feature. Features and results are stored in their respective packages in the outputs catalog:

analyzed_dataset
AnalyzedDataset
===============
{'inputs': {'basepath': None,
            'packages': {'dcml_corpora': ["'dcml_corpora.measures' (MuseScoreMeasures)",
                                          "'dcml_corpora.notes' (MuseScoreNotes)",
                                          "'dcml_corpora.expanded' (MuseScoreHarmonies)",
                                          "'dcml_corpora.chords' (MuseScoreChords)",
                                          "'dcml_corpora.notes.notes.metadata' (Metadata)"]}},
 'outputs': {'basepath': None,
             'packages': {'results': ["'dcml_corpora.notes.notes.counted' (Counts)"],
                          'features': ["'dcml_corpora.notes.notes.metadata' (Metadata)",
                                       "'dcml_corpora.notes.notes' (Notes)"]}},
 'pipeline': ['FeatureExtractor', 'FeatureExtractor', 'Counter']}

Variant 3: Define a Pipeline with FeatureExtractor and Counter#

pipeline = dc.Pipeline([
    extractors.FeatureExtractor("notes"),
    analyzers.Counter()
])
analyzed_dataset = pipeline.process(dataset)
analyzed_dataset.get_result().plot()
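
Conceptually, a pipeline feeds the output of each step into the next one. The following sketch shows that idea with hypothetical stand-in classes (ToyStep, ToyPipeline); it is not how dc.Pipeline is implemented, only an illustration of sequential composition.

```python
from functools import reduce


class ToyStep:
    """Hypothetical stand-in for a processing step."""

    def __init__(self, name, func):
        self.name = name
        self.func = func

    def process(self, data):
        return self.func(data)


class ToyPipeline:
    """Hypothetical stand-in for a pipeline: applies its steps in order."""

    def __init__(self, steps):
        self.steps = list(steps)

    def process(self, data):
        # The result of each step becomes the input of the following step.
        return reduce(lambda current, step: step.process(current), self.steps, data)


# "Extraction": flatten a list of pieces into one list of note names.
extract = ToyStep("extract", lambda pieces: [note for piece in pieces for note in piece])
# "Counting": map each note name to its number of occurrences.
count = ToyStep("count", lambda notes: {name: notes.count(name) for name in set(notes)})

toy_pipeline = ToyPipeline([extract, count])
toy_pieces = [["C", "E"], ["C", "G"]]
toy_result = toy_pipeline.process(toy_pieces)
print(toy_result)
```

Processing the data with the toy pipeline gives the same result as calling count.process(extract.process(toy_pieces)) by hand.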

Grouped note counts#

Let’s define a CustomPieceGrouper from random piece groups:

  • We create a PieceIndex, which is essentially a fancy list of piece ID tuples.

  • From this, we sample n_groups groups of n_members piece tuples each. A grouping is a mapping of group names to piece IDs.

  • Then, we set up a CustomPieceGrouper from the grouping. Inspecting it, we see that it stores a PieceIndex in which the first level corresponds to the three group names, group_1, group_2, and group_3. Whenever we apply this grouper, it will prepend this level to any processed Resource (provided it contains the grouped pieces). This changes the behaviour of the grouped resource, e.g. when plotting it.

n_groups = 3
n_members = 30

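# collect the (corpus, piece) ID tuples of all pieces contained in the notes feature
piece_index = resources.PieceIndex.from_resource(notes)
# map each group name to a random sample of n_members piece IDs
grouping = {f"group_{i}": piece_index.sample(n_members) for i in range(1, n_groups + 1)}
# create the grouper from the {group name: piece IDs} mapping
grouper = groupers.CustomPieceGrouper.from_grouping(grouping)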
grouper
CustomPieceGrouper
==================
{'dtype': 'CustomPieceGrouper',
 'features': [],
 'level_name': 'piece_group',
 'grouped_units': {'dtype': 'PieceIndex',
                   'basepath': None,
                   'index': [('group_1', 'ABC', 'n15op132_03'),
                             ('group_1', 'grieg_lyric_pieces', 'op38n08'),
                             ('group_1', 'ABC', 'n14op131_01'),
                             ('group_1', 'corelli', 'op03n10b'),
                             ('group_1', 'grieg_lyric_pieces', 'op12n03'),
                             ('group_1', 'beethoven_piano_sonatas', '10-1'),
                             ('group_1', 'medtner_tales', 'op34n03'),
                             ('group_1', 'chopin_mazurkas', 'BI89-3op24-3'),
                             ('group_1', 'mozart_piano_sonatas', 'K279-2'),
                             ('group_1', 'corelli', 'op04n02d'),
                             ('group_1', 'grieg_lyric_pieces', 'op68n01'),
                             ('group_1', 'beethoven_piano_sonatas', '30-2'),
                             ('group_1', 'beethoven_piano_sonatas', '13-4'),
                             ('group_1', 'grieg_lyric_pieces', 'op54n01'),
                             ('group_1', 'ABC', 'n02op18-2_04'),
                             ('group_1', 'beethoven_piano_sonatas', '04-2'),
                             ('group_1', 'liszt_pelerinage', '161.07_Apres_une_lecture_du_Dante'),
                             ('group_1', 'chopin_mazurkas', 'BI77-3op17-3'),
                             ('group_1', 'ABC', 'n06op18-6_02'),
                             ('group_1', 'ABC', 'n11op95_01'),
                             ('group_1', 'ABC', 'n05op18-5_02'),
                             ('group_1', 'corelli', 'op01n12b'),
                             ('group_1', 'beethoven_piano_sonatas', '05-3'),
                             ('group_1', 'medtner_tales', 'op26n02'),
                             ('group_1', 'beethoven_piano_sonatas', '01-3'),
                             ('group_1', 'corelli', 'op03n06d'),
                             ('group_1', 'grieg_lyric_pieces', 'op12n02'),
                             ('group_1', 'ABC', 'n13op130_05'),
                             ('group_1', 'beethoven_piano_sonatas', '26-2'),
                             ('group_1', 'corelli', 'op03n01b'),
                             ('group_2', 'chopin_mazurkas', 'BI122op41-2'),
                             ('group_2', 'beethoven_piano_sonatas', '04-4'),
                             ('group_2', 'liszt_pelerinage', '160.09_Les_Cloches_de_Geneve_(Nocturne)'),
                             ('group_2', 'mozart_piano_sonatas', 'K282-2'),
                             ('group_2', 'corelli', 'op03n01a'),
                             ('group_2', 'chopin_mazurkas', 'BI16-1'),
                             ('group_2', 'grieg_lyric_pieces', 'op12n03'),
                             ('group_2', 'mozart_piano_sonatas', 'K311-2'),
                             ('group_2', 'corelli', 'op03n11c'),
                             ('group_2', 'beethoven_piano_sonatas', '21-3'),
                             ('group_2', 'grieg_lyric_pieces', 'op71n01'),
                             ('group_2', 'corelli', 'op03n10b'),
                             ('group_2', 'beethoven_piano_sonatas', '11-2'),
                             ('group_2', 'grieg_lyric_pieces', 'op57n06'),
                             ('group_2', 'grieg_lyric_pieces', 'op71n04'),
                             ('group_2', 'debussy_suite_bergamasque', 'l075-03_suite_clair'),
                             ('group_2', 'corelli', 'op04n11b'),
                             ('group_2', 'grieg_lyric_pieces', 'op47n07'),
                             ('group_2', 'corelli', 'op04n02b'),
                             ('group_2', 'ABC', 'n05op18-5_04'),
                             ('group_2', 'beethoven_piano_sonatas', '09-3'),
                             ('group_2', 'chopin_mazurkas', 'BI162-3op63-3'),
                             ('group_2', 'medtner_tales', 'op34n03'),
                             ('group_2', 'chopin_mazurkas', 'BI140'),
                             ('group_2', 'corelli', 'op04n05c'),
                             ('group_2', 'grieg_lyric_pieces', 'op47n04'),
                             ('group_2', 'corelli', 'op03n01d'),
                             ('group_2', 'grieg_lyric_pieces', 'op65n06'),
                             ('group_2', 'chopin_mazurkas', 'BI163op67-4'),
                             ('group_2', 'beethoven_piano_sonatas', '13-3'),
                             ('group_3', 'beethoven_piano_sonatas', '23-1'),
                             ('group_3', 'ABC', 'n06op18-6_03'),
                             ('group_3', 'medtner_tales', 'op26n01'),
                             ('group_3', 'ABC', 'n13op130_06'),
                             ('group_3', 'liszt_pelerinage', '161.05_Sonetto_104_del_Petrarca'),
                             ('group_3', 'grieg_lyric_pieces', 'op12n07'),
                             ('group_3', 'mozart_piano_sonatas', 'K310-2'),
                             ('group_3', 'corelli', 'op03n12f'),
                             ('group_3', 'corelli', 'op03n10d'),
                             ('group_3', 'corelli', 'op01n02b'),
                             ('group_3', 'corelli', 'op01n07c'),
                             ('group_3', 'ABC', 'n09op59-3_04'),
                             ('group_3', 'beethoven_piano_sonatas', '12-4'),
                             ('group_3', 'ABC', 'n04op18-4_04'),
                             ('group_3', 'corelli', 'op03n06b'),
                             ('group_3', 'dvorak_silhouettes', 'op08n12'),
                             ('group_3', 'corelli', 'op03n06a'),
                             ('group_3', 'chopin_mazurkas', 'BI126-1op41-4'),
                             ('group_3', 'chopin_mazurkas', 'BI134'),
                             ('group_3', 'schumann_kinderszenen', 'n08'),
                             ('group_3', 'corelli', 'op03n09d'),
                             ('group_3', 'tchaikovsky_seasons', 'op37a05'),
                             ('group_3', 'corelli', 'op03n02a'),
                             ('group_3', 'beethoven_piano_sonatas', '03-3'),
                             ('group_3', 'debussy_suite_bergamasque', 'l075-02_suite_menuet'),
                             ('group_3', 'corelli', 'op03n12b'),
                             ('group_3', 'tchaikovsky_seasons', 'op37a10'),
                             ('group_3', 'beethoven_piano_sonatas', '17-1'),
                             ('group_3', 'corelli', 'op01n05a'),
                             ('group_3', 'corelli', 'op04n06c')],
                   'names': ['piece_group', 'corpus', 'piece']}}
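
The relationship between a grouping and the three-level index shown above can be sketched in plain Python. This is an illustration only: piece_ids, toy_grouping, and grouped_index are hypothetical names, and random.Random.sample stands in for the sampling done by PieceIndex.

```python
import random

# Hypothetical piece IDs as (corpus, piece) tuples.
piece_ids = [(f"corpus_{c}", f"piece_{p:02d}") for c in "abc" for p in range(1, 11)]

rng = random.Random(0)  # seeded so that the sketch is reproducible
n_groups, n_members = 3, 5

# A grouping maps group names to piece IDs. Each group is sampled
# independently, so the same piece may end up in more than one group.
toy_grouping = {
    f"group_{i}": rng.sample(piece_ids, n_members) for i in range(1, n_groups + 1)
}

# Prepending the group name to every member yields three-level tuples
# of the form (piece_group, corpus, piece).
grouped_index = [
    (group, corpus, piece)
    for group, members in toy_grouping.items()
    for corpus, piece in members
]

for entry in grouped_index:
    print(entry)
```

Because the groups are drawn independently, overlaps are possible; in the output above, for example, ('grieg_lyric_pieces', 'op12n03') appears in both group_1 and group_2.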
Applying the grouper to the analysis result#

grouped_result = grouper.process(result)
grouped_result
                                                               count
piece_group  corpus               piece         tpc_name  tpc
group_1      ABC                  n02op18-2_04  A           3    528
                                                A#         10     12
                                                Ab         -4     30
                                                B           5    410
                                                Bb         -2    152
...          ...                  ...           ...       ...    ...
group_3      tchaikovsky_seasons  op37a10       E           4    122
                                                F          -1    104
                                                F#          6     28
                                                G           1    123
                                                G#          8     17

1335 rows × 1 columns
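
The tpc_name and tpc index levels in the table above suggest that tonal pitch classes are numbered along the line of fifths (C = 0, G = 1, F = -1, and so on). The sketch below is not DiMCAT code; fifths_to_name is a hypothetical helper that reproduces the pairs visible in the table under that assumption.

```python
def fifths_to_name(tpc: int) -> str:
    """Convert a line-of-fifths position (C = 0) to a note name.

    Hypothetical helper for illustration; assumes the convention
    F = -1, C = 0, G = 1, D = 2, A = 3, E = 4, B = 5.
    """
    letters = "FCGDAEB"          # one cycle of natural notes in fifths order
    shifted = tpc + 1            # shift so that F (tpc = -1) lands on index 0
    accidentals = shifted // 7   # floor division: sharps (> 0) or flats (< 0)
    letter = letters[shifted % 7]
    if accidentals >= 0:
        return letter + "#" * accidentals
    return letter + "b" * -accidentals


# (tpc_name, tpc) pairs as they appear in the table above.
pairs_from_table = [("A", 3), ("A#", 10), ("Ab", -4), ("B", 5), ("Bb", -2),
                    ("E", 4), ("F", -1), ("F#", 6), ("G", 1), ("G#", 8)]

for name, tpc in pairs_from_table:
    print(tpc, fifths_to_name(tpc), name)
```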

grouped_result.plot_grouped()

As promised, the grouped result plots differently: instead of showing a pitch-class distribution for each of the grouped pieces (which we can still obtain by calling .plot()), it shows one for each group. Circle areas are hard to compare precisely, though, so for closer inspection let’s view the result as a bar plot:

grouped_result.make_bar_plot()
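
What it means to move from piece-level to group-level distributions can be sketched as summing the counts over the first index level. The example uses hypothetical toy counts and names (toy_counts, per_group, proportions). The conversion to relative frequencies is an assumption of this sketch that makes groups of different sizes comparable; it is not a statement about what DiMCAT's plots compute.

```python
from collections import Counter, defaultdict

# Hypothetical counts keyed by (piece_group, corpus, piece, tpc_name).
toy_counts = {
    ("group_1", "corpus_a", "piece_1", "C"): 6,
    ("group_1", "corpus_a", "piece_1", "G"): 2,
    ("group_1", "corpus_b", "piece_1", "C"): 2,
    ("group_2", "corpus_b", "piece_1", "C"): 2,  # same piece, second group
    ("group_2", "corpus_b", "piece_2", "D"): 6,
}

# Sum over corpus and piece, keeping only the group level and the pitch class.
per_group = defaultdict(Counter)
for (group, _corpus, _piece, name), n in toy_counts.items():
    per_group[group][name] += n

# Relative frequencies within each group.
proportions = {
    group: {name: n / sum(counter.values()) for name, n in counter.items()}
    for group, counter in per_group.items()
}

for group, dist in proportions.items():
    print(group, dist)
```

Note that a piece belonging to two groups contributes its counts to both of them.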

Step.process(Data) == Data.apply_step(Step)#

Above, we applied Steps (an analyzer, a grouper, and a pipeline) to Data objects, namely to resources (the Notes feature and the Counts result) and to a dataset containing these resources, each time by calling the step's .process() method. The same result can be achieved from the other direction by passing the steps to the data object's .apply_step() method. Let’s start with a fresh dataset and apply the grouper and the analyzer once more:

D = dc.Dataset.from_package(package_path)
analyzed_dataset = D.apply_step(grouper, counter)
analyzed_dataset
GroupedAnalyzedDataset
======================
{'inputs': {'basepath': None,
            'packages': {'dcml_corpora': ["'dcml_corpora.measures' (MuseScoreMeasures)",
                                          "'dcml_corpora.notes' (MuseScoreNotes)",
                                          "'dcml_corpora.expanded' (MuseScoreHarmonies)",
                                          "'dcml_corpora.chords' (MuseScoreChords)",
                                          "'dcml_corpora.metadata' (Metadata)"]}},
 'outputs': {'basepath': None,
             'packages': {'results': ["'dcml_corpora.notes.notes.counted' (Counts)"],
                          'features': ["'dcml_corpora.notes.notes' (Notes)"]}},
 'pipeline': ['CustomPieceGrouper', 'FeatureExtractor', 'Counter']}
result = analyzed_dataset.get_result()
result
                                                               count
piece_group  corpus               piece         tpc_name  tpc
group_1      ABC                  n02op18-2_04  G           1    784
                                                D           2    728
                                                A           3    528
                                                C           0    416
                                                B           5    410
...          ...                  ...           ...       ...    ...
group_3      tchaikovsky_seasons  op37a10       F#          6     28
                                                G#          8     17
                                                B           5     15
                                                D#          9      7
                                                B#         12      2

1335 rows × 1 columns

result.default_groupby
['piece_group']
analyzed_dataset.get_result().make_bar_plot()
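
The equivalence named in this section's heading can be illustrated with a small sketch in which the data object simply delegates to the steps it receives. ToyData and ToyDoubler are hypothetical classes written for this illustration; they make no claim about DiMCAT's internals.

```python
class ToyData:
    """Hypothetical data object that remembers which steps were applied."""

    def __init__(self, values, history=()):
        self.values = list(values)
        self.history = tuple(history)

    def apply_step(self, *steps):
        # Delegate to each step in turn, passing the result along.
        data = self
        for step in steps:
            data = step.process(data)
        return data

    def __eq__(self, other):
        return (
            isinstance(other, ToyData)
            and self.values == other.values
            and self.history == other.history
        )


class ToyDoubler:
    """Hypothetical step that doubles every value."""

    def process(self, data):
        doubled = [2 * value for value in data.values]
        return ToyData(doubled, data.history + ("ToyDoubler",))


step = ToyDoubler()
via_process = step.process(ToyData([1, 2, 3]))
via_apply_step = ToyData([1, 2, 3]).apply_step(step)
print(via_process == via_apply_step)
print(via_apply_step.values, via_apply_step.history)
```

Like D.apply_step(grouper, counter) above, the toy apply_step accepts several steps and applies them from left to right.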

Assembling the Pipeline from DimcatConfig objects#

Serialization of any DimcatObject uses the DimcatConfig object. Each config needs to have at least the key dtype, specifying the name of a DimcatObject. Any other keys need to correspond to init arguments of that object. Wrong keys or invalid values are rejected.

Any DimcatObject can be expressed as a config by calling its .to_config() method:

config = counter.to_config()
config
DimcatConfig
============
{'dtype': 'Counter',
 'features': [{'dtype': 'DimcatConfig', 'options': {'dtype': 'Notes'}}],
 'strategy': 'GROUPBY_APPLY',
 'smallest_unit': 'SLICE',
 'dimension_column': 'count'}

Any config can be used to instantiate a DimcatObject:

counter_copy = config.create()
print(f"""The new object and the old object are
equal: {counter == counter_copy}
identical: {counter is counter_copy}""")
The new object and the old object are
equal: True
identical: False
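
The round trip between object and config can be sketched with a small registry that maps the dtype key to a class and checks the remaining keys against that class's init arguments. Everything in this sketch (REGISTRY, register, ToyCounter, create) is hypothetical and merely mirrors the behaviour described above; it is not DiMCAT's implementation.

```python
import inspect

REGISTRY = {}  # maps a dtype name to the class it stands for


def register(cls):
    """Class decorator that makes a class available under its name."""
    REGISTRY[cls.__name__] = cls
    return cls


@register
class ToyCounter:
    def __init__(self, features=None, dimension_column="count"):
        self.features = list(features or [])
        self.dimension_column = dimension_column

    def to_config(self):
        # A config is the dtype plus the init arguments needed to rebuild the object.
        return {
            "dtype": type(self).__name__,
            "features": list(self.features),
            "dimension_column": self.dimension_column,
        }

    def __eq__(self, other):
        return isinstance(other, ToyCounter) and self.to_config() == other.to_config()


def create(config):
    """Instantiate the object described by a config dictionary."""
    options = dict(config)
    dtype = options.pop("dtype")  # raises KeyError if the mandatory key is missing
    cls = REGISTRY[dtype]
    allowed = set(inspect.signature(cls).parameters)
    unknown = set(options) - allowed
    if unknown:
        raise ValueError(f"Unknown option(s) for {dtype}: {sorted(unknown)}")
    return cls(**options)


original = ToyCounter(features=["notes"])
copy = create(original.to_config())
print(copy == original, copy is original)
```

As in the output above, the recreated object compares equal to the original without being the same object, and a config containing a key that is not an init argument is rejected.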

Wherever DiMCAT operates with configs, it also accepts dictionaries:

step_configs = [
    dict(dtype="FeatureExtractor", features=[dict(dtype="Notes", format="FIFTHS")]),
    dict(dtype="CustomPieceGrouper", grouped_units=grouping),
    dict(dtype="Counter")
]
pl = dc.Pipeline.from_step_configs(step_configs)
pl
Pipeline([FeatureExtractor, CustomPieceGrouper, Counter])
resulting_dataset = pl.process(dataset)
resulting_dataset.get_result().make_bar_plot()