Data augmentation
Data augmentation is a technique that artificially increases the size of a dataset by applying various transformations to the existing data. These transformations alter one or several attributes of the original data. For images, they can include operations such as rotation, scaling, cropping or color adjustments. Augmentation is trickier for natural language, where the meaning of a sentence can easily change depending on how the text is modified, but techniques such as paraphrase generation or back translation can serve this purpose.
The purpose of data augmentation is to introduce variability and diversity into the training data without collecting additional real-world data. It can improve a model’s learning and generalization by exposing it to a wider range of variations and patterns present in the data. In turn, this can increase the model’s robustness and reduce overfitting.
MidiTok can perform data augmentation at the MIDI level and at the token level. Transformations are made by offsetting the velocities and durations of notes, or by shifting their pitches by octaves. Data augmentation is highly recommended when training a model, as it helps the model learn the global and local harmony of music. In large datasets such as the Lakh or Meta MIDI datasets, MIDI files can have various ranges of velocity, duration and pitch values. By augmenting the data, and thus creating more diverse samples, a model can better generalize and learn the melody, harmony and other musical features rather than specific recurrent token successions.
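The three transformations described above can be illustrated with a minimal, library-free sketch. The Note dataclass and the augment_notes helper are hypothetical names introduced only for this example; they are not part of MidiTok or symusic, and the clipping bounds are assumptions taken from the usual MIDI value ranges.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Note:
    """Hypothetical note container (not a MidiTok or symusic class)."""

    pitch: int  # MIDI pitch, expected in 0..127
    velocity: int  # MIDI velocity, expected in 1..127
    duration: int  # duration in ticks


def augment_notes(notes, octave_shift=0, velocity_offset=0, duration_offset=0):
    """Return shifted copies of the notes; the input list is left untouched."""
    augmented = []
    for note in notes:
        pitch = note.pitch + 12 * octave_shift  # one octave is 12 semitones
        if not 0 <= pitch <= 127:
            raise ValueError(f"pitch {pitch} falls outside the MIDI range")
        augmented.append(
            replace(
                note,
                pitch=pitch,
                # Clip the velocity to the assumed (1, 127) range.
                velocity=min(127, max(1, note.velocity + velocity_offset)),
                # Keep the duration at one tick or more.
                duration=max(1, note.duration + duration_offset),
            )
        )
    return augmented


melody = [Note(60, 90, 480), Note(64, 120, 240)]
louder_octave_up = augment_notes(melody, octave_shift=1, velocity_offset=10)
```

The same melody yields a new training sample that keeps its intervals and rhythm while differing in register and dynamics.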
Data augmentation module.
The module implements three public methods:
miditok.data_augmentation.augment_score(): augment a single score with one set of offsets;
miditok.data_augmentation.augment_score_multiple_offsets(): augment a single score with combinations of offsets;
miditok.data_augmentation.augment_dataset(): augment a dataset of music files with combinations of offsets.
- miditok.data_augmentation.augment_dataset(data_path: Path | str, pitch_offsets: list[int] | None = None, velocity_offsets: list[int] | None = None, duration_offsets: list[int] | None = None, all_offset_combinations: bool = False, restrict_on_program_tessitura: bool = True, velocity_range: tuple[int, int] = (1, 127), duration_in_ticks: bool = False, min_duration: int | float = 0.03125, out_path: Path | str | None = None, copy_original_in_new_location: bool = True, save_data_aug_report: bool = True) -> None
Perform data augmentation on a dataset of music files.
The newly created files have names in two parts, separated by a “#” character. Make sure your files do not have “#” in their names if you intend to reuse the information of the second part in some script. Drum tracks are not augmented.
- Parameters:
data_path – root path to the folder containing tokenized json files.
pitch_offsets – list of pitch offsets for augmentation. (default: None)
velocity_offsets – list of velocity offsets for augmentation. If you plan to tokenize these files, the velocity offsets should be chosen according to the number of velocities in your tokenizer’s vocabulary (num_velocities). (default: None)
duration_offsets – list of duration offsets for augmentation, to be given in beats if duration_in_ticks is False, in ticks otherwise. (default: None)
all_offset_combinations – if True, will perform data augmentation on all the possible combinations of offset values. If set to False, the method will only augment on the offsets separately without combining them. (default: False)
restrict_on_program_tessitura – if True, the method will consider the recommended pitch values of each instrument/program as the range of possible values after augmentation. Otherwise, the (0, 127) range will be used. (default: True)
velocity_range – minimum and maximum velocity values. (default: (1, 127))
duration_in_ticks – if True, the duration_offsets argument will be considered as expressed in ticks. Otherwise, it is considered in beats, and the equivalent in ticks will be determined by multiplying it by the MIDI’s time division (480 by default for abc files). (default: False)
min_duration – minimum duration limit to apply if a duration offset is negative. If duration_in_ticks is True, it must be given in ticks, otherwise in beats as a float or integer. (default: 0.03125)
out_path – output path to save the augmented files. Original (non-augmented) files will be saved to this location. If none is given, they will be saved in the same location as the data_path. (default: None)
copy_original_in_new_location – if True, the original (non-augmented) files will be saved in the out_path location too. (default: True)
save_data_aug_report – will save numbers from the data augmentation in a data_augmentation_report.txt file in the output directory. (default: True)
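As a usage illustration, the sketch below wraps a call to augment_dataset using only the parameters listed in the signature above. The augment_folder and split_augmented_name helpers are hypothetical names, the offset values are arbitrary, and the "p12" suffix used when trying the helper is made up: the exact content of the second part of the file name is not specified here. MidiTok is imported inside the function, so the snippet can be loaded without the library installed.

```python
from pathlib import Path


def augment_folder(data_path: Path, out_path: Path) -> None:
    """Augment every file under data_path and write the results to out_path."""
    # Lazy import: MidiTok is only required when the function is called.
    from miditok.data_augmentation import augment_dataset

    augment_dataset(
        data_path,
        pitch_offsets=[-12, 12],  # one octave down and one octave up
        velocity_offsets=[-4, 4],
        duration_offsets=[-1, 1],  # in beats, since duration_in_ticks is False
        all_offset_combinations=False,  # augment on each offset separately
        out_path=out_path,
    )


def split_augmented_name(file_path: Path) -> tuple[str, str]:
    """Split a file stem on the first "#" into (original name, second part).

    Files without "#" in their name return an empty second part.
    """
    original, _, second_part = file_path.stem.partition("#")
    return original, second_part
```

Splitting on the separator only works reliably if the original file names do not contain it, which is why the naming warning above matters.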
- miditok.data_augmentation.augment_score(score: ScoreFactory(), pitch_offset: int = 0, velocity_offset: int = 0, duration_offset: int | float = 0, velocity_range: tuple[int, int] = (1, 127), duration_in_ticks: bool = False, min_duration: int | float = 0.03125, augment_copy: bool = True)
Augment a Score object by shifting its pitch, velocity and/or duration values.
Velocity and duration values will be clipped according to the velocity_range and min_duration arguments. Drum tracks are only augmented on the velocity. If you are using a pitch offset, make sure the file doesn’t contain notes with pitches that would end up outside the conventional (0, 127) range, otherwise the method will crash.
- Parameters:
score – symusic.Score object to augment.
pitch_offset – pitch offset for augmentation. (default: 0)
velocity_offset – velocity offset for augmentation. If you plan to tokenize this file, the velocity offset should be chosen according to the number of velocities in your tokenizer’s vocabulary (num_velocities). (default: 0)
duration_offset – duration offset for augmentation, to be given in beats if duration_in_ticks is False, in ticks otherwise. (default: 0)
velocity_range – minimum and maximum velocity values. (default: (1, 127))
duration_in_ticks – if True, the duration_offset argument will be considered as expressed in ticks. Otherwise, it is considered in beats, and the equivalent in ticks will be determined by multiplying it by the MIDI’s time division (480 by default for abc files). (default: False)
min_duration – minimum duration limit to apply if duration_offset is negative. If duration_in_ticks is True, it must be given in ticks, otherwise in beats as a float or integer. (default: 0.03125)
augment_copy – if True, a copy of the input symusic.Score object is augmented and returned. If False, the input symusic.Score object is modified in-place. (default: True)
- Returns:
the augmented symusic.Score object.
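The beat-to-tick conversion and the clipping rules described for this function can be sketched in plain Python. This is an interpretation of the parameter descriptions above, not MidiTok's actual implementation; the helper names are hypothetical, and truncating to an integer number of ticks is an assumption.

```python
def to_ticks(value, ticks_per_beat, duration_in_ticks=False):
    """Convert a value in beats to ticks, unless it is already in ticks."""
    if duration_in_ticks:
        return int(value)
    # Assumption: fractional ticks are truncated.
    return int(value * ticks_per_beat)


def shift_duration(duration, duration_offset, ticks_per_beat,
                   duration_in_ticks=False, min_duration=0.03125):
    """Shift a note duration given in ticks.

    The minimum duration is only enforced for negative offsets,
    following the description of min_duration.
    """
    offset = to_ticks(duration_offset, ticks_per_beat, duration_in_ticks)
    shifted = duration + offset
    if offset < 0:
        minimum = to_ticks(min_duration, ticks_per_beat, duration_in_ticks)
        shifted = max(minimum, shifted)
    return shifted


def shift_velocity(velocity, velocity_offset, velocity_range=(1, 127)):
    """Shift a velocity and clip it to velocity_range."""
    low, high = velocity_range
    return min(high, max(low, velocity + velocity_offset))
```

With a time division of 480 ticks per beat, the default min_duration of 0.03125 beats corresponds to 15 ticks, so shortening a 120-tick note by half a beat would leave it at 15 ticks rather than a negative duration.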
- miditok.data_augmentation.augment_score_multiple_offsets(score: ScoreFactory(), pitch_offsets: list[int] | None = None, velocity_offsets: list[int] | None = None, duration_offsets: list[int] | None = None, all_offset_combinations: bool = False, restrict_on_program_tessitura: bool = True, velocity_range: tuple[int, int] = (1, 127), duration_in_ticks: bool = False, min_duration: int | float = 0.03125) -> list[tuple[tuple[int, int, int], ScoreFactory()]]
Perform data augmentation on a symusic.Score object with multiple offset values.
Velocity and duration values will be clipped according to the velocity_range and min_duration arguments. Drum tracks are only augmented on the velocity.
- Parameters:
score – symusic.Score object to augment.
pitch_offsets – list of pitch offsets for augmentation. (default: None)
velocity_offsets – list of velocity offsets for augmentation. If you plan to tokenize this file, the velocity offsets should be chosen according to the number of velocities in your tokenizer’s vocabulary (num_velocities). (default: None)
duration_offsets – list of duration offsets for augmentation, to be given in beats if duration_in_ticks is False, in ticks otherwise. (default: None)
all_offset_combinations – if True, will perform data augmentation on all the possible combinations of offset values. If set to False, the method will only augment on the offsets separately without combining them. (default: False)
restrict_on_program_tessitura – if True, the method will consider the recommended pitch values of each instrument/program as the range of possible values after augmentation. Otherwise, the (0, 127) range will be used. (default: True)
velocity_range – minimum and maximum velocity values. (default: (1, 127))
duration_in_ticks – if True, the duration_offsets argument will be considered as expressed in ticks. Otherwise, it is considered in beats, and the equivalent in ticks will be determined by multiplying it by the MIDI’s time division (480 by default for abc files). (default: False)
min_duration – minimum duration limit to apply if a duration offset is negative. If duration_in_ticks is True, it must be given in ticks, otherwise in beats as a float or integer. (default: 0.03125)
- Returns:
a list of tuples, each pairing a tuple of three offsets with the corresponding augmented symusic.Score object.