Speech-to-Reverberation Modulation Energy Ratio (SRMR)
Module Interface
- class torchmetrics.audio.srmr.SpeechReverberationModulationEnergyRatio(fs, n_cochlear_filters=23, low_freq=125, min_cf=4, max_cf=None, norm=False, fast=False, **kwargs)
Calculate Speech-to-Reverberation Modulation Energy Ratio (SRMR).
SRMR is a non-intrusive metric for speech quality and intelligibility based on a modulation spectral representation of the speech signal. This code is translated from SRMRToolbox and SRMRpy.
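As a rough illustration of what "modulation energy ratio" means: in the SRMR literature, the score divides the energy in the lowest four modulation bands by the energy in the higher bands. The sketch below assumes the per-band energies have already been computed and averaged over time. The function name, the fixed k_star default, and the toy numbers are hypothetical; this is not the library's code, which also derives the energies from the waveform through its filterbanks.

```python
def srmr_from_band_energies(avg_energy, k_star=8):
    """Ratio of low- to high-modulation-band energy (simplified sketch).

    avg_energy: list of rows, one per acoustic (cochlear) channel; each row
        holds the time-averaged energy per modulation band, lowest band first.
    k_star: exclusive upper index of the bands counted in the denominator
        (hypothetical fixed value for this sketch).
    """
    if not 4 < k_star <= len(avg_energy[0]):
        raise ValueError("k_star must be in (4, number of modulation bands]")
    # Numerator: first four modulation bands, summed over all acoustic channels.
    low = sum(sum(row[:4]) for row in avg_energy)
    # Denominator: modulation bands five up to k_star.
    high = sum(sum(row[4:k_star]) for row in avg_energy)
    return low / high


# Toy data: two acoustic channels, eight modulation bands each (made-up numbers).
clean_like = [[8, 6, 4, 2, 1, 1, 1, 1], [4, 3, 2, 1, 1, 1, 1, 1]]
reverb_like = [[4, 3, 2, 1, 2, 2, 2, 2], [2, 2, 1, 1, 2, 2, 2, 2]]

print(srmr_from_band_energies(clean_like))
print(srmr_from_band_energies(reverb_like))
```

In this toy data the second signal has more energy in the higher modulation bands, so its ratio comes out lower.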
As input to forward and update, the metric accepts the following input:
- preds (Tensor): float tensor with shape (..., time)
As output of forward and compute, the metric returns the following output:
- srmr (Tensor): float scalar tensor
Hint
Using this metric requires gammatone and torchaudio to be installed. Either install with pip install torchmetrics[audio], or with pip install torchaudio and pip install git+https://github.com/detly/gammatone.
Attention
This implementation is experimental and might not be consistent with the MATLAB implementation SRMRToolbox, especially the fast implementation. The slow versions, a) fast=False, norm=False, max_cf=128 and b) fast=False, norm=True, max_cf=30, have a relatively small inconsistency.
- Parameters:
  - fs (int) – Sampling rate of the input signal in Hz.
  - n_cochlear_filters (int) – Number of filters in the acoustic filterbank.
  - low_freq (float) – Determines the frequency cutoff for the corresponding gammatone filterbank.
  - min_cf (float) – Center frequency in Hz of the first modulation filter.
  - max_cf (Optional[float]) – Center frequency in Hz of the last modulation filter. If None is given, 30 Hz is used when norm=True, otherwise 128 Hz is used.
  - norm (bool) – Whether to use modulation spectrum energy normalization.
  - fast (bool) – Use the faster version based on the gammatonegram. Note: this argument is inherited from SRMRpy. Because the translated code is based on PyTorch, setting fast=True may slow down the calculation of this metric on GPU.
- Raises:
  ModuleNotFoundError – If the gammatone or torchaudio package is not installed
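Since the metric raises ModuleNotFoundError when its optional dependencies are absent, it can help to check for them up front. The following is a minimal sketch using only the standard library; the helper name missing_srmr_dependencies is hypothetical and not part of torchmetrics.

```python
from importlib.util import find_spec

# Packages this page lists as required for SRMR.
REQUIRED_PACKAGES = ("gammatone", "torchaudio")


def missing_srmr_dependencies(required=REQUIRED_PACKAGES):
    """Return the names of required top-level packages that cannot be found."""
    return [name for name in required if find_spec(name) is None]


missing = missing_srmr_dependencies()
if missing:
    print("SRMR unavailable, missing packages:", ", ".join(missing))
else:
    print("SRMR dependencies found")
```

A check like this lets calling code skip the metric or print install instructions instead of failing when the metric is constructed.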
Example
>>> from torch import randn
>>> from torchmetrics.audio import SpeechReverberationModulationEnergyRatio
>>> preds = randn(8000)
>>> srmr = SpeechReverberationModulationEnergyRatio(8000)
>>> srmr(preds)
tensor(0.3191)
- plot(val=None, ax=None)
Plot a single or multiple values from the metric.
- Parameters:
  - val (Union[Tensor, Sequence[Tensor], None]) – Either a single result from calling metric.forward or metric.compute, or a list of these results. If no value is provided, metric.compute is called automatically and its result is plotted.
  - ax (Optional[Axes]) – A matplotlib axis object. If provided, the plot is added to that axis.
- Return type:
- Returns:
Figure and Axes object
- Raises:
ModuleNotFoundError – If matplotlib is not installed
>>> # Example plotting a single value
>>> import torch
>>> from torchmetrics.audio import SpeechReverberationModulationEnergyRatio
>>> metric = SpeechReverberationModulationEnergyRatio(8000)
>>> metric.update(torch.rand(8000))
>>> fig_, ax_ = metric.plot()

>>> # Example plotting multiple values
>>> import torch
>>> from torchmetrics.audio import SpeechReverberationModulationEnergyRatio
>>> metric = SpeechReverberationModulationEnergyRatio(8000)
>>> values = []
>>> for _ in range(10):
...     values.append(metric(torch.rand(8000)))
>>> fig_, ax_ = metric.plot(values)
Functional Interface
- torchmetrics.functional.audio.srmr.speech_reverberation_modulation_energy_ratio(preds, fs, n_cochlear_filters=23, low_freq=125, min_cf=4, max_cf=None, norm=False, fast=False)
Calculate Speech-to-Reverberation Modulation Energy Ratio (SRMR).
SRMR is a non-intrusive metric for speech quality and intelligibility based on a modulation spectral representation of the speech signal. This code is translated from SRMRToolbox and SRMRpy.
- Parameters:
  - preds (Tensor) – Float tensor with shape (..., time).
  - fs (int) – Sampling rate of the input signal in Hz.
  - n_cochlear_filters (int) – Number of filters in the acoustic filterbank.
  - low_freq (float) – Determines the frequency cutoff for the corresponding gammatone filterbank.
  - min_cf (float) – Center frequency in Hz of the first modulation filter.
  - max_cf (Optional[float]) – Center frequency in Hz of the last modulation filter. If None is given, 30 Hz is used when norm=True, otherwise 128 Hz is used.
  - norm (bool) – Whether to use modulation spectrum energy normalization.
  - fast (bool) – Use the faster version based on the gammatonegram. Note: this argument is inherited from SRMRpy. Because the translated code is based on PyTorch, setting fast=True may slow down the calculation of this metric on GPU.
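The min_cf and max_cf arguments bound the modulation filterbank. As a minimal sketch of how such bounds translate into filter center frequencies, the snippet below assumes eight filters spaced geometrically between the two values, as described in the SRMR literature. The filter count of eight and the helper name are assumptions for illustration, not something stated on this page.

```python
def modulation_center_frequencies(min_cf=4.0, max_cf=128.0, n_filters=8):
    """Geometrically spaced center frequencies from min_cf to max_cf (in Hz)."""
    # Constant ratio between neighbouring center frequencies.
    ratio = (max_cf / min_cf) ** (1.0 / (n_filters - 1))
    return [min_cf * ratio**k for k in range(n_filters)]


cfs = modulation_center_frequencies()
print([round(cf, 2) for cf in cfs])
```

Lowering max_cf (for example to 30) compresses the same number of filters into a narrower modulation range.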
Hint
Using this metric requires gammatone and torchaudio to be installed. Either install with pip install torchmetrics[audio], or with pip install torchaudio and pip install git+https://github.com/detly/gammatone.
Attention
This implementation is experimental and might not be consistent with the MATLAB implementation SRMRToolbox, especially the fast implementation. The slow versions, a) fast=False, norm=False, max_cf=128 and b) fast=False, norm=True, max_cf=30, have a relatively small inconsistency.
- Return type:
  Tensor
- Returns:
  Scalar tensor with the SRMR value, with shape (...)
- Raises:
  ModuleNotFoundError – If the gammatone or torchaudio package is not installed
Example
>>> from torch import randn
>>> from torchmetrics.functional.audio import speech_reverberation_modulation_energy_ratio
>>> preds = randn(8000)
>>> speech_reverberation_modulation_energy_ratio(preds, 8000)
tensor([0.3191], dtype=torch.float64)