Non-Intrusive Speech Quality Assessment (NISQA v2.0)¶

Module Interface¶

class torchmetrics.audio.nisqa.NonIntrusiveSpeechQualityAssessment(fs, **kwargs)[source]¶

Non-Intrusive Speech Quality Assessment (NISQA v2.0) [1], [2].

As input to forward and update the metric accepts the following input

preds (Tensor): float tensor with shape (...,time)

As output of forward and compute the metric returns the following output

nisqa (Tensor): float tensor reduced across the batch with shape (5,) corresponding to overall MOS, noisiness, discontinuity, coloration and loudness in that order

Hint

Using this metric requires you to have librosa and requests installed. Install as pip install librosa requests.

Caution

The forward and compute methods in this class return values reduced across the batch. To obtain values for each sample, you may use the functional counterpart non_intrusive_speech_quality_assessment().

Parameters:: fs¶ (int) – sampling frequency of input
Raises:: ModuleNotFoundError – If librosa or requests are not installed

Example

>>> import torch
>>> from torchmetrics.audio import NonIntrusiveSpeechQualityAssessment
>>> _ = torch.manual_seed(42)
>>> preds = torch.randn(16000)
>>> nisqa = NonIntrusiveSpeechQualityAssessment(16000)
>>> nisqa(preds)
tensor([1.0433, 1.9545, 2.6087, 1.3460, 1.7117])

References

[1] G. Mittag and S. Möller, “Non-intrusive speech quality assessment for super-wideband speech communication networks”, in Proc. ICASSP, 2019.
[2] G. Mittag, B. Naderi, A. Chehadi and S. Möller, “NISQA: A deep CNN-self-attention model for multidimensional speech quality prediction with crowdsourced datasets”, in Proc. INTERSPEECH, 2021.

plot(val=None, ax=None)[source]¶

Plot a single or multiple values from the metric.

Parameters:

val¶ (Union[Tensor, Sequence[Tensor], None]) – Either a single result from calling metric.forward or metric.compute or a list of these results. If no value is provided, will automatically call metric.compute and plot that result.
ax¶ (Optional[Axes]) – A matplotlib axis object. If provided will add plot to that axis

Return type:

tuple[Figure, Union[Axes, ndarray]]

Returns:

Figure and Axes object

Raises:

ModuleNotFoundError – If matplotlib is not installed

>>> # Example plotting a single value
>>> import torch
>>> from torchmetrics.audio import NonIntrusiveSpeechQualityAssessment
>>> metric = NonIntrusiveSpeechQualityAssessment(16000)
>>> metric.update(torch.randn(16000))
>>> fig_, ax_ = metric.plot()

../_images/non_intrusive_speech_quality_assessment-1.png

>>> # Example plotting multiple values
>>> import torch
>>> from torchmetrics.audio import NonIntrusiveSpeechQualityAssessment
>>> metric = NonIntrusiveSpeechQualityAssessment(16000)
>>> values = []
>>> for _ in range(10):
...     values.append(metric(torch.randn(16000)))
>>> fig_, ax_ = metric.plot(values)

../_images/non_intrusive_speech_quality_assessment-2.png

Functional Interface¶

torchmetrics.functional.audio.nisqa.non_intrusive_speech_quality_assessment(preds, fs)[source]¶

Non-Intrusive Speech Quality Assessment (NISQA v2.0) [1], [2].

Hint

Usingsing this metric requires you to have librosa and requests installed. Install as pip install librosa requests.

Parameters:

preds¶ (Tensor) – float tensor with shape (...,time)
fs¶ (int) – sampling frequency of input

Return type:

Tensor

Returns:

Float tensor with shape (...,5) corresponding to overall MOS, noisiness, discontinuity, coloration and loudness in that order

Raises:

ModuleNotFoundError – If librosa or requests are not installed
RuntimeError – If the input is too short, causing the number of mel spectrogram windows to be zero
RuntimeError – If the input is too long, causing the number of mel spectrogram windows to exceed the maximum allowed

Example

>>> import torch
>>> from torchmetrics.functional.audio.nisqa import non_intrusive_speech_quality_assessment
>>> _ = torch.manual_seed(42)
>>> preds = torch.randn(16000)
>>> non_intrusive_speech_quality_assessment(preds, 16000)
tensor([1.0433, 1.9545, 2.6087, 1.3460, 1.7117])

References

[1] G. Mittag and S. Möller, “Non-intrusive speech quality assessment for super-wideband speech communication networks”, in Proc. ICASSP, 2019.
[2] G. Mittag, B. Naderi, A. Chehadi and S. Möller, “NISQA: A deep CNN-self-attention model for multidimensional speech quality prediction with crowdsourced datasets”, in Proc. INTERSPEECH, 2021.