The following documents the audio classes and arithmetic operations for
audio data. More details and background are given in the concepts
(audioclasses, Fouriertransform, arithmeticoperations).
Objects of this class contain frequency data which is not directly
convertible to the time domain, i.e., non-equidistantly spaced bins or
incomplete spectra.
data (array, double) – Raw data in the frequency domain. The memory layout of data is ‘C’.
E.g. data of shape=(3,2,1024) has 3 x 2 channels with 1024
frequency bins each. Data can be int, float or complex.
Data of type int is converted to float.
frequencies (array, double) – Frequencies of the data in Hz. The number of frequencies must match
the size of the last dimension of data.
comment (str, optional) – A comment related to the data. The default is 'none'.
Notes
FrequencyData objects do not support an FFT norm, because this requires
knowledge about the sampling rate or the number of samples of the time
signal [1].
The channel shape gives the shape of the audio data excluding the last
dimension, which is n_samples for time domain objects and
n_bins for frequency domain objects.
The number of samples and frequency bins always remains the same, e.g.,
an audio object of cshape=(4,3) and n_samples=512 will have
cshape=(12,) and n_samples=512 after flattening.
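The relation between the shape of the underlying data and the channel shape can be sketched with plain NumPy. This is only an illustration of the behavior described above, not a call into the library: the array `data` stands in for the time data of an audio object, and the names `cshape`, `flat`, and `flat_cshape` are local variables chosen for this example.

```python
import numpy as np

# Stand-in for the time data of an audio object with cshape (4, 3)
# and 512 samples per channel.
data = np.zeros((4, 3, 512))

# The channel shape excludes the last dimension (samples or bins).
cshape = data.shape[:-1]
n_samples = data.shape[-1]

# Flattening collapses all channel dimensions into one and leaves
# the last dimension untouched.
flat = data.reshape(-1, n_samples)
flat_cshape = flat.shape[:-1]

print(cshape, "->", flat_cshape, "with", flat.shape[-1], "samples")
```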
newshape (int, tuple) – new cshape of the audio object. One entry of newshape dimension
can be -1. In this case, the value is inferred from the
remaining dimensions.
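How a -1 entry is inferred can be illustrated with NumPy's own reshape rules. The helper `reshape_channels` below is hypothetical and only mimics the described behavior on a bare array; it assumes that the last dimension always holds the samples or bins and is never reshaped.

```python
import numpy as np


def reshape_channels(data, newshape):
    """Hypothetical helper: reshape the channel dimensions only.

    The last dimension (samples or bins) is appended unchanged.
    """
    if isinstance(newshape, int):
        newshape = (newshape,)
    return data.reshape(tuple(newshape) + (data.shape[-1],))


data = np.zeros((4, 3, 512))          # cshape (4, 3), 512 samples

# One entry may be -1; it is inferred from the 12 available channels.
reshaped = reshape_channels(data, (2, -1))
flattened = reshape_channels(data, -1)

print(reshaped.shape[:-1], flattened.shape[:-1])
```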
Objects of this class contain data which is directly convertible between
time and frequency domain (equally spaced samples and frequency bins). The
data is always real valued in the time domain and complex valued in the
frequency domain.
data (ndarray, double) – Raw data of the signal in the time or frequency domain. The memory
layout of data is ‘C’. E.g. data of shape=(3,2,1024) has
3 x 2 channels with 1024 samples or frequency bins each. Time data
is converted to float. Frequency data is converted to complex
and must be provided as single sided spectra, i.e., for all
frequencies between 0 Hz and half the sampling rate.
sampling_rate (double) – Sampling rate in Hz
n_samples (int, optional) – Number of samples of the time signal. Required if domain is
'freq'. The default is None, which assumes an even number
of samples if the data is provided in the frequency domain.
domain ('time', 'freq', optional) – Domain of data. The default is 'time'
fft_norm (str, optional) – The normalization of the Discrete Fourier Transform (DFT). Can be
'none', 'unitary', 'amplitude', 'rms', 'power',
or 'psd'. See normalization and [2]
for more information. The default is 'none', which is typically
used for energy signals, such as impulse responses.
comment (str) – A comment related to data. The default is None.
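Why n_samples matters for frequency domain input can be seen from the single-sided spectrum itself. The sketch below uses only `numpy.fft` and is an illustration under that assumption, not the library's implementation: an even and an odd number of samples can lead to spectra whose bin count alone does not reveal the original length.

```python
import numpy as np

rng = np.random.default_rng(0)


def single_sided(n_samples):
    """Return random time data of shape (3, 2, n_samples) and its
    single-sided spectrum (0 Hz up to half the sampling rate)."""
    time = rng.standard_normal((3, 2, n_samples))
    return time, np.fft.rfft(time, axis=-1)


time_even, freq_even = single_sided(1024)
time_odd, freq_odd = single_sided(1023)

# Without further information, assuming an even number of samples
# is the only guess that can be made from the number of bins.
guess_even = 2 * (freq_even.shape[-1] - 1)
guess_odd = 2 * (freq_odd.shape[-1] - 1)   # does not match 1023

# Passing the true number of samples recovers the odd-length signal.
recovered_odd = np.fft.irfft(freq_odd, n=1023, axis=-1)

print(freq_even.shape[-1], guess_even, freq_odd.shape[-1], guess_odd)
```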
The channel shape gives the shape of the audio data excluding the last
dimension, which is n_samples for time domain objects and
n_bins for frequency domain objects.
The number of samples and frequency bins always remains the same, e.g.,
an audio object of cshape=(4,3) and n_samples=512 will have
cshape=(12,) and n_samples=512 after flattening.
The normalized data is usually used for inspecting the data, e.g.,
using plots or when extracting information such as the amplitude of
harmonic components. Most processing operations, e.g., frequency
domain convolution, require the non-normalized data stored as
freq_raw.
Return the frequency domain data without normalization.
Most processing operations, e.g., frequency
domain convolution, require the non-normalized data.
The normalized data stored as freq is usually used for inspecting
the data, e.g., using plots or when extracting information such as the
amplitude of harmonic components.
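The difference between raw and normalized spectra can be sketched for a single sine tone. The scaling used here is a simplified amplitude-style normalization written out by hand (divide by the number of samples, double all bins except 0 Hz and the Nyquist bin); it is an assumption for illustration, and the exact definitions are those given in the normalization documentation referenced above.

```python
import numpy as np

n_samples = 1024
sampling_rate = 1024                    # gives a bin spacing of 1 Hz
t = np.arange(n_samples) / sampling_rate
time = 0.5 * np.sin(2 * np.pi * 100 * t)   # amplitude 0.5 at 100 Hz

# Raw single-sided spectrum, as needed for processing.
raw = np.fft.rfft(time)

# Simplified amplitude-style scaling for inspection (assumption).
scaled = raw / n_samples
scaled[1:-1] *= 2

peak_raw = np.abs(raw[100])
peak_scaled = np.abs(scaled[100])

print(peak_raw, peak_scaled)
```

In this sketch the scaled spectrum shows the tone's amplitude directly, while the raw value depends on the signal length, which is why the scaled version is easier to read but unsuitable for operations such as frequency domain convolution.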
newshape (int, tuple) – new cshape of the audio object. One entry of newshape dimension
can be -1. In this case, the value is inferred from the
remaining dimensions.
data (array, double) – Raw data in the time domain. The memory layout of data is ‘C’.
E.g. data of shape=(3,2,1024) has 3 x 2 channels with
1024 samples each. The data can be int or float and is
converted to float in any case.
times (array, double) – Times in seconds at which the data is sampled. The number of times
must match the size of the last dimension of data.
comment (str, optional) – A comment related to data. The default is 'none'.
The channel shape gives the shape of the audio data excluding the last
dimension, which is n_samples for time domain objects and
n_bins for frequency domain objects.
The number of samples and frequency bins always remains the same, e.g.,
an audio object of cshape=(4,3) and n_samples=512 will have
cshape=(12,) and n_samples=512 after flattening.
newshape (int, tuple) – new cshape of the audio object. One entry of newshape dimension
can be -1. In this case, the value is inferred from the
remaining dimensions.
data (tuple of the form (data_1,data_2,...,data_N)) – Data to be added. Can contain pyfar audio objects, array likes, and
scalars. Pyfar audio objects can not be mixed, e.g.,
TimeData and FrequencyData objects do not work
together. See below or
arithmeticoperations
for possible combinations of Signal FFT normalizations.
domain ('time', 'freq', optional) – Flag to indicate if the operation should be performed in the time or
frequency domain. Frequency domain operations work on the raw
spectrum (see pyfar.dsp.fft.normalization). The default is
'freq'.
Returns:
results – Result of the operation as numpy array, if data contains only array
likes and numbers. Result as pyfar audio object if data contains an
audio object.
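For addition, the choice of domain does not change the result when the raw spectrum is used, because the Fourier transform is linear. The following NumPy-only sketch illustrates this with bare arrays standing in for two two-channel signals; it does not call the library.

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples = 256
a = rng.standard_normal((2, n_samples))
b = rng.standard_normal((2, n_samples))

# Addition in the time domain.
sum_time = a + b

# Addition of the raw single-sided spectra, transformed back.
spec_sum = np.fft.rfft(a, axis=-1) + np.fft.rfft(b, axis=-1)
sum_freq = np.fft.irfft(spec_sum, n=n_samples, axis=-1)

print(np.allclose(sum_time, sum_freq))
```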
data (tuple of the form (data_1, data_2, ..., data_N)) – Data to be divided. Can contain pyfar audio objects, array likes, and
scalars. Pyfar audio objects can not be mixed, e.g.,
TimeData and FrequencyData objects do not work
together. See below or
arithmeticoperations
for possible combinations of Signal FFT normalizations.
domain ('time', 'freq', optional) – Flag to indicate if the operation should be performed in the time or
frequency domain. Frequency domain operations work on the raw
spectrum (See pyfar.dsp.fft.normalization). The default is
'freq'.
Returns:
results – Result of the operation as numpy array, if data contains only array
likes and numbers. Result as pyfar audio object if data contains an
audio object.
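A typical reason to divide raw spectra is to undo a convolution. The sketch below assumes circular convolution and an input spectrum without zeros, and it uses only `numpy.fft`; the names `x`, `h`, and `h_est` are chosen for this example.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 128

# Input: a unit impulse plus weak noise, so its spectrum has no zeros.
x = 0.01 * rng.standard_normal(n)
x[0] += 1.0

# System impulse response to be recovered.
h = np.zeros(n)
h[:3] = [1.0, 0.5, 0.25]

X = np.fft.rfft(x)
H = np.fft.rfft(h)

# Output of the system (circular convolution via the raw spectra).
y = np.fft.irfft(X * H, n=n)

# Dividing the output spectrum by the input spectrum recovers h.
h_est = np.fft.irfft(np.fft.rfft(y) / X, n=n)

print(np.allclose(h_est, h))
```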
Matrix multiplication of multidimensional pyfar audio objects and/or
array likes.
The multiplication is based on numpy.matmul and acts on the channels
of audio objects (Signal, TimeData, and
FrequencyData). Alternatively, the @ operator can be used
for frequency domain matrix multiplications with the default parameters.
Parameters:
data (tuple of the form (data_1, data_2, ..., data_N)) – Data to be multiplied. Can contain pyfar audio objects and array likes.
If multiple audio objects are passed they must be of the same type and
their FFT normalizations must allow the multiplication (see
arithmeticoperations
and notes below).
If audio objects and arrays are included, the arrays’ shapes need
to match the audio objects’ cshape (not the shape of the underlying
time or frequency data). More information on the requirements regarding
the shapes and cshapes and their handling is given in the notes below.
domain ('time', 'freq', optional) – Flag to indicate if the operation should be performed in the time or
frequency domain. Frequency domain operations work on the raw
spectrum (see pyfar.dsp.fft.normalization). The default is
'freq'.
axes (list of 3 tuples) –
Each tuple in the list specifies two axes to define the matrices for
multiplication. The default [(-2,-1),(-2,-1),(-2,-1)] uses the
last two axes of the input to define the matrices (first and
second tuple) and writes the result to the last two axes of the output
data (third tuple).
In case of pyfar audio objects, the indices refer to the channel
dimensions and ignore the last dimension of the underlying data that
contains the samples or frequency bins (see
audioclasses for more
information). For example, a signal with 4 times 2 channels and 120
frequency bins has a cshape of (4,2), while the shape of the
underlying frequency data is (4,2,120). The default tuple
(-2,-1) would result in 120 matrices of shape (4,2) used
for the multiplication and not 4 matrices of shape (2,120).
If data contains more than two operands, the scheme given by axes
refers to all of the sequential multiplications.
See notes and examples for more details.
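The meaning of the default axes for channel dimensions can be reproduced with `numpy.matmul` on bare arrays. This is a sketch of the idea only: the bins are moved to the front so that the last two axes of each array are the channel axes, which yields one matrix product per frequency bin. The arrays `a` and `b` stand in for the frequency data of two audio objects with cshapes (4, 2) and (2, 3).

```python
import numpy as np

rng = np.random.default_rng(3)
a = rng.standard_normal((4, 2, 120))    # cshape (4, 2), 120 bins
b = rng.standard_normal((2, 3, 120))    # cshape (2, 3), 120 bins

# Move the bins to the front: 120 matrices of shape (4, 2) and (2, 3).
a_m = np.moveaxis(a, -1, 0)
b_m = np.moveaxis(b, -1, 0)

# One matrix product per bin, then move the bins back to the end.
c = np.moveaxis(np.matmul(a_m, b_m), 0, -1)

print(c.shape)
```

The result has a channel shape of (4, 3) and still 120 bins; each bin holds the product of the corresponding (4, 2) and (2, 3) matrices.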
Returns:
results – Result of the operation as numpy array, if data contains only array
likes and numbers. Result as pyfar audio object if data contains an
audio object.
Matrix multiplication of arrays including a time or frequency dependent
dimension is possible by first converting these to audio objects
(Signal, TimeData, FrequencyData).
See example below.
Audio objects with a one dimensional cshape are expanded to allow matrix
multiplication:
If the first signal is 1-D, it is expanded to 2-D by prepending a
dimension. For example, a cshape of (10,) becomes (1,10).
If the second signal is 1-D, it is expanded to 2-D by appending a
dimension. For example, a cshape of (10,) becomes (10,1).
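The expansion of one-dimensional channel shapes can be sketched on bare arrays as follows. The arrays stand in for the data of two audio objects with cshape (10,) and 64 bins; the sketch only shows the expansion and the resulting per-bin product, and makes no statement about how the library shapes the final result.

```python
import numpy as np

first = np.ones((10, 64))     # cshape (10,), 64 bins
second = np.ones((10, 64))    # cshape (10,), 64 bins

# First operand: prepend a channel dimension, cshape (1, 10).
first_2d = first[np.newaxis, ...]

# Second operand: append a channel dimension, cshape (10, 1).
second_2d = second[:, np.newaxis, :]

# Per-bin matrix product of a (1, 10) and a (10, 1) matrix.
product = np.einsum('ijk,jlk->ilk', first_2d, second_2d)

print(first_2d.shape[:-1], second_2d.shape[:-1], product.shape)
```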
The shapes of array likes and cshapes of audio objects must be
broadcastable except for the axes specified by the axes parameter.
The fft_norm of the result is as follows:
If one signal has the FFT normalization 'none', the result gets
the normalization of the other signal.
If both signals have the same FFT normalization, the result gets the
same normalization.
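The two rules can be written down as a small function. `result_fft_norm` is a hypothetical helper for illustration and not part of the library; rejecting every other combination with an error is an assumption made here, based on the requirement that the normalizations must allow the multiplication.

```python
def result_fft_norm(norm_a, norm_b):
    """Hypothetical helper implementing only the two rules stated above."""
    # Rule 1: 'none' adopts the normalization of the other signal.
    if norm_a == 'none':
        return norm_b
    if norm_b == 'none':
        return norm_a
    # Rule 2: identical normalizations are kept.
    if norm_a == norm_b:
        return norm_a
    # Assumption: any other combination is not covered by the rules.
    raise ValueError(
        f"FFT normalizations '{norm_a}' and '{norm_b}' are not covered "
        "by the two rules in this sketch.")


print(result_fft_norm('none', 'rms'), result_fft_norm('psd', 'psd'))
```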
data (tuple of the form (data_1, data_2, ..., data_N)) – Data to be multiplied. Can contain pyfar audio objects, array likes,
and scalars. Pyfar audio objects can not be mixed, e.g.,
TimeData and FrequencyData objects do not work
together. See below or
arithmeticoperations
for possible combinations of Signal FFT normalizations.
domain ('time', 'freq', optional) – Flag to indicate if the operation should be performed in the time or
frequency domain. Frequency domain operations work on the raw
spectrum (See pyfar.dsp.fft.normalization). The default is
'freq'.
Returns:
results – Result of the operation as numpy array, if data contains only array
likes and numbers. Result as pyfar audio object if data contains an
audio object.
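For multiplication, the domain changes the meaning of the operation: multiplying in the time domain is sample-wise, whereas multiplying the raw spectra corresponds to a circular convolution of the time signals. The sketch below shows both with `numpy.fft` on short bare arrays, chosen so that the convolution does not wrap around.

```python
import numpy as np

n = 8
x = np.array([1.0, 2.0, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0])
h = np.array([1.0, -1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])

# Time domain: sample-wise product.
prod_time = x * h

# Frequency domain: product of the raw spectra, transformed back.
prod_freq = np.fft.irfft(np.fft.rfft(x) * np.fft.rfft(h), n=n)

print(prod_time)
print(np.round(prod_freq, 12))
```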
data (tuple of the form (data_1, data_2, ..., data_N)) – The base for which the power is calculated. Can contain pyfar audio
objects, array likes, and scalars. Pyfar audio objects can not be
mixed, e.g., TimeData and FrequencyData objects
do not work together. See below or
arithmeticoperations
for possible combinations of Signal FFT normalizations.
domain ('time', 'freq', optional) – Flag to indicate if the operation should be performed in the time or
frequency domain. Frequency domain operations work on the raw
spectrum (See pyfar.dsp.fft.normalization). The default is
'freq'.
Returns:
results – Result of the operation as numpy array, if data contains only array
likes and numbers. Result as pyfar audio object if data contains an
audio object.
data (tuple of the form (data_1, data_2, ..., data_N)) – Data to be subtracted. Can contain pyfar audio objects, array likes,
and scalars. Pyfar audio objects can not be mixed, e.g.,
TimeData and FrequencyData objects do not work
together. See below or
arithmeticoperations
for possible combinations of Signal FFT normalizations.
domain ('time', 'freq', optional) – Flag to indicate if the operation should be performed in the time or
frequency domain. Frequency domain operations work on the raw
spectrum (See pyfar.dsp.fft.normalization). The default is
'freq'.
Returns:
results – Result of the operation as numpy array, if data contains only array
likes and numbers. Result as pyfar audio object if data contains an
audio object.