The wavelet transform is a signal processing tool applicable to both images and time series. The figure below illustrates an example of its application in image processing, where it decomposes an image into multiple frequency subbands, enabling both analysis and compression. For instance, JPEG 2000 is an image compression standard based on the wavelet transform (https://zh.wikipedia.org/wiki/JPEG_2000):

Wavelet Transform Applied to Image Processing

In recent years, the wavelet transform has become a powerful aid to many deep learning algorithms. Many papers use it as an important component of feature engineering, for example: wavelet tokenization for time series foundation models (WaveToken, ICML 2025), wavelets combined with Mamba or diffusion models for efficient multi-scale modeling (Wave-Mamba, DiMSUM, WaveDiff), and learnable wavelet layers that optimize filter parameters end to end (AdaWaveNet, DeSpaWN). The core advantage is the parameter efficiency that comes from a multi-resolution inductive bias: better performance than traditional methods with fewer parameters and less computation. These techniques are widely applied to time series forecasting, image generation, speech synthesis, and medical signal analysis.

Comparison Between Wavelet Transform and Fourier Transform

When discussing signal processing tools, we must inevitably mention another common one: the Fourier transform. First, let me explain the similarities and differences between the Fourier transform and the wavelet transform. Imagine we're observing a traffic light. We know that a traffic light has three colors (red, yellow, green) and changes according to a fixed pattern:

Figure 1: Traffic Light Example

Suppose this traffic light changes from red to yellow to green over 10 seconds. Using the Fourier transform, we would obtain three frequency values (corresponding to the three colors). However, since the Fourier transform has no explicit time resolution, we can only know that the three colors appeared within this time period, but we cannot determine the order and timing of their appearance.

In contrast, from the wavelet transform result in Figure 1 (at the bottom), with the vertical axis representing frequency and the horizontal axis representing the time when each frequency appears, we can determine that red appears first, then yellow, and finally green. Why does this difference occur? The following will explain it from theory to application.

Orthogonal Basis and Dot Product

The essence of signal processing is basis transformation. The Fourier transform projects a signal onto a set of orthogonal complex exponential basis functions, e^{iωt} = cos(ωt) + i·sin(ωt). By computing the inner product of the signal with basis functions of different frequencies, we obtain the component weights of the signal at each frequency, and can thereby analyze its spectral characteristics:

Figure 2: Orthogonal Basis at Different Frequencies ω

The larger the inner product, the more similar the original signal is to that basis function. However, once ω is fixed, a Fourier basis function oscillates uniformly everywhere, meaning it has no locality at all. Although sacrificing time resolution may sound bad, it yields a more accurate estimate of each frequency, because all of the data is used to estimate that one frequency. The choice between the Fourier transform and the wavelet transform is therefore a trade-off. If a signal has length L (sampled at rate 1), the Fourier transform computes inner products from the highest frequency 1/2 (the Nyquist frequency, half the sampling rate) down to the lowest frequency 1/L (the fundamental frequency) to obtain each frequency component. After this calculation, we get a frequency distribution graph as shown in the middle of Figure 1.
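
This frequency-only view can be made concrete with a short NumPy sketch (the sampling rate and tone frequencies below are my own illustrative choices, not taken from the figure): two signals containing the same three tones in opposite orders produce the same dominant frequency bins, so the magnitude spectrum alone cannot tell us the order of appearance.

```python
import numpy as np

fs = 100                      # sampling rate (Hz), illustrative
seg = np.arange(fs) / fs      # one second per "color"

# same three tones, played in opposite orders
a = np.concatenate([np.sin(2 * np.pi * f * seg) for f in (5, 15, 30)])
b = np.concatenate([np.sin(2 * np.pi * f * seg) for f in (30, 15, 5)])

def top3_bins(x):
    # indices of the three strongest frequency bins in the spectrum
    return sorted(int(i) for i in np.argsort(np.abs(np.fft.rfft(x)))[-3:])

# both orderings peak at the same three bins: the spectrum reveals
# *which* frequencies occurred, but not *when*
print(top3_bins(a), top3_bins(b))  # [15, 45, 90] [15, 45, 90]
```

Each bin k here corresponds to frequency k·fs/N = k/3 Hz, so the three peaks sit at 5, 15, and 30 Hz for both signals.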

Mother Wavelet and Father Wavelet

On the other hand, wavelet transform uses a series of functions that are only locally non-zero to compute inner products with the original signal. We call this locally non-zero function the mother wavelet. Scholars have defined various shapes of mother wavelets to extract different types of information. Here are some examples:

Figure 3: Various Mother Wavelets

After selecting the mother wavelet, we need to scan the signal with it:

Figure 4: Scanning Signal with Wavelet

It’s worth noting that during this scanning process, the signal frequency a wavelet can extract is fixed, because the frequency of the wavelet itself doesn’t change. We can understand the figure above as computing the inner product of the original signal with copies of the same wavelet placed at different positions along the signal; this is why wavelets have locality. Next, to extract more than one frequency, we need to stretch or compress (dilate) the wavelet; in the multiresolution framework, the remaining low-frequency content is captured by the scaling function (father wavelet):
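
The scanning step itself can be sketched in a few lines of NumPy (the signal, burst location, and hand-rolled Morlet-like wavelet shape below are illustrative assumptions): sliding a fixed wavelet along the signal is just cross-correlation, and the response peaks where the signal locally matches the wavelet's frequency.

```python
import numpy as np

def morlet(n, w0=5.0):
    # real-valued Morlet-like mother wavelet (illustrative shape)
    t = np.linspace(-4, 4, n)
    return np.cos(w0 * t) * np.exp(-t**2 / 2)

fs = 100
t = np.arange(0, 2, 1 / fs)
# a ~10 Hz burst between t = 1.0 s and t = 1.2 s, silence elsewhere
sig = np.where((t >= 1.0) & (t < 1.2), np.sin(2 * np.pi * 10 * t), 0.0)

# slide the fixed-frequency wavelet along the signal:
# one inner product per position, i.e. plain cross-correlation
response = np.correlate(sig, morlet(64), mode="same")

peak_t = np.argmax(np.abs(response)) / fs
print(f"strongest response near t = {peak_t:.2f} s")  # inside the burst
```

A 64-sample wavelet with w0 = 5 happens to oscillate at roughly 10 Hz at this sampling rate, so the correlation magnitude peaks inside the burst; a wavelet of a different fixed frequency would respond weakly everywhere, which is why the next step (scaling) is needed.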

Figure 5: Scaling the Wavelet

Finally, by combining wavelet scanning and scaling, we obtain the components at various frequencies at different positions.

Figure 6: Multi-scale Wavelet Components

Thus, we can obtain the wavelet transform scale diagram (Scalogram) as shown below:

Figure 7: Wavelet Scalogram

Therefore, wavelet transform is a good tool for multi-resolution analysis (MRA). This is also why wavelets frequently appear in machine learning image algorithms, as both the approximation and high-frequency details of an image play a decisive role in feature extraction.
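
The combination of shifting and scaling can be sketched directly in NumPy (all signal parameters below are illustrative choices): stretching one mother shape to different lengths and correlating each version with the signal yields one scalogram row per scale, and each row peaks where "its" frequency lives in time.

```python
import numpy as np

fs = 200
t = np.arange(0, 2, 1 / fs)
# 5 Hz in the first second, 40 Hz in the second
sig = np.sin(2 * np.pi * 5 * t) * (t < 1.0) + \
      np.sin(2 * np.pi * 40 * t) * (t >= 1.0)

def scaled_wavelet(freq, fs, cycles=6.0):
    # one mother shape, stretched/compressed so it always spans
    # `cycles` oscillations: lower frequency -> longer wavelet
    dur = cycles / freq
    u = np.arange(-dur / 2, dur / 2, 1 / fs)
    return np.cos(2 * np.pi * freq * u) * np.exp(-(4 * u / dur) ** 2)

# one scalogram row per scale: |inner product| at every position
rows = {f: np.abs(np.correlate(sig, scaled_wavelet(f, fs), mode="same"))
        for f in (5, 40)}

for f, row in rows.items():
    print(f"{f:>2} Hz row peaks at t = {np.argmax(row) / fs:.2f} s")
```

The 5 Hz row peaks in the first second and the 40 Hz row in the second, recovering exactly the time-frequency picture that the Fourier spectrum discards.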

Wavelet Transform Feature Extraction

In machine learning, we generally use the Discrete Wavelet Transform (DWT), because the Continuous Wavelet Transform (CWT) is highly redundant and can easily cause machine learning algorithms to fit noise. The figure below is an example of applying the discrete wavelet transform in time series modeling:

Figure 8: Discrete Wavelet Transform Example

First, we perform a 3-level discrete wavelet transform on the original signal: the signal passes through a high-pass filter g and a low-pass filter h. (This fast discrete wavelet transform was devised by mathematician Stéphane Mallat and is also called the Mallat algorithm.)

The result of the low-pass filter is further decomposed by the same pair of high-pass and low-pass filters. Each layer of wavelet transform downsamples the signal to achieve multi-resolution analysis. We can also see that the coefficients from each layer are connected to the machine learning algorithm, rather than using only the last layer’s coefficients, which can be seen as a form of skip connection. It’s worth noting that we can either choose predefined wavelet families from Figure 3, or let the machine learning algorithm learn the wavelet family itself.
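
The cascade just described can be sketched with hand-coded Haar filters (an illustrative choice; any orthogonal pair from Figure 3 would do): filter, downsample by two, and repeat on the low-pass branch. Because the filter pair is orthonormal, the coefficients across all levels retain exactly the signal's energy.

```python
import numpy as np

# Haar analysis pair: low-pass h (averages) and high-pass g (differences)
h = np.array([1.0, 1.0]) / np.sqrt(2)
g = np.array([1.0, -1.0]) / np.sqrt(2)

def dwt_step(x):
    # filter + downsample by 2; for Haar this reduces to
    # pairwise sums and differences
    pairs = x.reshape(-1, 2)
    return pairs @ h, pairs @ g        # (approximation, detail)

def wavedec(x, levels=3):
    # Mallat cascade: keep each level's detail,
    # recurse on the approximation
    details = []
    a = x
    for _ in range(levels):
        a, d = dwt_step(a)
        details.append(d)
    return a, details                  # a3, [d1, d2, d3]

rng = np.random.default_rng(0)
x = rng.standard_normal(32)
a3, details = wavedec(x)

print([len(d) for d in details], len(a3))   # [16, 8, 4] 4
# orthonormal filters preserve energy across the whole decomposition
energy = sum(np.sum(d**2) for d in details) + np.sum(a3**2)
print(np.isclose(energy, np.sum(x**2)))     # True
```

In the architecture of Figure 8, all four coefficient arrays (d1, d2, d3, a3) would be fed to the downstream model, not just the last level.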

A standard wavelet family must satisfy a series of conditions, such as perfect reconstruction, orthogonality, discreteness, and unit (uniform) energy. We can incorporate these conditions as penalty terms in the loss function of learnable wavelets. After training the wavelets, we freeze their weights and continue training the downstream task; this can be seen as two-stage training.
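
As a sketch of how such conditions might enter a loss (the exact terms and their weighting below are my own illustrative assumptions, not a prescription from any particular paper), each property becomes a differentiable penalty on the filter taps:

```python
import numpy as np

def wavelet_penalty(g):
    """Soft constraints for a learnable high-pass filter g (a sketch).

    In training, this value would be added to the task loss;
    an exactly admissible filter drives every term to zero.
    """
    zero_dc = np.sum(g) ** 2                    # must kill constant signals
    unit_energy = (np.sum(g ** 2) - 1.0) ** 2   # energy normalization
    # orthogonality to its own even (circular) shifts, needed for an
    # orthogonal filter bank with perfect reconstruction
    ortho = sum(np.dot(g, np.roll(g, 2 * k)) ** 2
                for k in range(1, len(g) // 2))
    return zero_dc + unit_energy + ortho

haar_g = np.array([1.0, -1.0]) / np.sqrt(2)
print(wavelet_penalty(haar_g))                    # ~0: Haar is admissible
print(wavelet_penalty(np.array([1.0, 0.5])) > 0)  # True: not admissible
```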

Figure 9: Learnable Wavelet Example

It’s worth mentioning that wavelet training can be simplified to training only the high-pass filter; the low-pass filter can then be derived using the Quadrature Mirror Filter (QMF) relation. This is not a sophisticated algorithm: it mirrors the frequency response of the high-pass filter about the half-band frequency to obtain the low-pass filter, so that the two responses cover the spectrum with as little overlap as possible:
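
The mirror relation can be checked numerically. Below is a sketch using the hand-computed db2 filter pair (the rule is symmetric, so it derives the high-pass from the low-pass or vice versa): in the time domain, mirroring the frequency response amounts to reversing the taps and alternating their signs.

```python
import numpy as np

s3 = np.sqrt(3.0)
# db2 orthonormal low-pass filter, written out by hand
h = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2))

# quadrature-mirror partner: reverse the taps and alternate their signs;
# in frequency this mirrors the response about the half-band frequency
g = ((-1) ** np.arange(len(h))) * h[::-1]

print(np.isclose(np.sum(h), np.sqrt(2)))  # True: low-pass passes DC
print(np.isclose(np.sum(g), 0.0))         # True: high-pass kills DC
print(np.isclose(h @ g, 0.0))             # True: the pair is orthogonal
print(np.isclose(np.sum(g**2), 1.0))      # True: energy is preserved
```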

Figure 10: Quadrature Mirror Filter

As shown in Figure 10, the frequency responses of the db8 wavelet (invented by mathematician Ingrid Daubechies) barely overlap (the solid green and yellow lines), while the blue and red learnable wavelets have poorer frequency responses (larger overlap regions). However, whether feature extraction performs well ultimately depends on the performance of the downstream task.

Program Implementation

In Python, PyWavelets (https://pywavelets.readthedocs.io/en/latest/install.html) provides a complete wavelet transform library. If instead you’re using learnable wavelets, you can compute the transform as follows:

  1. Based on the signal length N, generate the analysis matrix (N×N) you need (a matrix containing the wavelet filter bank evaluated at different positions).
  2. Multiply the original signal (1×N) by this analysis matrix.
  3. Downsample the resulting (1×N) vector and repeat until you reach the desired number of levels.
  4. For the inverse wavelet transform, multiply the obtained coefficients by the transpose of the analysis matrix (the inverse of a matrix built from an orthogonal basis is exactly its own transpose, which is one of the benefits of choosing orthogonal bases), then upsample until the length matches the original data.
  5. Calculate the reconstruction error between the reconstructed signal and the original signal, then backpropagate.
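
The steps above can be sketched end to end with a one-level Haar analysis matrix (a minimal illustration with N = 8 for readability, not PyWavelets' internal layout): the rows instantiate the two filters at every even shift, the transpose really does act as the inverse, and the same detail coefficients fall out of a stride-2 correlation.

```python
import numpy as np

N = 8
h = np.array([1.0, 1.0]) / np.sqrt(2)   # Haar low-pass
g = np.array([1.0, -1.0]) / np.sqrt(2)  # Haar high-pass

# analysis matrix: each row is one filter placed at one even shift
A = np.zeros((N, N))
for k in range(N // 2):
    A[k, 2 * k:2 * k + 2] = h            # approximation rows
    A[N // 2 + k, 2 * k:2 * k + 2] = g   # detail rows

x = np.array([4.0, 2.0, 7.0, 7.0, 1.0, 5.0, 3.0, 3.0])
coeffs = A @ x

# orthonormal rows: the inverse transform is just the transpose
print(np.allclose(A @ A.T, np.eye(N)))   # True
print(np.allclose(A.T @ coeffs, x))      # True: perfect reconstruction

# the detail rows are equivalent to a stride-2 correlation ("CNN view")
conv_details = np.array([g @ x[i:i + 2] for i in range(0, N, 2)])
print(np.allclose(conv_details, coeffs[N // 2:]))  # True
```

For a learnable version, h and g would be trainable parameters and the reconstruction error between A.T @ coeffs and x would be backpropagated, as in step 5.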

Below, let’s see what an analysis matrix of length N=32 looks like:

Figure 11: Analysis Matrix Example

We can see that the wavelet transform is essentially a CNN kernel translating along the signal: the analysis matrix instantiates the wavelet at each position, and the matrix multiplication can equally be implemented as a strided convolution. Below, let’s look at the reconstruction results using the coefficients from each layer of the wavelet transform:

Figure 12: Reconstruction Results

We can see that the wavelet decomposes the signal into one approximation and three details. The approximation looks almost identical to the original signal, except that it is smoother and missing the abrupt changes. If we upsample this low-frequency approximation and sum it with the reconstructed high-frequency components, we recover the original signal (as shown in the last row of Figure 12). We can also see that the high-frequency components extracted by the wavelet reflect where events occur in the signal. With the Fourier transform alone we would know nothing about position, only which frequencies are present, as shown below:

Figure 13: Position Information from Wavelet

Bonus:

  1. The results of wavelet transform usually require some feature engineering to be effective.
  2. Learnable wavelets for feature extraction are not necessarily better than predefined wavelets.
  3. Wavelet transform is, to some extent, CNN + regularization.

Welcome to discuss wavelet transform and related theory with me!

Classic Literature on Signal Processing and Wavelet Analysis

Mallat, S. A Wavelet Tour of Signal Processing: The Sparse Way. Academic Press, 2008 (3rd Edition).

Vaidyanathan, P. P. Multirate Systems and Filter Banks. Prentice Hall, 1993.

Daubechies, I. Ten Lectures on Wavelets. Society for Industrial and Applied Mathematics (SIAM), 1992.