# Design of Polyphase Cascaded Integrator Comb Interpolator Filter for Ultrasonic Phased Array Time Delays

Wenming Tang, Guixiong Liu, and Ruobo Lin

*Abstract*: Time-delay accuracy plays a crucial role in the beamforming algorithms of ultrasonic phased arrays. This paper presents a novel design for a polyphase cascaded integrator comb (CIC) interpolator filter that achieves high time-delay accuracy in an ultrasonic phased array focusing-delay system; the beamforming principle and the implementation of the time-delay algorithm are studied in detail. Firstly, a novel CIC filter design based on polyphase interpolation is proposed. It improves on traditional designs, in which direct interpolation makes the data rate grow with the interpolation factor. Secondly, following the principle of interpolation and filtering, CIC polyphase interpolation formulas are derived by exhaustive enumeration; the formulas implement interpolation and polyphase decomposition simultaneously, with the data rate of each phase equal to the original rate. Finally, the feasibility of the algorithm is validated by simulation analysis and a field programmable gate array implementation. Experimental results show that a time-delay accuracy of 1 ns at a 100 MHz sampling frequency can be achieved using the improved 10-fold polyphase interpolation CIC filter, and that the maximum relative error of the time delay is only -0.08% for 5 MHz ultrasonic echo waves. The design is applicable to practical phased array instruments that require high delay accuracy.

*Index Terms*—Delay accuracy, cascaded integrator comb filter, polyphase interpolation, field programmable gate array.

## I. INTRODUCTION

Ultrasonic phased array technology is one of the important non-destructive testing techniques. Its main feature is the computer-controlled excitation (amplitude and time delay) of the individual elements of a multi-element probe. Exciting the piezocomposite elements generates a focused ultrasonic beam whose parameters, such as angle, focal distance, and focal spot size, can be modified through software. The focused beam can be swept at high speed and can detect misoriented cracks in specular mode. For these reasons, ultrasonic phased array technology has received widespread attention in the field of ultrasonic non-destructive testing.

To achieve a precise focusing time delay for ultrasonic phased array dynamic depth focusing [1], [2], many accurate time-delay methods have been proposed, such as hardware delay lines [3], [4], sampling delays [5], [6], application-specific integrated circuit (ASIC) delays [7], [8], and software algorithm delays [9]-[12]. However, among these methods, hardware delay lines, sampling delays, and ASIC-based delays often suffer from poor universality, high cost, and difficulty of controlled modification. The software time-delay method is a promising alternative because it uses flexible digital signal processing algorithms to realize an accurate time delay and therefore offers good versatility and portability. However, the software time-delay method depends strongly on the quality of the algorithm, so numerous attempts have been made to improve the performance of software methods, for example by increasing the time-delay accuracy and enabling real-time processing.

The field programmable gate array (FPGA) offers fast, parallel real-time processing. Because it is built mainly from basic logic gates such as AND and OR gates, it is well suited to implementing a CIC filter, which contains only shifters and adders. This paper presents a novel method for designing a polyphase interpolator that uses the CIC filter [13], [14] to control the time-delay accuracy of an ultrasonic phased array. The study gives a thorough analysis of the feedback structure of the CIC filter and a comprehensive explanation of the polyphase interpolation principle. The proposed algorithm, including both the interpolation and the polyphase decomposition, is implemented on an FPGA to achieve parallel real-time processing and high time-delay accuracy for ultrasonic echo signals. The implementation results show that the improved 10-fold polyphase interpolation CIC filter gives a time-delay accuracy of 1 ns at a 100 MHz sampling frequency, with a maximum relative time-delay error of only -0.08% for 5 MHz ultrasonic echo waves.

According to the structure of the CIC filter, CIC polyphase interpolation formulas are derived by exhaustive enumeration; they merge the interpolator with the integrator and realize signal upsampling. Polyphase interpolation and decomposition keep the data rate unchanged before and after interpolation while achieving high time-delay accuracy. The CIC filter therefore dramatically reduces the computing load of the FPGA: with *R*-fold interpolation CIC filtering and a clock frequency of $f_s$ Hz, it achieves a time-delay resolution of $1/(R \times f_s)$ s, where *R* and $f_s$ are the interpolation factor and the sampling frequency, respectively. The polyphase interpolation CIC filter uses only elementary operations, consumes few resources, has low latency and high flexibility, strongly suppresses the imaging spectra produced by interpolation, and is easy to implement on an FPGA. These characteristics make the CIC filter highly suited to real-time, high-precision time-delay algorithm implementation.

Manuscript received June 12, 2016; revised October 27, 2016.

The authors are with the School of Mechanical & Automotive Engineering, South China University of Technology, Guangzhou, China (e-mail: megxliu@scut.edu.cn, twm316@163.com).

## II. THEORY

## A. Design of a Novel Polyphase CIC Interpolation Filter

A CIC filter is a special class of finite impulse response (FIR) filter consisting mainly of comb and integrator sections. It can operate at high speed because its hardware implementation requires only adders, shifters, and registers (no multipliers). The CIC filter can also perform integer-factor interpolation and decimation filtering when it serves as a low-pass filter, which makes it suitable as an interpolation filter for highly accurate time-delay applications [15]. For an ultrasonic phased array system with a clock rate of $f_s$ Hz, *R*-fold interpolation requires a sampling rate of $R \times f_s$ Hz and yields a time-delay focusing accuracy of $1/(R \times f_s)$ s. Polyphase decomposition can be used to decompose the high-rate ($R \times f_s$ Hz) signal into *R* low-rate ($f_s$ Hz) signals that retain this delay precision. Moreover, the entire time-delay algorithm can be implemented very efficiently on an FPGA. A polyphase interpolation CIC filter can therefore provide an accurate time delay for an ultrasonic phased array beam, as shown in Fig. 1. The time-delay model consists of a coarse delay and a fine delay. The coarse delay, which has a time resolution of $1/f_s$, is easily realized in an FPGA with a shifter or by accessing the proper memory address, while the fine delay selects a phase of the polyphase interpolation CIC filter. This paper studies only the fine-delay model.
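The coarse/fine split described above can be illustrated with a short sketch. The function name `split_delay` and the choice to round the requested delay to the nearest fine step are assumptions made for illustration; the paper does not specify how a focal-law delay is quantized.

```python
def split_delay(delay_s, fs_hz, R):
    """Split a requested delay into a coarse part and a fine part.

    coarse: whole sample periods of 1/fs (shift register or memory address offset)
    phase : index of the polyphase output, each step worth 1/(R*fs)
    Rounding to the nearest fine step is an assumption of this sketch.
    """
    total_fine_steps = round(delay_s * R * fs_hz)
    coarse, phase = divmod(total_fine_steps, R)
    return coarse, phase


# Example: a 23 ns delay with fs = 100 MHz and R = 10 (1 ns fine step)
coarse, phase = split_delay(23e-9, 100e6, 10)
print(coarse, phase)  # 2 coarse samples (20 ns) plus phase 3 (3 ns)
```

With $f_s = 100$ MHz and $R = 10$, the coarse part advances in 10 ns steps and the phase index in 1 ns steps, which matches the resolution figures quoted in the text.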



Fig. 1. Diagram of the delay principle of an ultrasonic phased array beam based on the novel CIC filter.

Formula (1) is the mathematical expression of an *N*-order CIC filter, which is equivalent to a combination of comb and integrator sections [16]. Fig. 2 is a block diagram of a typical *R*-fold interpolation CIC filter comprising comb, interpolator, and integrator sections. By the principle of the CIC filter, the signal rate increases drastically after the interpolator, which means that the following stage (here, the integrator) must be capable of working at high signal rates. For instance, after 10-fold interpolation, a signal rate of $f_s = 100$ MHz increases to $10 \times f_s = 1$ GHz, and the direct interpolation method is therefore infeasible on an FPGA, which cannot process signals at a 1 GHz clock rate. We design a novel method that combines the interpolator with polyphase decomposition, enabling the entire system to keep operating at an $f_s = 100$ MHz clock rate.

$$H(z) = \left[\sum_{k=0}^{RM-1} z^{-k}\right]^N \quad \text{or} \quad H(z) = \frac{(1-z^{-RM})^N}{(1-z^{-1})^N} \tag{1}$$

where N and R are the order of the CIC filter and the interpolation factor, respectively, and M is the delay factor of the comb filtering portion, which is generally taken to be 1.
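The equivalence of the two forms in formula (1) can be checked numerically by treating each as a polynomial in $z^{-1}$. The sketch below is illustrative only; the helper names `cic_impulse_response` and `binomial_poly` are not from the paper.

```python
import numpy as np


def cic_impulse_response(R, N, M=1):
    """Coefficients of [sum_{k=0}^{RM-1} z^-k]^N: a length-RM boxcar convolved N times."""
    box = np.ones(R * M, dtype=np.int64)
    h = np.array([1], dtype=np.int64)
    for _ in range(N):
        h = np.convolve(h, box)
    return h


def binomial_poly(step, N):
    """Coefficients of (1 - z^-step)^N as a polynomial in z^-1."""
    base = np.zeros(step + 1, dtype=np.int64)
    base[0], base[-1] = 1, -1
    p = np.array([1], dtype=np.int64)
    for _ in range(N):
        p = np.convolve(p, base)
    return p


R, N, M = 10, 5, 1
h = cic_impulse_response(R, N, M)
# Multiplying the FIR form by (1 - z^-1)^N must give (1 - z^-RM)^N.
lhs = np.convolve(h, binomial_poly(1, N))
rhs = binomial_poly(R * M, N)
print(np.array_equal(lhs, rhs), len(h), int(h.sum()))
```

The sum of the impulse response equals $(RM)^N$, the DC gain of the unnormalized filter.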



Fig. 2. Block diagram of the typical R-fold interpolation CIC filter.

The interaction between the interpolator and the integrator is analyzed as follows. As shown in Fig. 2, for a given *R*, assume M = 1 and N = 5. Let the input digital sequence be $X_{in} = \dots, x(n), \dots, x(1), x(0)$, and let the corresponding output sequence of the comb section be $X'_{in} = \dots, d(n), \dots, d(1), d(0)$. After *R*-fold interpolation (insertion of R-1 zeros between adjacent points), the output sequence is $X''_{in} = \dots, 0, d(n), \dots, 0, d(1), \dots, 0, d(0)$. The corresponding register values are denoted $z_0, z_1, \dots, z_N$.

Fig. 3 shows the flow of the above digital sequence through the *R*-fold interpolator and integrator of the CIC filter. The input sequence of register $z_0$ is the *R*-fold interpolated sequence $X''_{in} = \dots, 0, d(n), \dots, 0, d(1), \dots, 0, d(0)$. The two clock frequencies before and after interpolation, shown on the left of Fig. 3, are $f_s$ Hz and $R \times f_s$ Hz, respectively. Owing to the structure of the integrator sections of the CIC filter (each register is equivalent to an accumulator), the value of each register can be obtained by induction, as listed in Fig. 3. At the clock frequency of $R \times f_s$ Hz, the value $(X + Y)$ of register $z_{x+1}$ at time $t_i$ equals the sum of the value $(Y)$ of register $z_{x+1}$ at time $t_{i-1}$ and the value $(X)$ of register $z_x$ at time $t_{i-1}$, as marked by the red rectangle in the figure. On this principle, a signal with a high sampling rate can be decomposed into multi-channel signals with a low sampling rate by combining polyphase interpolation with decomposition, as shown in Fig. 4.



Fig. 3. Signal flow chart of the R-fold interpolation CIC integrator.



Fig. 4. Signal flow diagram of the proposed polyphase interpolation structure.

According to the above results, the *R*-fold interpolation CIC filter can be regarded as an accumulator-model circuit consisting of many adders, as shown in Fig. 5. The input data $X_{in}$ enter the *N*-stage comb filter, which runs at the sampling rate of $f_s$ Hz, and produce a new series of outputs $X'_{in}$; these are fed into the CIC polyphase interpolation structure to realize *R*-fold upsampling. The output data stream, with a sampling rate of $R \times f_s$ Hz, is decomposed into *R* phases (Y(0), Y(1), ..., Y(*R*-1)), each a low-rate data stream at the original sampling rate of $f_s$ Hz. As a consequence, the time delay between neighboring signals among Y(0), Y(1), ..., Y(*R*-1) is $1/(R \times f_s)$ s.
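For reference, the direct (non-polyphase) structure of Fig. 2 can be modeled in software, where the $R \times f_s$ clock is only a loop counter. The sketch below is a behavioral model under stated assumptions: registers start at zero, M = 1, and every integrator register is updated simultaneously from the previous tick, following the register rule described for Fig. 3. The function names are illustrative and not from the paper.

```python
def comb_section(x, n_stages):
    """Apply n_stages first-order combs (differential delay M = 1) at the low rate."""
    for _ in range(n_stages):
        prev, out = 0, []
        for v in x:
            out.append(v - prev)
            prev = v
        x = out
    return x


def cic_interpolate_direct(x, R, N):
    """Direct-form reference: comb -> zero insertion by R -> N registered integrators."""
    z = [0] * (N + 1)            # z[0] is the input register, z[N] the output register
    y = []
    for d in comb_section(list(x), N):
        z[0] = d                 # one comb sample enters, followed by R-1 zeros
        for _ in range(R):       # R ticks of the R*fs clock per input sample
            y.append(z[N])
            # z_k(t_i) = z_k(t_{i-1}) + z_{k-1}(t_{i-1}); the input register then holds 0
            z = [0] + [z[k] + z[k - 1] for k in range(1, N + 1)]
    return y


def split_phases(y, R):
    """Decompose the high-rate stream into R low-rate phases Y(0)..Y(R-1)."""
    return [y[k::R] for k in range(R)]


# Impulse response for R = 3, N = 2: the boxcar-squared shape 1, 2, 3, 2, 1
y = cic_interpolate_direct([1, 0, 0, 0], R=3, N=2)
print(y)
print(split_phases(y, 3))
```

In this model the impulse response is the *N*-fold boxcar of formula (1) delayed by *N* high-rate ticks, one tick per registered integrator. The polyphase structure of Fig. 5 aims to produce the same phase streams without ever running a loop at the high rate.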



Fig. 5. Signal flow diagram of the proposed polyphase interpolation structure.

## B. Algorithm of the Polyphase CIC Interpolation Filter

In Fig. 5, let $z_0, z_1, \dots, z_i, \dots, z_N$ be the register values at time $t_m$, and let Y(0), Y(1), ..., Y(i), ..., Y(R-1) be the corresponding outputs after the CIC polyphase decomposition. Let p(i,j) denote the value at node (i,j). It is a first-order (linear) polynomial in the variables $z_0, z_1, \dots, z_N$, that is, $p(i,j) = A_0z_0 + A_1z_1 + A_2z_2 + \cdots + A_Nz_N$ $(i = 0, 1, \dots, N;\ j = 0, 1, \dots, R)$, where $A_0, \dots, A_N$ are coefficients that are solved for below. From Fig. 3 and Fig. 5, the value p(i,j) of each node can be calculated by the recursive relationships (2) and (3):

At time  $t_m$ ,

$$p(i, j) = p(i-1, j-1) + p(i, j-1), \quad i \ge j \ge 1 \tag{2}$$

At time  $t_{m+1}$ ,

$$z_i = p(i, R), i \ge 1 \tag{3}$$
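Recursions (2) and (3) can be unrolled mechanically to obtain the coefficients $A_0, \dots, A_N$ of every node. The sketch below makes two assumptions that the worked examples later in this section appear to require: the recursion is applied at every node with i ≥ 1 and j ≥ 1, and p(0, j) = 0 for j ≥ 1 because the input register holds an inserted zero on those ticks. The function name `node_coefficients` is illustrative.

```python
def node_coefficients(N, R):
    """Return p[i][j] as a coefficient list [A_0, ..., A_N] over (z_0, ..., z_N)."""
    zero = [0] * (N + 1)
    p = [[None] * (R + 1) for _ in range(N + 1)]
    for i in range(N + 1):
        p[i][0] = [1 if k == i else 0 for k in range(N + 1)]   # p(i, 0) = z_i
    for j in range(1, R + 1):
        p[0][j] = zero                                         # assumed: inserted zeros
        for i in range(1, N + 1):
            # recursion (2): p(i, j) = p(i-1, j-1) + p(i, j-1)
            p[i][j] = [a + b for a, b in zip(p[i - 1][j - 1], p[i][j - 1])]
    return p


p = node_coefficients(N=5, R=10)
print(p[5][9])    # coefficients of Y(9) = p(5, 9)
print(p[5][10])   # coefficients of the next z_5 = p(5, 10), recursion (3)
```

Row `p[N][y]` gives the weights of phase output Y(y), and row `p[i][R]` gives the weights used to update register $z_i$ for the next $f_s$ cycle.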

Ignoring the intermediate values p(i-1,j-1) and p(i,j-1), the value p(i,j) at time $t_m$ can be expressed directly in terms of the register values $p(0,0) = z_0$, $p(1,0) = z_1, \dots, p(i,0) = z_i, \dots, p(N,0) = z_N$, so a point-to-point signal course map can be established, as shown in Fig. 6. Let the upper diagonal corner of the light-blue parallelogram be (i,0) and the lower diagonal corner be (x,y); formula (4) then follows.



Fig. 6. Point-to-point signal course map.

$$\left.\begin{aligned} H &= x - i, & H &\ge 0 \\ D &= y - H = y - x + i, & D &\ge 0 \\ & & i &\ge 0 \end{aligned}\right\} \Rightarrow \begin{cases} x \ge i \\ i \ge x - y \\ i \ge 0 \end{cases} \Rightarrow x \ge i \ge \max(0, x - y) \tag{4}$$

Taking the node (i,0) at any given time as the starting point, the value p(x,y) of node (x,y) can be calculated according to the principle of the shortest path between two nodes [17]. The coefficients $A_i$ in $p(x,y) = A_0z_0 + A_1z_1 + \cdots + A_iz_i + \cdots + A_Nz_N$ equal the number of shortest signal paths from (i,0) to (x,y). Counting these paths in Fig. 6 by combinatorial arguments gives formula (5).

$$A_{i} = \begin{cases} C_{D+H}^{H} = C_{y}^{x-i}, & (i \neq 0) \\ C_{D+H-1}^{H-1} = C_{y-1}^{x-1}, & (i = 0) \end{cases} \tag{5}$$

Since  $p(i,0) = z_i$ , the value of the register  $z_i$  arriving at coordinate (x,y) through (i,0) is  $p(x,y)_i = A_i z_i$ . Let

$$\sigma = \begin{cases} 0, & x > y \\ 1, & x \le y \end{cases} \tag{6}$$

and then the value for the node (x,y) can be obtained as

$$\begin{aligned} p(x, y) &= \sum_{i=\max(0, x-y)}^{x} p(x, y)_{i} = \sigma \cdot A_{0} z_{0} + \sum_{i=\max(1, x-y)}^{x} A_{i} z_{i} \\ &= \sigma \cdot C_{y-1}^{x-1} z_{0} + \sum_{i=\max(1, x-y)}^{x} C_{y}^{x-i} z_{i} \end{aligned} \tag{7}$$

Let k = max(1, x-y), and the formula (7) can be rewritten as the following formula (8):

$$p(x, y) = \sigma \cdot C_{y-1}^{x-1} z_0 + C_y^{x-k} z_k + C_y^{x-k-1} z_{k+1} + C_y^{x-k-2} z_{k+2} + \dots + C_y^0 z_x \tag{8}$$
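As a quick check of formula (8), consider the node (x, y) = (2, 3). The expansion below reads $C_n^k$ as the binomial coefficient "n choose k" and assumes, in line with the zero insertion shown in Fig. 3, that p(0, j) = 0 for j ≥ 1 and that recursion (2) is applied at every node with i ≥ 1 and j ≥ 1. Here $\sigma = 1$ because $x \le y$, and $k = \max(1, -1) = 1$.

$$\begin{aligned}
\text{Formula (8):}\quad p(2,3) &= C_2^1 z_0 + C_3^1 z_1 + C_3^0 z_2 = 2z_0 + 3z_1 + z_2 \\
\text{Recursion (2):}\quad p(1,1) &= p(0,0) + p(1,0) = z_0 + z_1 \\
p(1,2) &= p(0,1) + p(1,1) = z_0 + z_1 \\
p(2,1) &= p(1,0) + p(2,0) = z_1 + z_2 \\
p(2,2) &= p(1,1) + p(2,1) = z_0 + 2z_1 + z_2 \\
p(2,3) &= p(1,2) + p(2,2) = 2z_0 + 3z_1 + z_2
\end{aligned}$$

Both routes give the same polynomial, which supports the path-counting interpretation of the coefficients under these assumptions.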

When x = N, the final polyphase decomposition output is given by formula (9):

$$Y(y) = p(N, y) = \sigma \cdot C_{y-1}^{N-1} z_0 + \sum_{i=\max(1,N-y)}^{N} C_y^{N-i} z_i, \quad (y = 0, 1, 2, \dots, R-1) \tag{9}$$

When y = R, the value of register $z_x$ in the next clock cycle is given by formula (10):

$$z_{x} = p(x,R) = \sigma \cdot C_{R-1}^{x-1} z_{0} + \sum_{i=\max(1,x-R)}^{x} C_{R}^{x-i} z_{i}, \quad (x = 1, 2, 3, \dots, N) \tag{10}$$
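Formulae (9) and (10) translate directly into a per-cycle update that runs entirely at the $f_s$ rate. The following is a minimal behavioral sketch, not the authors' FPGA implementation: it assumes zero initial registers, reads $C_n^k$ as the binomial coefficient, and uses the illustrative names `coeff`, `polyphase_step`, and `cic_interpolate_polyphase`.

```python
from math import comb


def coeff(x, y, i):
    """Coefficient of z_i in p(x, y) for x >= 1, following formulae (5) to (7)."""
    if i == 0:
        # sigma = 1 only when x <= y; otherwise z_0 cannot reach node (x, y)
        return comb(y - 1, x - 1) if 1 <= x <= y else 0
    # math.comb returns 0 when x - i > y, which covers i < x - y
    return comb(y, x - i) if i <= x else 0


def polyphase_step(z, R):
    """One f_s cycle: R phase outputs (9) and the next integrator registers (10).

    z[0] must hold the current comb-section sample. The returned state has a
    placeholder 0 in position 0; the caller loads the next comb sample there.
    """
    N = len(z) - 1
    outputs = [sum(coeff(N, y, i) * z[i] for i in range(N + 1)) for y in range(R)]
    z_next = [0] + [sum(coeff(x, R, i) * z[i] for i in range(x + 1))
                    for x in range(1, N + 1)]
    return outputs, z_next


def cic_interpolate_polyphase(comb_out, R, N):
    """Run the polyphase integrator over comb-section samples; returns R phase streams."""
    phases = [[] for _ in range(R)]
    z = [0] * (N + 1)
    for d in comb_out:
        z[0] = d                              # z_0 = X'_in for this cycle
        outputs, z = polyphase_step(z, R)
        for y, value in enumerate(outputs):
            phases[y].append(value)
    return phases


phases = cic_interpolate_polyphase([1, -2, 1, 0], R=3, N=2)
print(phases)
```

Interleaving the phase streams (Y(0), Y(1), ..., Y(R-1) for each input sample) reproduces the output of a zero-insertion interpolator followed by *N* registered integrators clocked at $R \times f_s$, which is the property the polyphase structure relies on.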

Therefore, formulae (9) and (10) constitute the mathematical description of the polyphase interpolation CIC filter algorithm, which is validated below by simulation and implementation. The structure of the polyphase interpolation CIC filter in Fig. 5 can be regarded as a convolution-model circuit, as shown in Fig. 7. Let $\vec{C} = (\sigma \cdot C_{y-1}^{N-1}, \cdots, C_{y}^{N-i}, \cdots, C_{y}^{0})$ with $i = \max(1, N-y)$, $\vec{Z} = (z_0, z_1, \cdots, z_N)$, and $\vec{Y} = (Y(0), Y(1), \cdots, Y(R-1))$; the outputs of the CIC polyphase interpolation can then be expressed as $\vec{Y} = \vec{Z} * \vec{C}$, where $*$ denotes convolution.



Fig. 7. The convolution-model structure of the CIC polyphase interpolation filter

For instance, setting R = 10, N = 5, and M = 1, according to formulae (9) and (10), the outputs of Y(0), Y(1), Y(2),  $\dots$ , Y(9) after the polyphase decomposition can be expressed as:

$$\begin{aligned}
Y(0) &= p(5,0) = C_0^0 z_5, \\
Y(1) &= p(5,1) = C_1^1 z_4 + C_1^0 z_5, \\
&\;\;\vdots \\
Y(9) &= p(5,9) = C_8^4 z_0 + C_9^4 z_1 + C_9^3 z_2 + C_9^2 z_3 + C_9^1 z_4 + C_9^0 z_5, \\
z_0 &= p(0,10) = X'_{in}, \\
z_1 &= p(1,10) = C_9^0 z_0 + C_{10}^0 z_1, \\
&\;\;\vdots \\
z_5 &= p(5,10) = C_9^4 z_0 + C_{10}^4 z_1 + C_{10}^3 z_2 + C_{10}^2 z_3 + C_{10}^1 z_4 + C_{10}^0 z_5.
\end{aligned}$$
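Evaluating the binomial coefficients, with $C_n^k$ read as "n choose k", makes the arithmetic of this example explicit. The arrow is a notational choice made here to mark that the register values on the right-hand side are those of the current cycle.

$$\begin{aligned}
Y(0) &= z_5, \\
Y(1) &= z_4 + z_5, \\
Y(9) &= 70 z_0 + 126 z_1 + 84 z_2 + 36 z_3 + 9 z_4 + z_5, \\
z_1 &\leftarrow z_0 + z_1, \\
z_5 &\leftarrow 126 z_0 + 210 z_1 + 120 z_2 + 45 z_3 + 10 z_4 + z_5.
\end{aligned}$$

All weights are small constant integers, so each product can in principle be formed from shifts and additions; whether a synthesis tool maps a given constant to logic or to a DSP block is a separate implementation question.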

## III. SIMULATION

## A. System Resource Usage

The performance of the designed CIC filter is next analyzed and tested to assess whether it can achieve a highly accurate time delay for an ultrasonic phased array. The bandwidth of an ultrasonic phased array echo signal generally ranges from 0.5 to 15 MHz. The sampling rate of the original signal is set to 100 MHz, so the sampling rate after *R*-fold interpolation is $R \times 100$ MHz. For N = 5, M = 1, and R = 2, 4, 8, 10, 16 in formula (1), the CIC magnitude response is shown in Fig. 8. At the first image frequency (100 MHz), the CIC response is attenuated to below -100 dB, which is sufficient to meet the performance requirements of the system.
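The magnitude response implied by formula (1) can be evaluated on the unit circle with a few lines of code. The sketch below normalizes the response to unity gain at DC, which is an assumption about how Fig. 8 is scaled, and the function name `cic_magnitude_db` is illustrative.

```python
import math


def cic_magnitude_db(f_hz, fs_out_hz, R, N, M=1):
    """|H(e^{jw})| of formula (1) in dB, normalized to 0 dB at DC (assumed scaling).

    fs_out_hz is the interpolated sampling rate R * fs.
    """
    w_half = math.pi * f_hz / fs_out_hz
    den = math.sin(w_half)
    if abs(den) < 1e-12:
        return 0.0                       # limit at DC (and at multiples of fs_out)
    ratio = abs(math.sin(R * M * w_half) / (R * M * den))
    if ratio == 0.0:
        return float("-inf")             # exact null of the comb
    return 20.0 * N * math.log10(ratio)


fs, R, N = 100e6, 10, 5
for f in (5e6, 15e6, 85e6, 95e6, 100e6):
    print(f"{f / 1e6:6.1f} MHz: {cic_magnitude_db(f, R * fs, R, N):9.1f} dB")
```

The response has nulls at multiples of $f_s$, so the images of a narrowband echo centered near these nulls are strongly suppressed, while components further from the null are attenuated less. The same function also shows the passband droop that motivates a compensation filter.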



Fig. 8. Frequency response of the proposed polyphase CIC interpolator filter.

Based on Quartus II 13.1 and Altera's Arria-II EP2AGX65DF29I5 FPGA at a clock frequency of 100 MHz, the CIC filter, used as an interpolating anti-imaging filter with $R = 2, 4, 8, 10, 16$, $f_s = 100$ MHz, and M = 1, gives time-delay accuracies of 5, 2.5, 1.25, 1, and 0.625 ns, respectively, for ultrasonic signals. Following formulae (9) and (10) and Fig. 7, we obtained the FPGA resource consumption of the polyphase interpolation CIC filter, as listed in Table I. Resource consumption is low, especially for DSP blocks. By comparison, a traditional FIR polyphase interpolation filter needs more than *N*·*R* stages (*L*) to reach the same performance and thus consumes approximately [*L*/2] DSP blocks, given the symmetric structure of the FIR coefficients.

TABLE I: THE RESOURCE CONSUMPTION OF THE CIC MULTIPHASE INTERPOLATION FILTER BASED ON THE FPGA

| Time delay $\Delta D_t$ (ns) | Interpolation factor *R* | Logic elements (total 50600) | Registers (total 50600) | DSP blocks (total 312) |
|------------------------------|--------------------------|------------------------------|-------------------------|------------------------|
| 5.00                         | 2                        | 190                          | 261                     | 0                      |
| 2.50                         | 4                        | 386                          | 297                     | 0                      |
| 1.25                         | 8                        | 1005                         | 354                     | 0                      |
| 1.00                         | 10                       | 1310                         | 390                     | 6                      |
| 0.625                        | 16                       | 2326                         | 446                     | 2                      |

## B. Error and Accuracy Analysis

The results for the polyphase decomposition CIC filter structure are analyzed as follows. The ultrasonic echo signal is modeled as a 5 MHz sine wave given by formula (11).

$$X_{in} = 2047 \cdot \sin\left(2\pi f_k \cdot \frac{n}{f_s}\right) \tag{11}$$

where $f_s$ is the 100 MHz sampling frequency (sampling period 10 ns), $f_k$ is the 5 MHz sine-wave frequency, and *n* is the sample index. The signal is decomposed into 10 phase waveforms using the 10-fold polyphase interpolation and decomposition CIC filter described above, as shown in the inset of Fig. 9. The detailed delay path is shown in Fig. 9.



Fig. 9. The 10 phase 1 ns delay accuracy signals of the 10-fold polyphase CIC interpolation algorithm. Inset shows the signals of the 10-fold polyphase CIC interpolation algorithm in the time range of 0-200 ns.

To further verify the relative time-delay error of the *R* phase signals with respect to the first phase, we apply a 4096-point FFT (about 205 complete cycles) to each phase and obtain its phase difference from the first phase. As listed in Table II, for R = 2, 4, 8, 10, 16, the time delays ($\Delta D_t$) between each phase and the first phase are converted from the phase differences ($\Delta Pha$) using $\Delta D_t = \Delta Pha/(2\pi f_k)$. The first channel serves as the reference, so all of its entries are zero. The larger *R* is, the larger the magnitude of $\delta_{max}$ (the maximum relative time error); for R = 10, $\delta_{max}$ is only -0.08%, which satisfies the system requirements.
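The phase-to-delay conversion can be sketched as follows. This is an illustration of the measurement principle rather than the authors' MATLAB script: the Hann window (used here because 4096 samples do not contain a whole number of 5 MHz cycles), the choice of the nearest FFT bin, and the function name `delay_from_fft_phase` are all assumptions.

```python
import numpy as np


def delay_from_fft_phase(ref, sig, f_k, fs):
    """Estimate how much sig lags ref (seconds) from the FFT phase at f_k."""
    n = len(ref)
    window = np.hanning(n)                     # assumed: reduces spectral leakage
    k = int(round(f_k * n / fs))               # FFT bin nearest to f_k
    x_ref = np.fft.rfft(ref * window)[k]
    x_sig = np.fft.rfft(sig * window)[k]
    delta_pha = np.angle(x_ref * np.conj(x_sig))   # phase lag of sig relative to ref
    return delta_pha / (2.0 * np.pi * f_k)         # delta D_t = delta Pha / (2 pi f_k)


fs, f_k = 100e6, 5e6
n = np.arange(4096)
ref = 2047 * np.sin(2 * np.pi * f_k * n / fs)              # formula (11)
sig = 2047 * np.sin(2 * np.pi * f_k * (n / fs - 3e-9))     # ideal 3 ns delayed copy
print(delay_from_fft_phase(ref, sig, f_k, fs) * 1e9, "ns")
```

Applied to the phase outputs of the filter instead of an ideal delayed copy, the same conversion yields measured delays of the kind reported in Table II.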

TABLE II: ERROR ANALYSES OBTAINED FROM MATLAB SIMULATION WITH DELAY PRECISION BASED ON THE CIC POLYPHASE INTERPOLATION

Entries are measured value/theoretical value of $\Delta D_t$ in ns; × marks channels that do not exist for the given *R*.

| Channel        | 5 ns (*R* = 2) | 2.5 ns (*R* = 4) | 1.25 ns (*R* = 8) | 1 ns (*R* = 10) | 0.625 ns (*R* = 16) |
|----------------|----------------|------------------|-------------------|-----------------|---------------------|
| 0              | 0/0            | 0/0              | 0/0               | 0/0             | 0/0                 |
| 1              | 5.0002/5       | 2.5005/2.5       | 1.2500/1.25       | 1.0001/1        | 0.6250/0.625        |
| 2              | ×              | 5.0008/5         | 2.5000/2.5        | 2.0001/2        | 1.2500/1.25         |
| 3              | ×              | 7.5008/7.5       | 3.7500/3.75       | 3.0001/3        | 1.8751/1.875        |
| 4              | ×              | ×                | 5.0000/5          | 4.0001/4        | 2.5000/2.5          |
| 5              | ×              | ×                | 6.2499/6.25       | 5.0000/5        | 3.1250/3.125        |
| 6              | ×              | ×                | 7.5000/7.5        | 5.9999/6        | 3.7500/3.75         |
| 7              | ×              | ×                | 8.7495/8.75       | 6.9999/7        | 4.3745/4.375        |
| 8              | ×              | ×                | ×                 | 7.9995/8        | 4.9998/5            |
| 9              | ×              | ×                | ×                 | 8.9992/9        | 5.6247/5.625        |
| 10             | ×              | ×                | ×                 | ×               | 6.2496/6.25         |
| 11             | ×              | ×                | ×                 | ×               | 6.8744/6.875        |
| 12             | ×              | ×                | ×                 | ×               | 7.4993/7.5          |
| 13             | ×              | ×                | ×                 | ×               | 8.1241/8.125        |
| 14             | ×              | ×                | ×                 | ×               | 8.7490/8.75         |
| 15             | ×              | ×                | ×                 | ×               | 9.3733/9.375        |
| $\delta_{max}$ | 0.00%          | 0.03%            | -0.04%            | -0.08%          | -0.28%              |
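The paper does not state how $\delta_{max}$ is normalized. One reading that reproduces the R = 10 entry of Table II is to divide the largest deviation between measured and theoretical delay by the fine-delay step $1/(R \times f_s)$; the sketch below uses that assumed definition, and the function name is illustrative.

```python
def max_relative_error_percent(measured_ns, step_ns):
    """Largest-magnitude (measured - theoretical) deviation, relative to one delay step.

    measured_ns[k] is the measured delay of channel k + 1, whose theoretical
    delay is (k + 1) * step_ns. Normalizing by step_ns is an assumption.
    """
    deviations = [m - (k + 1) * step_ns for k, m in enumerate(measured_ns)]
    worst = max(deviations, key=abs)
    return 100.0 * worst / step_ns


# R = 10 column of Table II (channels 1 to 9), step of 1 ns
measured_r10 = [1.0001, 2.0001, 3.0001, 4.0001, 5.0000, 5.9999, 6.9999, 7.9995, 8.9992]
print(round(max_relative_error_percent(measured_r10, 1.0), 2))  # -0.08
```

Because the tabulated delays are rounded to four decimal places, values recomputed this way for other columns can differ from the listed $\delta_{max}$ in the last digit.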

The polyphase interpolation CIC filter algorithm achieves a time-delay accuracy of 1 ns at a sampling frequency of 100 MHz with a maximum relative error of only -0.08%, which suggests that it compares favorably with some previously reported algorithms [10], [12]. For example, in [10] the authors studied a beamforming algorithm for an ultrasound imaging system that used 4-fold interpolation and a 33-tap FIR filter at a sampling frequency of 40 MHz, giving a time-delay accuracy of 6.25 ns. In [12], an ASIC design implemented dynamic receive focusing with a minimum time-delay accuracy of 3.125 ns for a 40 MHz input data rate.

## C. Practical Application

This algorithm has been used in the PA2000 ultrasonic phased array instrument produced by Guangzhou Doppler Electronic Technologies Co., Ltd., which adopts Altera's Arria-II EP2AGX65DF29I5 FPGA as the digital signal processor of the ultrasonic phased array system. At a 100 MHz clock frequency, a 5 MHz probe is used to test the standard phased array test block. With 10-fold interpolation filtering decomposed into a 10-phase output, a time delay of 1 ns is obtained between two adjacent phases. The time-delay algorithms are realized on the FPGA. Fig. 10 shows the S-scan image of the holes ($\phi = 1$ mm) in the test block obtained with the focusing-delay technology of the CIC filter [18]. The scan image of the test holes is of high quality and high sensitivity, which demonstrates the practical applicability of the time-delay algorithm.



Fig. 10. S-scanning image of the holes ( $\phi = 1 \text{ mm}$ ) in the standard test block with the beamforming technology of 1 ns time-delay by the CIC filter.

## IV. CONCLUSION

We proposed a software time-delay algorithm based on the polyphase interpolation CIC filter. Because a signal sampled at $R \times f_s$ Hz is very difficult to process directly, the signal is decomposed into *R* phase signals at $f_s$ Hz to achieve a time-delay accuracy of $1/(R \times f_s)$ s. The algorithm is implemented on an FPGA and offers advantages in computational load, computation speed, resolution, and cost, which make it highly suited to real-time, high-precision time-delay implementation. The results are applicable to highly accurate time delays in multi-channel phased arrays. Owing to the passband droop of the CIC filter, a passband compensation filter can be added as an auxiliary stage after the CIC filter to achieve robust performance [19], [20]. For engineering applications, hardware processing capabilities and real-time computing power should also be considered, and hence further work is necessary.

## ACKNOWLEDGMENT

This work was financially supported by the National Key Foundation for Exploring Scientific Instrument (2013YQ230575) and Guangzhou Science and Technology Plan Project (201509010008).

#### REFERENCES

- [1] M. Almekkawy, J. Xu, and M. Chirala, "An optimized ultrasound digital beamformer with dynamic focusing implemented on FPGA," *Eng Med Biol Soc.*, pp. 3296-3299, 2014.
- [2] D. Lertsilp, S. Umchid, U. Techavipoo, and P. Thajchayapong, "Improvements in ultrasound elastography using dynamic focusing," pp. 225-228, 2012.
- [3] M. Mattarei, A. Canciamilla, S. Grillanda, and F. Morichetti, "Variable symbol-rate DPSK receiver based on silicon photonics coupled-resonator delay line," *J Lightwave Technol.*, vol. 32, no. 19, pp.3324-3330, October 2014.
- [4] Z. L. Yu, M. A. P. Pertijs, and G. C. M. Meijer, "A programmable analog delay line for micro-beamforming in a transesophageal ultrasound probe," in *Proc.* 10<sup>th</sup> *IEEE Inter. Confer Solid-St Integr Circ Technol.*, pp.299-301, 2010.
- [5] J. H. Kim, J.Y. Um, J. Y. Sim, and H. J. Park, "Time-interleaved sample clock generator for ultrasound beamformer application," in *Proc. Syst Chip Des Inter Confer.*, pp. 290-293, 2011.
- [6] B. G. Tomov and J. A. Jensen, "Compact FPGA-based beamformer using oversampled 1-bit A/D converters," *IEEE Trans. Ultrason Ferroelectr Freq Control.*, vol. 52, pp. 870-880, May 2005.
- [7] Y. Mo, T. Tanaka, S. Arita, A. Tsuchitani, K. Inoue, and Y. Suzuki, "Pipelined delay-sum architecture based on bucket-brigade devices for on-chip ultrasound beamforming," *IEEE J Solid-St Circ.*, vol. 38, no. 10, pp.1754-1757, October 2003.
- [8] J. R. Talman, S. L. Garverick, C. E. Morton, and G. Lockwood, "Unit-delay focusing architecture and integrated-circuit implementation for high-frequency ultrasound," *IEEE Trans. Ultrason Ferroelectr Freq Control.*, vol. 50, no. 11, pp. 1455-1463, November 2003.
- [9] A. Agarwal, Y. M. Yoo, F. K. Schneider, C. Gao, L. M. Koh, and Y. Kim, "New demodulation method for efficient phase-rotation-based beamforming," *IEEE Trans. Ultrason Ferroelectr Freq Control.*, vol. 54, no. 8, pp. 1656-1668, August 2007.

- [10] J. Kwon, J. H. Song, S. Bae, T. Song, and Y. Yoo, "An effective beamforming algorithm for a GPU-based ultrasound imaging," *IEEE Inter. Ultrasonics Symp Proc Syst.*, pp. 619-622, 2012.
- [11] J. Ma, K. Karadayi, M. Ali, and Y. Kim, "Ultrasound phase rotation beamforming on multi-core DSP," *Ultrasonics.*, vol. 54, pp. 99-105, 2014.
- [12] C. Dusa, S. Kalalii, P. Rajalakshmi et al., "Integrated 16-channel transmit and receive beamforming ASIC for ultrasound imaging," in *Proc. International Conference on VLSI Design*, 2015, pp. 215-220.
- [13] D. N. Milić and V. D. Pavlović, "A new class of low complexity low-pass multiplierless linear-phase special CIC FIR filters," *IEEE Signal Process Lett.*, vol. 21, no. 12, pp. 1511-1515, December 2014.
- [14] V. Sedinin, V. Mamychev, and A. Glukhov, "The design of high-frequency CIC-filters according to the 0.18 µm technology," in *Proc.* 15<sup>th</sup> *Inter Micro/nanotechnol. Electron Dev Confer.*, pp. 167-169, 2014.
- [15] T. B. Georgiev, N. S. Ivanov, and J. J. Arendt, "Scalable intersample interpolation architecture for high-channel-count beamformers," *IEEE Inter. Ultrasonics Symp.*, pp.381-384, 2012.
- [16] B. P. Stošić and V. D. Pavlović, "Design of selective CIC filter functions," *AEU-Int J Electron Commun.*, vol. 68, pp. 1231-1233, 2014.
- [17] G. Rétvári, J. J. Bíró, and T. Cinkler, "On shortest path representation," *IEEE-ACM Trans Netw.*, vol. 15, no. 6, pp. 1293-1306, December 2007.
- [18] R. B. Lin, G. X. Liu, and W. M. Tang, "FPGA implementation of ultrasonic S-scan coordinate conversion based on radix-4 CORDIC algorithm," *IACSIT International Journal of Engineering and Technology*, vol. 7, no. 3, pp. 249-253, June 2015.
- [19] F. J. Harris, "Reduce energy requirements by coupling a poly-phase pre-filter and CIC filter in high-performance sigma-delta A/D converters," *IEEE Inter. Sym Circ Syst.*, pp. 1600-1603, 2014.
- [20] G. J. Dolecek and A. F. Vazquez, "Novel droop-compensated comb decimation filter with improved alias rejections," *AEU-Int J Electron Commun.*, vol. 67, pp. 387-396, 2013.



**Guixiong Liu** was born in 1968. He is currently a full professor and a doctoral supervisor at South China University of Technology, Guangzhou, China. He received his doctoral degree from Chongqing University, Chongqing, China, in 1995. His research interests include modern detection technology and networked control, intelligent sensing theory and methods, and information system modeling theory and application.



**Wenming Tang** was born in 1983. He received his B.Sc. and M.Sc. degrees in 2005 and 2008, respectively, from Harbin University of Science and Technology, Harbin, China. He is currently a Ph.D. candidate supervised by Professor Guixiong Liu at the School of Mechanical and Automotive Engineering, South China University of Technology. His research interests include ultrasonic nondestructive testing and signal processing.