FPGA Implementation of Ultrasonic S-Scan Coordinate Conversion Based on Radix-4 CORDIC Algorithm

Ruobo Lin, Guixiong Liu, and Wenming Tang

Abstract—To address the drawbacks of the traditional function-call method for ultrasonic S-scan coordinate conversion, this paper presents an FPGA implementation of ultrasonic S-scan coordinate conversion based on the Radix-4 CORDIC algorithm. Based on an analysis of the Radix-4 rotation CORDIC algorithm, we adopt a 7-level pipeline structure on the FPGA and use a preprocessing technique to handle the non-constant scale factor. Simulation results show that the maximum error of the Radix-4 rotation CORDIC algorithm is $1.5 \times 10^{-3}$ and the computing latency is only 7T (seven clock periods) at a 16-bit data width on a 100 MHz FPGA. The scan conversion rate is at least 10 times faster, and the output image is essentially the same as that of the traditional method. The algorithm offers low latency, high speed and high precision, and is suitable for real-time signal processing applications.

Index Terms—Ultrasonic S-scan, coordinate conversion, Radix-4 CORDIC, FPGA, real-time signal processing.

I. INTRODUCTION

Ultrasonic detection technology was first used in military radar and medical B-mode ultrasound, and is now widely used in industrial phased-array inspection [1]-[3]. Digital scan conversion (DSC), which comprises coordinate transformation and linear interpolation, is one of the key technologies in the signal processing chain. The traditional method converts polar coordinates into Cartesian coordinates by calling library functions directly, which takes more time, requires more storage space, and produces a lot of redundant data. Many scholars have therefore studied algorithms for digital scan conversion, among which the CORDIC (Coordinate Rotation Digital Computer) algorithm is one of the most typical. For example, Sikdar (2001) used the traditional CORDIC algorithm to realize the coordinate transformation of B-mode ultrasound linear scan on the MAP-CA processor, with online programming, and solved the coordinate conversion effectively [1]. Tan (2003) realized the coordinate conversion of B-mode ultrasound linear scan based on the traditional CORDIC algorithm, solving the conversion from Cartesian to polar coordinates [4]. The above research focuses on B-mode ultrasound linear scan, whereas the sector scan (S-scan) is mainly used in industrial ultrasonic detection. Several scholars have studied the application of the CORDIC algorithm to S-scan. Zeng (2011) used the Radix-4 vectoring CORDIC algorithm to realize the coordinate transformation of B-mode ultrasound and pointed out the shortcomings of the Radix-2 algorithm and the advantages of the Radix-4 algorithm, but did not discuss the practical application in detail [5]. Liu (2014) studied the delay focusing and imaging technology of ultrasonic phased arrays, and put forward an FPGA implementation of the Radix-2 CORDIC algorithm using the complement method [6], [7].
At present, ultrasonic scan coordinate conversion on FPGA still relies on lookup tables or direct function calls, which suffer from long latency and low speed. This paper uses the Radix-4 rotation CORDIC algorithm to realize the coordinate conversion of ultrasonic S-scan on FPGA, which has important practical value for improving real-time signal processing.

The paper is arranged as follows: Section II introduces the principle of ultrasonic S-scan coordinate conversion, the mechanism of the Radix-4 CORDIC algorithm, and its realization on FPGA; Section III presents the simulation and analysis of the Radix-4 rotation CORDIC algorithm; Section IV draws the conclusion.

II. THEORY

A. Ultrasonic S-Scan Coordinate Conversion Principle

There are many ultrasound scan modes, such as A-scan, B-scan, C-scan and S-scan. In an ultrasonic S-scan, the transducer applies different phase delays to the transmit pulses of the different array elements, so that the probe sweeps the beam across a sector: each beam rotates about the Y axis by a given angle, and the echo of each beam is sampled along the radial direction. All echo data points are stored in polar form in the data memory. After interpolation and filtering, the polar-coordinate data are converted into the corresponding Cartesian coordinate system and output as a two-dimensional image on the screen. This digital scan conversion thus corresponds to a conversion from polar to Cartesian coordinates.

The two coordinate conversion approaches are compared in Fig. 1: the first uses the traditional function-call method and the second uses the Radix-4 CORDIC algorithm module. The traditional method directly calls DSP system functions that use many multipliers and dividers, which leads to high resource overhead, long delay, and data congestion. For high precision, a lookup table demands a large amount of memory, which makes real-time processing difficult. This paper uses a Radix-4 CORDIC algorithm module to realize the coordinate conversion; it converts the many trigonometric function operations into add and shift operations, which suits FPGA implementation and real-time processing.


The relationship between Cartesian and polar coordinates is shown in Fig. 2. We denote a point in Cartesian coordinates as \( P(x_i, y_i) \), where \( x_i \) and \( y_i \) are the horizontal and vertical coordinates, and the same point in polar coordinates as \( P(r_i, \phi_i) \), where \( r_i \) and \( \phi_i \) are the polar radius and polar angle. The conversion from polar to Cartesian coordinates is then given by formula (1).

\[
\begin{aligned}
    x_i &= r_i \cos \phi_i \\
    y_i &= r_i \sin \phi_i
\end{aligned}
\tag{1}
\]
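As a point of reference, formula (1) evaluated directly with library trigonometric calls (the function-call approach that the CORDIC module is meant to replace) can be sketched as follows. The function name `polar_to_cartesian` and the sample values are illustrative assumptions, not data from the instrument.

```python
import math

def polar_to_cartesian(r, phi_deg):
    """Apply formula (1): x = r*cos(phi), y = r*sin(phi), with phi given in degrees."""
    phi = math.radians(phi_deg)
    return r * math.cos(phi), r * math.sin(phi)

# One hypothetical sample on a beam steered to 30 degrees at radius 100 (arbitrary units).
x, y = polar_to_cartesian(100.0, 30.0)
print(f"x = {x:.3f}, y = {y:.3f}")
```

Every sample of every beam needs one cosine and one sine evaluation, which is the cost the shift-and-add iteration is intended to reduce.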

Because the image data volume is large, the conversion requires a great deal of computation. A CORDIC algorithm module operating in real time on FPGA is therefore important for speeding up the coordinate conversion of ultrasound scans.


B. Radix-4 CORDIC Algorithm Mechanism

The Radix-4 CORDIC algorithm has two operation modes: rotation mode and vectoring mode [8], [9]. In rotation mode, the sine and cosine of any angle can be calculated, which realizes the conversion from polar to Cartesian coordinates. In vectoring mode, the square root and arctangent functions can be calculated, which realizes the conversion from Cartesian to polar coordinates. In this paper, we focus on the rotation mode, which is discussed as follows.

The basic iterative equations of Radix-4 Rotation CORDIC algorithm [8] are:

\[
\begin{aligned}
    x_{i+1} &= x_i - \sigma_i 4^{-i} y_i \\
    y_{i+1} &= y_i + \sigma_i 4^{-i} x_i \\
    z_{i+1} &= z_i - \arctan(\sigma_i 4^{-i})
\end{aligned}
\tag{2}
\]

where \( \sigma_i \in \{-2, -1, 0, +1, +2\} \). After \( n \) iterations, \( x \) and \( y \) are scaled by a gain \( K \), the scale factor, which is given by formula (3).

\[
K = \prod_{i=0}^{n-1} (1 + \sigma_i^2 4^{-2i})^{1/2}
\tag{3}
\]
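To see why the scale factor is not a constant, formula (3) can be evaluated for different digit sequences. The sketch below is a plain floating-point model; the function name `scale_factor` and the two digit sequences are hypothetical choices for illustration.

```python
import math

def scale_factor(sigmas):
    """Formula (3): K = prod over i of sqrt(1 + sigma_i^2 * 4^(-2i)), i = 0..n-1."""
    k = 1.0
    for i, sigma in enumerate(sigmas):
        k *= math.sqrt(1.0 + sigma * sigma * 4.0 ** (-2 * i))
    return k

# Two hypothetical 7-step digit sequences produce different gains.
print(scale_factor([0, 0, 0, 0, 0, 0, 0]))    # all digits zero: no rotation, no gain
print(scale_factor([2, 1, -1, 0, 2, -2, 1]))  # mixed digits: gain greater than 1
```

Because each digit depends on the input angle, so does the gain, which is why a fixed compensation constant (as in Radix-2 CORDIC) is not sufficient here.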

We define a new variable:

\[ w_i = 4^i z_i \]

The value of \( \sigma_i \) is determined by the interval in which \( w_i \) falls.
For $i=0$

$$\sigma_0 = \begin{cases} +2 & 5/8 \leq w_0 \\ +1 & 3/8 \leq w_0 < 5/8 \\ 0 & -1/2 \leq w_0 < 3/8 \\ -1 & -7/8 \leq w_0 < -1/2 \\ -2 & w_0 < -7/8 \end{cases} \tag{4}$$

For $i>0$

$$\sigma_i = \begin{cases} +2 & 3/2 \leq w_i \\ +1 & 1/2 \leq w_i < 3/2 \\ 0 & -1/2 \leq w_i < 1/2 \\ -1 & -3/2 \leq w_i < -1/2 \\ -2 & w_i < -3/2 \end{cases} \tag{5}$$
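The digit selection of formulas (4) and (5) can be sketched as a small function. The name `select_sigma` is hypothetical. Two assumptions are made: each interval is closed at its lower bound, and for $i>0$ the interval for $\sigma_i=-1$ starts at $-3/2$, by symmetry with the positive side.

```python
def select_sigma(w, i):
    """Return the digit sigma_i for w_i = 4^i * z_i, following formulas (4) and (5)."""
    if i == 0:
        lower_bounds = (5 / 8, 3 / 8, -1 / 2, -7 / 8)   # first iteration
    else:
        lower_bounds = (3 / 2, 1 / 2, -1 / 2, -3 / 2)   # all later iterations
    for sigma, lower in zip((2, 1, 0, -1), lower_bounds):
        if w >= lower:
            return sigma
    return -2

print(select_sigma(0.7, 0), select_sigma(0.4, 0), select_sigma(-1.0, 2))
```

In hardware the same decision reduces to a handful of comparisons against precomputed constants per pipeline level.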

With the initial values $x_0=1$, $y_0=0$, $z_0 = \varphi$, after $n$ iterations $z_n \rightarrow 0$ and the iteration yields formula (6).

$$\begin{cases} x_n = K \cos \varphi \\ y_n = K \sin \varphi \\ z_n = 0 \end{cases} \tag{6}$$

From the iteration results $x_n$ and $y_n$, the cosine and sine of the angle are obtained by removing the gain $K$:

$$\cos \varphi = K^{-1} x_n$$
$$\sin \varphi = K^{-1} y_n.$$
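The rotation-mode iteration can be checked with a floating-point software model. This is a minimal sketch, not the FPGA design: it uses library `atan` and `sqrt` calls where the hardware would use stored constants, it treats $K$ as the accumulated gain of formula (3) and therefore divides the final $x$ and $y$ by it, and the names `select_sigma` and `cordic_radix4_rotate` are hypothetical.

```python
import math

def select_sigma(w, i):
    """Digit selection on w_i = 4^i * z_i (lower bounds taken as inclusive)."""
    bounds = (5 / 8, 3 / 8, -1 / 2, -7 / 8) if i == 0 else (3 / 2, 1 / 2, -1 / 2, -3 / 2)
    for sigma, lower in zip((2, 1, 0, -1), bounds):
        if w >= lower:
            return sigma
    return -2

def cordic_radix4_rotate(phi, n=7):
    """Approximate (cos(phi), sin(phi)) for |phi| <= pi/2 with n radix-4 iterations."""
    x, y, z, k = 1.0, 0.0, phi, 1.0
    for i in range(n):
        t = 4.0 ** (-i)
        sigma = select_sigma(z / t, i)                 # w_i = 4^i * z_i
        x, y = x - sigma * t * y, y + sigma * t * x    # shift-and-add rotation step
        z -= math.atan(sigma * t)                      # residual angle
        k *= math.sqrt(1.0 + (sigma * t) ** 2)         # accumulate the gain K
    return x / k, y / k                                # remove the accumulated gain

c, s = cordic_radix4_rotate(math.radians(30.0))
print(f"cos = {c:.6f}, sin = {s:.6f}")
```

In this model the only error source is the residual angle left after the last iteration, so it gives a lower bound on what a fixed-point implementation can achieve.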

C. Radix-4 CORDIC Algorithm Realization on FPGA

We use a pipelined architecture for the FPGA implementation. Because the scale factor $K$ is not constant, a preprocessing technique is used: for each possible value of $\sigma_i$, the per-stage scale factor $k_i$ and the angle $\arctan(\sigma_i \cdot 4^{-i})$ are precomputed and stored as constants in a lookup table. The iterative process thus reduces to add and shift operations, which accelerates the computation. If the angle is represented by a 16-bit binary number, the angle resolution is $360^\circ/(65536-1) \approx 0.0055^\circ$. Since iterations beyond the seventh can no longer be resolved at this angular resolution, we use a 7-level pipeline architecture to realize the algorithm on FPGA. The structure of the $i$-th pipeline level is shown in Fig. 3.

![Fig. 3. The i-level pipeline structure diagram on FPGA.](image)
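One way to read the choice of seven levels is to compare the elementary rotation angle $\arctan(4^{-i})$ of each stage with the 16-bit angle resolution. The short calculation below is an interpretation offered as a sketch, not a derivation given in the source; `ANGLE_BITS` and the loop range are illustrative.

```python
import math

ANGLE_BITS = 16
resolution_deg = 360.0 / (2 ** ANGLE_BITS - 1)   # about 0.0055 degrees, as in the text

for i in range(9):
    step_deg = math.degrees(math.atan(4.0 ** (-i)))
    status = "resolvable" if step_deg >= resolution_deg else "below resolution"
    print(f"i={i}: arctan(4^-{i}) = {step_deg:.5f} deg ({status})")
```

Under this reading, stages $i=0$ to $i=6$ (seven levels) use angles that the 16-bit representation can still distinguish, while an eighth stage would not.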

The following excerpt shows the comparison and iteration logic of the second pipeline level.

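```verilog
begin
  // Level 2 of the pipeline (i = 1): compare w[1] with the scaled thresholds.
  if (w[1] > 352025) begin             // w[1] > 3/2, so sigma = +2 and sigma*4^-1 = 1/2
    x[3]  <= x[2] - {{1{y[2][15]}}, y[2][15:1]};  // x - (y >>> 1), sign-extended
    y[3]  <= y[2] + {{1{x[2][15]}}, x[2][15:1]};  // y + (x >>> 1), sign-extended
    z[3]  <= z[2] - 108810;            // scaled arctan(1/2)
    Kx[2] <= 58617;                    // 65536 / sqrt(1 + 4/16)
  end
  else if (w[1] > 117342) begin        // w[1] > 1/2, so sigma = +1 and sigma*4^-1 = 1/4
    x[3]  <= x[2] - {{2{y[2][15]}}, y[2][15:2]};  // x - (y >>> 2), sign-extended
    y[3]  <= y[2] + {{2{x[2][15]}}, x[2][15:2]};  // y + (x >>> 2), sign-extended
    z[3]  <= z[2] - 57492;             // scaled arctan(1/4)
    Kx[2] <= 63579;                    // 65536 / sqrt(1 + 1/16)
  end
  // ....... (remaining cases omitted in the original excerpt)
end
```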
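The numeric constants in the excerpt can be reproduced from the algorithm's formulas. The gain scale of 65536 comes from the comment in the excerpt; the angle scale of 4096 counts per degree is an assumption inferred from the constants themselves (it reproduces all four angle values) and is not stated in the source. The helper names are hypothetical.

```python
import math

COUNTS_PER_DEGREE = 4096   # assumption: inferred from the constants, not stated in the text
GAIN_SCALE = 65536         # from the excerpt's comment "65536*1/sqrt[1 + 4/16]"

def angle_to_counts(rad):
    """Convert an angle in radians to integer counts at 4096 counts per degree."""
    return round(math.degrees(rad) * COUNTS_PER_DEGREE)

def stage_constants(sigma, i):
    """Return (scaled arctan(sigma*4^-i), scaled 1/sqrt(1 + sigma^2 * 4^-2i))."""
    t = sigma * 4.0 ** (-i)
    return angle_to_counts(math.atan(t)), round(GAIN_SCALE / math.sqrt(1.0 + t * t))

print(angle_to_counts(1.5), angle_to_counts(0.5))  # thresholds for w = 3/2 and w = 1/2
print(stage_constants(2, 1))                       # sigma = +2 at stage i = 1
print(stage_constants(1, 1))                       # sigma = +1 at stage i = 1
```

Tabulating `stage_constants` for every digit and stage gives the kind of lookup table that the preprocessing step relies on.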

III. SIMULATION

A. System Resource Usage

Based on Quartus II 9.0, we simulated the design on the Cyclone II EP2S15F484C6 FPGA to observe the system resource usage. The resource usage of the pipelined algorithm is listed in Table I. The usage is low: only 1781 logic elements, 282 logic registers, 16 multipliers and 0 PLLs.

TABLE I: THE SYSTEM RESOURCE USAGE OF ALGORITHM USING PIPELINING ARCHITECTURE

| | Logic elements | Logic registers | Multipliers | PLLs |
|---|---|---|---|---|
| Usage | 1781 | 282 | 16 | 0 |
| Ratio | 12% | 2% | 31% | 0% |

TABLE II: COMPARISON TABLE ON THE ACCURATE AND CALCULATED VALUE OF COSINE AT ANY ANGLE

| Angle $\varphi$ ($^\circ$) | Accurate value of cosine | Calculated value of cosine | Calculation error |
|---|---|---|---|
| 0 | 1.0000 | 1.0000 | 0.0000 |
| 10 | 0.9848 | 0.9849 | 0.0001 |
| 20 | 0.9397 | 0.9396 | -0.0001 |
| 30 | 0.8660 | 0.8658 | -0.0002 |
| 40 | 0.7660 | 0.7658 | -0.0003 |
| 50 | 0.6428 | 0.6431 | 0.0003 |
| 60 | 0.5000 | 0.5001 | 0.0003 |
| 70 | 0.3420 | 0.3420 | 0.0000 |
| 80 | 0.1736 | 0.1739 | 0.0003 |
| 90 | 0.0000 | 0.0000 | 0.0000 |

B. Calculation Speed

For the timing analysis, we selected an angle of 0 degrees as the object of observation. The timing simulation waveform is shown in Fig. 4. If the system clock period is $T$, the computing latency of the sine and cosine functions is 7$T$, and the total conversion time of an image is $t=600 \times 420 \times 7T + 300T = 1.8\ \mathrm{ms}$. By contrast, the conversion time of the same image by the function-call method on the MAP-CA processor at 300 MHz is 12 ms for B mode and 20.3 ms for color mode [1]. The scan conversion rate of the algorithm is therefore at least 10 times that of the traditional algorithm.

C. Error Analysis

Based on ModelSim 6.5, we simulated the design on the Cyclone II EP2S15F484C6 FPGA to analyze the precision of the algorithm, with a clock frequency of 100 MHz and a data width of 16 bits. To improve the resolution of the data operations, the sample data are scaled up by a factor of 8192. Table II gives the cosine values computed on the FPGA for selected angles, where the maximum calculation error is 3×10⁻⁴. The calculation error of the sine function for angles from -90 degrees to less than 90 degrees is shown in Fig. 5, where the maximum calculation error is 1.5×10⁻³.
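The effect of the 8192-fold scaling can be illustrated with a small fixed-point sketch. The factor 8192 is from the text; round-to-nearest quantization and the helper names `to_fixed` and `from_fixed` are assumptions made for illustration.

```python
import math

SCALE = 8192  # 2**13, the expansion factor stated in the text

def to_fixed(value):
    """Quantize a real value in [-1, 1] to an integer number of 1/8192 steps."""
    return round(value * SCALE)

def from_fixed(count):
    """Convert an integer count back to a real value."""
    return count / SCALE

for deg in (10, 40, 80):
    exact = math.cos(math.radians(deg))
    approx = from_fixed(to_fixed(exact))
    print(f"{deg:2d} deg: exact={exact:.6f} quantized={approx:.6f} error={approx - exact:+.2e}")
```

With this scaling one count corresponds to about 1.2×10⁻⁴, so rounding alone contributes at most half a count (about 6.1×10⁻⁵), and a value of 1.0 maps to 8192, which fits in a signed 16-bit word.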

D. Output Effect

To verify the output of the scan image coordinate conversion, original S-scan image data were acquired with a PA2000 ultrasonic phased array instrument made by Guangzhou Doppler Electronic Technology Co., Ltd. The S-scan data acquisition parameters are listed in Table III. The image data were converted using both the traditional function-call method and the Radix-4 rotation CORDIC algorithm module, and the output images were compared in Matlab. The comparison is shown in Fig. 6. The test results show that the imaging quality of the two conversion methods is essentially the same (the original image of 524×31 pixels becomes 930×524 pixels after interpolation). Thus, the Radix-4 rotation CORDIC algorithm module can effectively replace the traditional function-call method in image processing.

Fig. 4. The timing simulation waveform of any angle on FPGA.

TABLE III: S-SCAN DATA ACQUISITION PARAMETERS OF ULTRASOUND PHASED ARRAY

<table>
<thead>
<tr>
<th>focus rule number</th>
<th>sampling data points</th>
<th>scanning angle step</th>
<th>angle range</th>
</tr>
</thead>
<tbody>
<tr>
<td>31</td>
<td>524</td>
<td>1°</td>
<td>0° ~ 40°</td>
</tr>
</tbody>
</table>

IV. CONCLUSION

In this work, we used a Radix-4 rotation CORDIC algorithm module to realize the coordinate conversion of ultrasonic S-scan images, and used a preprocessing technique to handle the non-constant scale factor K. Since the image data conversion involves a very large workload, including a large number of function evaluations, this work has high application value for improving the computation speed. The simulation results show that the maximum error of the Radix-4 rotation CORDIC algorithm is 1.5×10⁻³ and the computing latency is only 7T at a 16-bit data width on a 100 MHz FPGA. Compared with the traditional function-call method, the scan conversion rate is at least 10 times faster, and the output image is essentially the same as that of the traditional method.

ACKNOWLEDGMENT

This work was partially supported by the National Major Scientific Instrument and Equipment Development Project of China in 2013 (No. 2013YQ230575).
REFERENCES


Ruobo Lin was born in 1974. He is currently an associate professor. He received a lecturer in Xidian University, Xi'an, China in 1996. He received his master's degree from Hunan University, Changsha, China, in 2009. He is currently a visiting scholar at South China University of Technology. His research interests include electromechanical integration technology, signal processing, instrumentation, and measurement and control technology.

Guixiong Liu was born in 1968. He is currently a professor and doctoral supervisor at South China University of Technology, Guangzhou, China. He received his doctoral degree from Chongqing University, Chongqing, China, in 1995. His research interests include modern detection technology and networked control, intelligent sensing theory and methods, and information system modeling theory and application.

Wenming Tang was born in 1983. He is currently an associate professor. He received a lecturer in Harbin University of Science and Technology, Harbin, China in 2005. He received his master's degree from Harbin University of Science and Technology, Harbin, China in 2008. He is currently pursuing a doctoral degree at the School of Mechanical and Automotive Engineering, South China University of Technology. His research interests include ultrasonic nondestructive testing and signal processing.