

# Design and Implementation of Area, Power and Speed Efficient FIR Filters based on Reversible Tap Delay and Accumulate Block Optimization

<sup>1</sup>P PushpaLatha, <sup>2</sup>M Sai Vamsi Kamal Reddy

<sup>1</sup>Assistant Professor, <sup>2</sup>Postgraduate Student, Department of ECE, JNTU Kakinada University, Kakinada, Andhra Pradesh, India. <sup>2</sup>saivamsy1996@gmail.com

Abstract— The finite impulse response (FIR) filter is one of the two main types of digital filter. FIR filters are used extensively in signal processing and communication applications such as noise reduction, echo cancellation and image enhancement. The cost and performance of an FIR filter depend on its power consumption, its computation time and the amount of hardware it occupies. Over the past few decades, much research has addressed reducing the implementation cost of the multiple constant multiplication (MCM) block. The cost-dominant structure, however, consists of the structural adders and registers in the tap delay-and-accumulate (TDA) line. To close the area-power efficiency gap in the TDA line, the line is bisected at a chosen tap position, and the filter components, such as adders and registers, are designed using reversible logic. The proposed system is described in Verilog HDL, simulated with ModelSim 6.4c, synthesized with the Xilinx tool and implemented on an FPGA Spartan 3 XC3S 200 TQ-144.

Index terms—FIR filter design, Reversible logic gates, Fredkin gate, Feynman gate, tap delay and accumulate block, digital signal processing.

# I. INTRODUCTION

Digital signal processing is used extensively in applications such as picture control, digital image processing and information control. The crucial building block in digital signal processing is the digital filter. Finite impulse response (FIR) filters are widely used and are often preferred over infinite impulse response (IIR) filters because they can achieve a linear phase response and therefore pass a signal without phase distortion. FIR filters are also easier to implement than IIR filters owing to their inherent stability. The Internet of Things (IoT) [1] is a system of interrelated, internet-connected objects that collect and transfer data over a wireless network without human intervention. Signal conditioning circuits are used to remove unwanted noise and interference [2]-[5], and digital filters play a major role in determining the signal-to-noise ratio. The techno-optimism surrounding IoT and its related technologies [6] is therefore affected by the signal-to-noise ratio and response time of the digital filters, and by stagnation in their size, power, sensitivity and accuracy.

An N-tap finite impulse response (FIR) digital filter can be realized from the following discrete time convolution.

$$y[n] = h[n] * x[n] = \sum_{i=0}^{N-1} h[i]x[n-i]$$

where x[n] and y[n] are the n<sup>th</sup> time-domain input and output data samples, respectively, and h[i], i = 0, 1, ..., N - 1, are the filter coefficients. The corresponding transfer function is

$$H(z) = \sum_{i=0}^{N-1} h[i] z^{-i}$$
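As an illustration of the convolution sum, the following minimal Python sketch evaluates y[n] directly from the definition. It is a behavioural model only, not the Verilog design used in this work; the function name `fir_direct` and the integer coefficients are hypothetical, and samples before n = 0 are assumed to be zero.

```python
def fir_direct(h, x):
    """Evaluate y[n] = sum_{i=0}^{N-1} h[i] * x[n-i] for every sample of x."""
    n_taps = len(h)
    y = []
    for n in range(len(x)):
        acc = 0
        for i in range(n_taps):
            if n - i >= 0:  # assumption: samples before n = 0 are zero
                acc += h[i] * x[n - i]
        y.append(acc)
    return y


# Hypothetical 3-tap filter: a unit impulse at the input
# reproduces the coefficients at the output.
h = [1, 2, 3]
impulse = [1, 0, 0, 0, 0]
print(fir_direct(h, impulse))
```

Feeding a unit impulse is a convenient sanity check, since the output sequence is then the impulse response h[i] itself.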

An FIR filter has two configurations: the direct form and the transposed form. The transposed form FIR filter shown in Figure 1 is obtained from the direct form by exchanging the input and output and reversing the direction of signal flow. Multiple constant multiplication (MCM) is an arithmetic operation that multiplies a set of fixed-point constants by the same fixed-point variable. We consider the transposed form because the direct form does not support the MCM technique: in the direct form each coefficient multiplies a different delayed sample, whereas in the transposed form every coefficient multiplies the current input sample. Transposed form FIR filters are also inherently pipelined, and their support for MCM results in a significant saving of computation time.





Fig.1. Transposed direct form FIR filter
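The transposed structure can be modelled cycle by cycle in a few lines of Python. The sketch below is an assumption-laden behavioural model (hypothetical names `fir_transposed` and `fir_reference`, registers cleared at start-up), not the hardware description used in this work. It shows that every coefficient multiplies the same current input sample, which is what makes the MCM technique applicable, and that the register chain and structural adders form the TDA line.

```python
def fir_reference(h, x):
    """Reference output computed straight from the convolution sum."""
    return [sum(h[i] * x[n - i] for i in range(len(h)) if n - i >= 0)
            for n in range(len(x))]


def fir_transposed(h, x):
    """Cycle-by-cycle model of a transposed direct form FIR filter."""
    n_taps = len(h)
    regs = [0] * (n_taps - 1)  # TDA registers, assumed cleared at start
    y = []
    for sample in x:
        # MCM block: all coefficients multiply the same input sample.
        products = [c * sample for c in h]
        # Output adder: first product plus the register nearest the output.
        y.append(products[0] + (regs[0] if regs else 0))
        # Structural adders: each register loads its product plus the
        # value held by the register one position further from the output.
        new_regs = []
        for k in range(n_taps - 1):
            upstream = regs[k + 1] if k + 1 < n_taps - 1 else 0
            new_regs.append(products[k + 1] + upstream)
        regs = new_regs
    return y


h = [1, 2, 3]
x = [4, -1, 0, 7, 2]
print(fir_transposed(h, x) == fir_reference(h, x))
```

The comparison against the convolution sum mirrors the equivalence between the two forms described above.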

# II. REVIEW OF PREVIOUS WORK

An FIR filter has two main blocks: the multiple constant multiplication (MCM) block and the tap delay and accumulate (TDA) block. The names of the blocks describe the operations they perform. Because the size, power, sensitivity and accuracy of the FIR filter play a significant role in digital signal processing applications, many attempts have been made to optimize these characteristics, which requires optimizing both the MCM block and the TDA block. Many existing algorithms [7]-[13] target low-complexity FIR filters, but they mostly minimize the MCM block and neglect the adders and registers in the TDA block. The main reason is that the operands of the structural adders in the TDA block are derived from different time-delayed input samples. Unfortunately, these structural adders are more expensive and power hungry than the sharable adders in the optimized MCM block, which makes them the obstacle to further area and power reduction of the FIR filter implementation. Some more recent algorithms [14], [15] have investigated this problem.

In [15], one of the most recent algorithms, pipeline registers are used to retime the critical path and improve the throughput of the filter. Although that solution reduces the delay, the extra pipeline registers considerably increase the area and power consumption.

A TDA cost-aware coefficient synthesis algorithm [18], which bisects the tap delay and accumulate line, has been proposed with the help of a customized genetic algorithm [16]. As the probability of recurrent integer outputs increases after some number of accumulations, the bit width required to represent the output of a structural adder grows towards the output of the filter up to the middle tap and remains relatively constant from the middle tap to the output. The probability distribution of the output values of a fixed-coefficient FIR filter can thus be approximated by a normal distribution. In fact, the tail values of the actual distribution are even less likely to occur than those at the tails of the normal distribution. This means that the range of structural adder outputs in the latter half of the TDA can be accurately estimated from the word length of the input variable and the predetermined filter coefficient values, so that the word lengths of the corresponding structural adders can be reduced. Figure 2 shows the technique used.
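To make the growth of the structural adder word lengths concrete, the following Python sketch computes a worst-case two's-complement width for each accumulation point of a transposed form filter. This is only an illustrative bound under stated assumptions: it uses the exact worst-case range rather than the probabilistic (normal distribution) estimate discussed above, and the function names and the symmetric integer coefficients are hypothetical, not taken from the design in this work.

```python
def signed_bits(lo, hi):
    """Smallest two's-complement width holding every integer in [lo, hi].

    Assumes lo <= 0 <= hi, which holds for the accumulated ranges below.
    """
    neg_bits = (-lo - 1).bit_length() if lo < 0 else 0
    return max(hi.bit_length(), neg_bits) + 1


def structural_adder_widths(h, input_bits):
    """Worst-case width after each accumulation, from the last tap to the output."""
    x_min = -(1 << (input_bits - 1))
    x_max = (1 << (input_bits - 1)) - 1
    lo = hi = 0
    widths = []
    for c in reversed(h):  # accumulation starts at the tap farthest from the output
        if c >= 0:
            lo += c * x_min
            hi += c * x_max
        else:
            lo += c * x_max
            hi += c * x_min
        widths.append(signed_bits(lo, hi))
    return widths  # widths[0]: last tap, widths[-1]: filter output


# Hypothetical symmetric low-pass style coefficients, 8-bit signed input.
h = [2, -3, -9, 4, 38, 64, 38, 4, -9, -3, 2]
print(structural_adder_widths(h, 8))
```

With coefficients whose largest magnitudes sit around the middle tap, most of the width growth occurs before the middle of the line, which is the observation that motivates treating the latter half of the TDA separately.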





In this method, the filter is specified by its normalized passband frequency, stopband frequency, maximum passband ripple and minimum stopband attenuation. From these specifications, the initial finite-precision word length and the filter order N are determined with the help of the Parks-McClellan algorithm. The specified and derived values are then passed to the TDA minimization algorithm to design and optimize the tap delay and accumulate block.

# III. PROPOSED WORK

The design previously proposed for the tap delay and accumulate block is taken as the starting point, and reversible logic is applied to it. As a design example, we consider a low-pass filter with a passband frequency of 0.2 Hz, a stopband frequency of 0.3 Hz, a passband ripple of 0.325 dB and a stopband attenuation of 27 dB. The filter order is determined by the Parks-McClellan algorithm [17] to be 16. The FIR filter is first obtained with the optimized tap delay and accumulate block, and reversible logic is then applied to this design. Figure 3 depicts the basic design of the modification.



Fig.3. TDA block with bisection technique using reversible logic

We use reversible logic gates because of their advantages in the design. Reversible logic gates conserve information and have high energy efficiency. Because a reversible logic gate has the same number of inputs and outputs, the power loss that normally results from bit-erase operations can be avoided.

#### A. Reversible Logic

There are many reversible gates, some of which are used in the design. They are described below.

#### B. Feynman Gate

The Feynman gate is a 2x2 reversible gate, as shown in Figure 4. The input vector is I(A, B) and the output vector is O(P, Q). The outputs are defined by P = A and Q = A xor B. The Feynman gate (FG) can be used as a copying gate: since fan-out is not permitted in reversible logic, this gate is useful for duplicating a required signal. Figure 4 depicts the logic of the Feynman gate and its outputs.



Fig.4. Feynman Gate
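The behaviour of the Feynman gate can be checked with a short Python model. This is a behavioural sketch only (the function name `feynman` is hypothetical and the Verilog implementation is not reproduced here); it verifies that the mapping is one-to-one and shows the copying use with B tied to 0.

```python
def feynman(a, b):
    """Feynman gate: P = A, Q = A xor B."""
    return a, a ^ b


inputs = [(a, b) for a in (0, 1) for b in (0, 1)]
outputs = [feynman(a, b) for a, b in inputs]

# Reversible: the four input patterns map onto four distinct output patterns.
print(sorted(outputs) == inputs)

# Copying: with B = 0, both outputs carry the value of A.
print([feynman(a, 0) for a in (0, 1)])

# Applying the gate twice restores the original inputs.
print(all(feynman(*feynman(a, b)) == (a, b) for a, b in inputs))
```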

#### C. Fredkin Gate

The Fredkin gate is a 3x3 reversible gate, that is, a three-input, three-output gate. With inputs (A, B, C), its outputs are P = A, Q = A'B + AC and R = A'C + AB, so that A acts as a control that swaps B and C when it is 1. The Fredkin gate is a conservative reversible gate originally introduced by Petri. Logic circuits of this kind conserve information, which improves the energy efficiency of the circuit and increases its portability. Figure 5 shows the Fredkin gate with its inputs and outputs.



Fig.5. Fredkin gate
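A minimal Python model of the Fredkin gate, using the standard controlled-swap definition P = A, Q = A'B + AC, R = A'C + AB, illustrates its reversible and conservative properties. The function name `fredkin` is a hypothetical helper for illustration and does not correspond to a module in the Verilog design.

```python
from itertools import product


def fredkin(a, b, c):
    """Fredkin gate: A passes through; B and C are swapped when A = 1."""
    return (a, c, b) if a else (a, b, c)


patterns = list(product((0, 1), repeat=3))
images = [fredkin(*p) for p in patterns]

# Reversible: eight inputs map onto eight distinct outputs.
print(sorted(images) == patterns)

# Conservative: the number of 1s is the same at input and output.
print(all(sum(p) == sum(q) for p, q in zip(patterns, images)))

# Useful special cases obtained with constant inputs:
#   fredkin(a, b, 0)[2] is A AND B
#   fredkin(a, 0, 1)[2] is NOT A, and fredkin(a, 0, 1)[1] is a copy of A
print([fredkin(a, b, 0)[2] for a, b in product((0, 1), repeat=2)])
print([fredkin(a, 0, 1)[2] for a in (0, 1)])
```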

#### D. Full adder design with the help of Fredkin gates

A full adder can be designed with four Fredkin gates. The other components in the design, such as half adders and registers, are likewise implemented using reversible logic gates. The register transfer level (RTL) schematic of the full adder built from Fredkin gates is shown in Figure 6.



Fig.6. Full adder designed using Fredkin gates
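The idea of building a full adder purely from Fredkin gates can be illustrated with the Python sketch below. Note the assumption: the exact four-gate wiring of Figure 6 is not listed in the text, so this sketch uses a simpler, non-optimized arrangement of five Fredkin gates with constant inputs. It is meant only to show that sum and carry can be produced without fan-out, and the names `fredkin` and `full_adder` are hypothetical.

```python
from itertools import product


def fredkin(a, b, c):
    """Fredkin gate: A passes through; B and C are swapped when A = 1."""
    return (a, c, b) if a else (a, b, c)


def full_adder(a, b, cin):
    """Five-gate illustrative arrangement (not the four-gate design of Fig. 6)."""
    # Gate 1: constants (1, 0) yield B' and a copy of B.
    _, not_b, b_copy = fredkin(b, 1, 0)
    # Gate 2: A selects B or B', giving X = A xor B (and its complement).
    a_pass, x, _ = fredkin(a, b_copy, not_b)
    # Gate 3: constants (0, 1) yield a copy of Cin and Cin'.
    cin_pass, cin_copy, not_cin = fredkin(cin, 0, 1)
    # Gate 4: X selects Cin or Cin', giving Sum = X xor Cin.
    x_pass, s, _ = fredkin(x, cin_pass, not_cin)
    # Gate 5: X selects A or Cin, giving Cout (A when A = B, else Cin).
    _, cout, _ = fredkin(x_pass, a_pass, cin_copy)
    return s, cout


for a, b, cin in product((0, 1), repeat=3):
    s, cout = full_adder(a, b, cin)
    print(a, b, cin, "->", s, cout)
```

Every intermediate signal in this arrangement is consumed by at most one gate, so no fan-out is required; the unused outputs are the garbage outputs typical of reversible designs.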

The FIR filter is described in Verilog HDL. The designs are simulated in ModelSim 6.4c to check whether the output of the modified design matches the output of the existing design for the same input. All the designs are synthesized with the Xilinx tool and implemented on an FPGA Spartan 3 XC3S 200 TQ-144. The synthesized area, in number of LUTs, and the delay, in ns, are obtained. After the design is implemented, the NCD file output from Place & Route (PAR) is read by the Xilinx ISE XPower Analyzer (XPA) to estimate the activity rates, and the power dissipation results reported by XPA are recorded.

# IV. RESULTS





Fig.7. RTL Schematic of the design

Figure 7 shows the RTL schematic of the design, that is, the register transfer level view at gate level. Figures 8 and 9 show the simulation results of the existing and modified designs, which were checked to confirm that both produce the same output for the same input.



#### B. Simulation Results of FIR Filter designed using TDA Block optimization



Fig.8. Simulation Results of FIR Filter designed using TDA Block optimization

#### C. Simulation Results of modified FIR Filter design using reversible logic



Fig.9. Simulation Results of modified FIR Filter design using reversible logic

Since the simulation results match, the FIR filter is implemented with the Xilinx software to compare hardware utilization and power consumption. Figures 10 and 11 show the hardware utilization, and Figures 12 and 13 show the power consumption, of the existing and modified designs respectively. The area and power results are compared in TABLE I.

#### D. Area utilized by existing design

Device Utilization Summary

| Logic Utilization                              | Used  | Available | Utilization |
|------------------------------------------------|-------|-----------|-------------|
| Number of Slice Flip Flops                     | 52    | 3,840     | 1%          |
| Number of 4 input LUTs                         | 667   | 3,840     | 17%         |
| Logic Distribution                             |       |           |             |
| Number of occupied Slices                      | 388   | 1,920     | 20%         |
| Number of Slices containing only related logic | 388   | 388       | 100%        |
| Number of Slices containing unrelated logic    | 0     | 388       | 0%          |
| Total Number of 4 input LUTs                   | 706   | 3,840     | 18%         |
| Number used as logic                           | 667   |           |             |
| Number used as a route-thru                    | 39    |           |             |
| Number of bonded IOBs                          | 26    | 97        | 26%         |
| Number of GCLKs                                | 1     | 8         | 12%         |
| Total equivalent gate count for design         | 6,324 |           |             |
| Additional JTAG gate count for IOBs            | 1,248 |           |             |

Fig.10. Area utilized by existing design

#### E. Area utilized by modified design

Device Utilization Summary

| Logic Utilization                              | Used | Available | Utilization |
|------------------------------------------------|------|-----------|-------------|
| Number of Slice Flip Flops                     | 1    | 3,840     | 1%          |
| Number of 4 input LUTs                         | 140  | 3,840     | 3%          |
| Logic Distribution                             |      |           |             |
| Number of occupied Slices                      | 73   | 1,920     | 3%          |
| Number of Slices containing only related logic | 73   | 73        | 100%        |
| Number of Slices containing unrelated logic    | 0    | 73        | 0%          |
| Total Number of 4 input LUTs                   | 140  | 3,840     | 3%          |
| Number of bonded IOBs                          | 26   | 97        | 26%         |
| Number of GCLKs                                | 1    | 8         | 12%         |
| Total equivalent gate count for design         | 872  |           |             |

Fig.11. Area utilized by modified design

#### F. Power consumption by existing design

| Power summary:                     | I(mA) | P(mW) |
|------------------------------------|-------|-------|
| Total estimated power consumption: |       | 1     |
|                                    |       |       |
| Vccint 1.20V:                      | 1     | 1     |
| Vccaux 2.50V:                      | 10    | 25    |
| Vcco25 2.50V:                      | 4418  | 11044 |
|                                    |       |       |
| Clocks:                            | 284   | 341   |
| Inputs:                            | 56    | 67    |
| Logic:                             | 654   | 784   |
| Outputs:                           |       |       |
| Vcco25                             | 4418  | 11044 |
| Signals:                           | 661   | 793   |
|                                    |       |       |
| Quiescent Vccint 1.20V:            | 1     | 1     |
| Quiescent Vccaux 2.50V:            | 10    | 25    |

Fig.12. Power consumption by existing design

#### G. Power consumption by modified design

| Power summary:                     | I(mA) | P(mW) |
|------------------------------------|-------|-------|
| Total estimated power consumption: |       | 4725  |
|                                    |       |       |
| Vccint 1.20V:                      | 651   | 782   |
| Vccaux 2.50V:                      | 10    | 25    |
| Vcco25 2.50V:                      | 1568  | 3919  |
|                                    |       |       |
| Clocks:                            | 183   | 220   |
| Inputs:                            | 56    | 67    |
| Logic:                             | 269   | 323   |
| Outputs:                           |       |       |
| Vcco25                             | 1568  | 3919  |
| Signals:                           | 26    | 31    |
|                                    |       |       |
| Quiescent Vccint 1.20V:            | 117   | 141   |
| Quiescent Vccaux 2.50V:            | 10    | 25    |

Fig.13. Power consumption by modified design

TABLE I COMPARISON OF AREA UTILISED AND POWER CONSUMED BETWEEN THE EXISTING FIR FILTER DESIGN AND THE MODIFIED FIR FILTER DESIGN

| Method (Spartan 3) | Area: LUTs | Area: Slices | Area: Gates | Power: Current (mA) | Power: Power (mW) |
|--------------------|------------|--------------|-------------|---------------------|-------------------|
| Existing           | 667        | 388          | 6324        | 4418                | 11044             |
| Modified           | 140        | 73           | 872         | 1568                | 3919              |
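The relative savings implied by TABLE I follow from simple arithmetic. The short Python sketch below computes the percentage reduction for each column; the figures are copied from the table, while the helper name `reduction_percent` and the dictionary layout are only illustrative.

```python
# Values copied from TABLE I (existing design vs. modified design).
existing = {"LUTs": 667, "Slices": 388, "Gates": 6324,
            "Current (mA)": 4418, "Power (mW)": 11044}
modified = {"LUTs": 140, "Slices": 73, "Gates": 872,
            "Current (mA)": 1568, "Power (mW)": 3919}


def reduction_percent(before, after):
    """Relative reduction of 'after' with respect to 'before', in percent."""
    return 100.0 * (before - after) / before


for name in existing:
    saving = reduction_percent(existing[name], modified[name])
    print(f"{name}: {saving:.1f}% lower in the modified design")
```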

# V. CONCLUSION

Motivated by the dominant cost of the TDA block in the FIR filter, which cannot be effectively reduced by existing design algorithms that maximize sharing of adders in the MCM block, this paper modifies a recent design methodology for area-power efficient FIR filter implementation, which synthesizes the filter coefficients to maximize cost savings, by applying reversible logic gates in the TDA block. The implementation cost of the reversible TDA block is determined based on operand bisection at an optimal tap position, which reduces the sizes of the subsequent structural adders and registers without violating the filtering specifications. The results in Table I and the device utilization summaries show that LUT utilization decreases from 18% to 3% in the modified design, and the same table shows the power reduction. The proposed solution is therefore efficient in reducing the hardware requirement and power consumption, with a computation time of 40.416 ns.

# REFERENCES

[1] H. Ma, L. Liu, A. Zhou, and D. Zhao, "On networking of Internet of Things: Explorations and challenges," IEEE Internet Things J., vol. 3, no. 4, pp. 441–452, Aug. 2016.

[2] J. Chen and C. H. Chang, "High-level synthesis algorithm for the design of reconfigurable constant multiplier," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 28, no. 12, pp. 1844–1856, Dec. 2009.

[3] T. Chen, Y. V. Zakharov, and C. Liu, "Low-complexity channel-estimate based adaptive linear equalizer," IEEE Signal Process. Lett., vol. 18, no. 7, pp. 427–430, Jul. 2011.

[4] M. R. Zahabi, V. Meghdadi, J. P. Cances, and A. Saemi, "Mixed signal matched filter for high-rate communication systems," IET Signal Process., vol. 2, no. 4, pp. 354–360, Dec. 2008.

[5] M. S. Prakash and R. A. Shaik, "Low-area and high-throughput architecture for an adaptive filter using distributed arithmetic," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 60, no. 11, pp. 781–785, Nov. 2013.

[6] H. Mahdavifar, M. El-Khamy, J. Lee, and I. Kang, "Performance limits and practical decoding of interleaved Reed–Solomon polar concatenated codes," IEEE Trans. Commun., vol. 62, no. 5, pp. 1406–1417, May 2014.

[7] R. Pasko, P. Schaumont, V. Derudder, S. Vernalde, and D. Durackova, "A new algorithm for elimination of common subexpressions," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 18, no. 1, pp. 58–68, Jan. 1999.

[8] Y. J. Yu and Y. C. Lim, "Design of linear phase FIR filters in subexpression space using mixed integer linear programming," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 54, no. 10, pp. 2330–2338, Oct. 2007.

[9] R. Mahesh and A. P. Vinod, "New reconfigurable architectures for implementing FIR filters with low complexity," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 29, no. 2, pp. 275–288, Feb. 2010.

[10] S. Hsiao, J. Z. Jian, and M. Chen, "Low-cost FIR filter designs based on faithfully rounded truncated multiple constant multiplication/accumulation," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 60, no. 5, pp. 287–291, May 2013.

[11] J. Chen, C. H. Chang, F. Feng, W. Ding, and J. Ding, "Novel design algorithm for low complexity programmable FIR filters based on extended double base number system," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 62, no. 1, pp. 224–233, Jan. 2015.

[12] F. Feng, J. Chen, and C. H. Chang, "Hypergraph based minimum arborescence algorithm for the optimization and reoptimization of multiple constant multiplications," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 63, no. 2, pp. 233–244, Feb. 2016.

[13] J. Ding, J. Chen, and C. H. Chang, "A new paradigm of common subexpression elimination by unification of addition and subtraction," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 35, no. 10, pp. 1605–1617, Oct. 2016.

[14] M. Faust and C. H. Chang, "Optimization of structural adders in fixed coefficient transposed direct form FIR filters," in Proc. IEEE Int. Symp. Circuits Syst., Taipei, Taiwan, May 2009, pp. 2185–2188.

[15] X. Lou, Y. Yu, and P. K. Meher, "Analysis and optimization of product accumulation section for efficient implementation of FIR filters," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 63, no. 10, pp. 1701–1713, Oct. 2016.

[16] J. E. Beasley, Advances in Linear and Integer Programming. Oxford, U.K.: Oxford Science, 1996.

[17] J. H. McClellan and T. W. Parks, "A personal history of the Parks–McClellan algorithm," IEEE Signal Process. Mag., vol. 22, no. 2, pp. 82–86, Mar. 2005.

[18] J. Chen, C. H. Chang, J. Ding, R. Qiao, and M. Faust, "Tap delay-and-accumulate cost aware coefficient synthesis algorithm for the design of area-power efficient FIR filters," IEEE Trans. Circuits Syst. I, Reg. Papers, 2017.