Below you will find abstracts of a number of doctoral theses defended in this research field at the CAS group, followed by other useful links related to our research.
On Power Consumption Issues in FIR Filters with Application to Communication Receivers: Complexity, Word length, and Switching Activity
Power consumption in CMOS VLSI circuits has in recent years become a major design constraint. This is particularly important for wireless networks, due to the limited lifetime of the batteries that wireless nodes operate on.
Orthogonal Frequency Division Multiplexing (OFDM) is one example of a technique that has become widely applied in wireless communication systems in recent years. However, the performance of OFDM and other spectrally efficient schemes depends, to a large extent, on advanced digital signal processing (DSP) and on the use of efficient and possibly adaptive resource allocation and transmission techniques. These in turn require accurate channel estimates to be available in both the receiver and the transmitter.
However, accurate channel estimation of a time- and frequency-dispersive wireless fading channel calls for complex estimators, which may lead to significant power dissipation in such devices. Characterizing and analyzing the power consumed by such devices under different channel conditions, and optimizing for power, is therefore important for reducing the overall power consumption of the system. This thesis considers a particular class of estimators, namely linear estimators based on finite impulse response (FIR) filters, and addresses the power-related challenges in such estimators.
The power consumed by such estimators depends, in part, on the complexity of the estimator, i.e., the length of the FIR filter, which is also one of the factors affecting the estimation accuracy. This thesis analyzes the relation between estimator performance and the required complexity under different channel conditions, in particular different noise levels. We show that a small increase in the noise can lead to a considerable increase in the required estimator complexity if a given Normalized Mean Square Error (NMSE) for the channel estimate must be maintained, in particular at medium-to-high Channel Signal to Noise Ratios (CSNR).
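The complexity/accuracy trade-off can be explored numerically. The sketch below is a minimal illustration under assumed conditions that are not taken from the thesis: the channel gain is modelled as a zero-mean, unit-variance process with exponential autocorrelation ρ^|m|, the noise is white with variance 1/CSNR, and the FIR estimator is the linear MMSE (Wiener) filter over a window roughly centred on the sample being estimated. The function names are hypothetical.

```python
from typing import Optional

import numpy as np


def wiener_fir_nmse(n_taps: int, rho: float, csnr_db: float) -> float:
    """NMSE of an n_taps linear MMSE (Wiener) FIR channel estimator.

    Assumed model (illustrative only): unit-variance channel gain with
    autocorrelation rho**|m|, observed in white noise of variance 1/CSNR.
    """
    noise_var = 10.0 ** (-csnr_db / 10.0)
    idx = np.arange(n_taps)
    centre = (n_taps - 1) // 2  # tap aligned with the sample being estimated
    # Autocorrelation matrix of the noisy observations
    R = rho ** np.abs(idx[:, None] - idx[None, :]) + noise_var * np.eye(n_taps)
    # Cross-correlation between the observations and the channel sample
    r = rho ** np.abs(idx - centre)
    w = np.linalg.solve(R, r)  # Wiener-Hopf equations: R w = r
    # MMSE = var(h) - r^T w; var(h) = 1, so this is already normalized
    return float(1.0 - r @ w)


def min_taps(target_nmse: float, rho: float, csnr_db: float,
             max_taps: int = 200) -> Optional[int]:
    """Smallest filter length meeting target_nmse, or None if none does."""
    for n in range(1, max_taps + 1):
        if wiener_fir_nmse(n, rho, csnr_db) <= target_nmse:
            return n
    return None
```

Calling `min_taps` with the same NMSE target at two nearby CSNR values shows how the required length responds to the noise level under this model; a result of `None` means the target lies below what any filter up to `max_taps` can reach.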
Reducing the power consumption through word-length optimization is another attractive approach when realizing such estimators. Due to the characteristics of the input signal to such estimators, the channel estimation error caused by quantizing the estimator filter coefficients requires special treatment. This thesis investigates the impact of finite coefficient word length on channel estimator performance. A theoretical analysis of the increase in channel estimation error due to coefficient quantization is performed, and the behavior of this error in different fading environments and for different filter orders is studied.
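Because unquantized Wiener coefficients minimize the MSE, the extra error caused by rounding them has a simple closed form: with Δ = w_q − w, the MSE increases by exactly Δᵀ R Δ, where R is the autocorrelation matrix of the observations. The sketch below illustrates this for an assumed model (unit-variance channel with autocorrelation ρ^|m|, white noise of variance 1/CSNR) and assumes coefficients are rounded to `frac_bits` fractional bits without overflow. It is not the thesis's analysis, and all names are hypothetical.

```python
import numpy as np


def estimator_stats(n_taps: int, rho: float, csnr_db: float):
    """Observation autocorrelation R and cross-correlation r (assumed model)."""
    noise_var = 10.0 ** (-csnr_db / 10.0)
    idx = np.arange(n_taps)
    centre = (n_taps - 1) // 2
    R = rho ** np.abs(idx[:, None] - idx[None, :]) + noise_var * np.eye(n_taps)
    r = rho ** np.abs(idx - centre)
    return R, r


def mse(w: np.ndarray, R: np.ndarray, r: np.ndarray) -> float:
    """E|h - w^T z|^2 for a unit-variance channel: 1 - 2 w^T r + w^T R w."""
    return float(1.0 - 2.0 * (w @ r) + w @ R @ w)


def quantize(w: np.ndarray, frac_bits: int) -> np.ndarray:
    """Round each coefficient to the nearest multiple of 2**-frac_bits."""
    step = 2.0 ** -frac_bits
    return np.round(w / step) * step


def quantization_penalty(n_taps: int, rho: float, csnr_db: float,
                         frac_bits: int) -> float:
    """Increase in channel estimation MSE caused by coefficient rounding."""
    R, r = estimator_stats(n_taps, rho, csnr_db)
    w = np.linalg.solve(R, r)  # unquantized Wiener coefficients
    w_q = quantize(w, frac_bits)
    # Equals (w_q - w)^T R (w_q - w) because w is the MSE minimizer
    return mse(w_q, R, r) - mse(w, R, r)
```

Sweeping `frac_bits` for several `n_taps` and CSNR values gives a quick picture of how coefficient word length interacts with filter order under the assumed model.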
The power consumed in a channel estimator is also influenced by the switching activity of its input signal. Characterizing this switching activity, including how it changes in different environments, e.g., in the presence of noise, is another subject of this thesis. We give an expression for directly calculating the correlation coefficient of the most significant bit of a signal from its word-level correlation coefficient. We also derive expressions for accurately calculating the variance (σ²) and the word-level correlation coefficient (ρ) of a correlated signal when noise of a given variance is added to it. These can be used to estimate the bit-level switching activity of a signal in the presence of noise, based on the Dual Bit Type (DBT) method. The impact of the additional noise on the switching activity of a correlated signal is also studied. These results allow a designer to model the actual input switching activity in different real-life noisy environments, enabling realistic power consumption estimation.
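Two textbook results make the noise dependence concrete. If independent white noise of variance σₙ² is added to a signal with variance σₓ² and lag-one correlation ρₓ, the noisy signal has variance σₓ² + σₙ² and correlation ρₓσₓ²/(σₓ² + σₙ²). For a zero-mean Gaussian signal, the sign bit (which governs the MSB region in the DBT model) toggles between consecutive samples with probability arccos(ρ)/π, corresponding to a bit-level correlation of (2/π)·arcsin(ρ). The sketch below codes these under exactly those assumptions (independent white noise, zero-mean Gaussian signal); the thesis's own expressions may be more general, and the function names are hypothetical.

```python
import math
from typing import Tuple


def add_white_noise(var_x: float, rho_x: float, var_n: float) -> Tuple[float, float]:
    """Variance and lag-1 correlation after adding independent white noise."""
    var_y = var_x + var_n
    # The noise adds power but no lag-1 covariance, so rho is diluted
    rho_y = rho_x * var_x / var_y
    return var_y, rho_y


def msb_toggle_probability(rho: float) -> float:
    """P(sign bit differs between consecutive samples), zero-mean Gaussian."""
    return math.acos(rho) / math.pi


def msb_correlation(rho: float) -> float:
    """Correlation coefficient of the sign bit (arcsine law)."""
    return 2.0 / math.pi * math.asin(rho)
```

For a unit-variance signal with ρ = 0.95, adding noise of variance 0.5 lowers ρ to about 0.63, and the sign-bit toggle probability rises from about 0.10 to about 0.28 under these assumptions.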
Another part of this thesis studies switching activity reduction in estimator filters using coefficient reordering. Closed-form analytical models for the coefficient and input data switching activity before and after reordering are developed, and the impact of coefficient reordering on the input data, and consequently on the total switching activity, is studied. Using the derived models, we show that the impact of coefficient reordering on the data input first increases with the input signal correlation ρz and then decreases again as ρz → 1; it is 0 for ρz ≈ 0 and ρz ≈ 1. Our results show that this impact is largest for ρz between 0.7 and 0.999, and that it grows with the estimator order N.
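The abstract does not spell out the reordering algorithm, so the sketch below uses a simple stand-in: a greedy nearest-neighbour ordering that places coefficients with small Hamming distance next to each other on the coefficient bus of a single MAC unit. Because the products can be accumulated in any order, the data samples must then be read in the same permuted order, which is how reordering affects the data-side switching. Coefficients are assumed to be integers in W-bit two's complement; all names are hypothetical.

```python
from typing import List, Sequence


def to_bits(value: int, width: int) -> int:
    """Two's-complement bit pattern of value in `width` bits."""
    return value & ((1 << width) - 1)


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def bus_toggles(words: Sequence[int], width: int) -> int:
    """Total bit toggles on a bus that carries `words` one after another."""
    bits = [to_bits(w, width) for w in words]
    return sum(hamming(a, b) for a, b in zip(bits, bits[1:]))


def greedy_reorder(coeffs: Sequence[int], width: int) -> List[int]:
    """Greedy nearest-neighbour tap order (indices) keeping consecutive
    coefficients close in Hamming distance. A heuristic, not an optimum."""
    if not coeffs:
        return []
    remaining = list(range(1, len(coeffs)))
    order = [0]  # start from tap 0
    while remaining:
        last = to_bits(coeffs[order[-1]], width)
        nxt = min(remaining, key=lambda i: hamming(last, to_bits(coeffs[i], width)))
        order.append(nxt)
        remaining.remove(nxt)
    return order
```

For example, the 3-bit coefficients `[0, 7, 1, 6]` cause 8 toggles in their natural order, while the greedy order `[0, 2, 1, 3]` causes 4. The same permutation must be applied to the data samples, so the new order only pays off if the coefficient-side saving outweighs any extra data-side toggling.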
Considering a realistic case, we further study the possibility of reducing the switching activity in a MAC-based channel estimator realized with different orders and word lengths and operating in different environments. This study shows that if the designer makes the right choices when reordering, a larger reduction in switching activity can be obtained. These choices also depend on the channel conditions under which the system operates most of the time. The results show that when the word length is reduced, reordering can in some cases, e.g., when the estimator order is increased to N = 50 and beyond, actually increase the total switching activity if extra care is not taken. They also show that for large N and input data with medium to high correlation, reordering cannot reduce the switching activity if the word length is reduced to W = 8 or lower. In general, the optimization becomes even more sensitive to the characteristics of the input data as the word length is reduced. The designer consequently needs this information to achieve a reduction, or even to avoid an increase, in switching activity for small values of W.
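The data-side effect of a tap order can be measured by simulation. The sketch below counts bit toggles on the data input of a single MAC unit that computes y[n] = Σₖ c[k]·x[n−k] by reading the samples x[n − k] in a chosen tap order. The input is a hypothetical AR(1) Gaussian sequence scaled into [−1, 1) and quantized to W bits with saturation; these choices, and all names, are assumptions for illustration rather than the thesis's setup.

```python
import math
import random
from typing import List, Sequence


def ar1_signal(length: int, rho: float, seed: int = 0) -> List[float]:
    """Zero-mean, unit-variance AR(1) Gaussian sequence with lag-1 correlation rho."""
    if length <= 0:
        return []
    rng = random.Random(seed)
    innovation_std = math.sqrt(1.0 - rho * rho)
    x = [rng.gauss(0.0, 1.0)]
    for _ in range(length - 1):
        x.append(rho * x[-1] + innovation_std * rng.gauss(0.0, 1.0))
    return x


def quantize(value: float, width: int) -> int:
    """Round value in [-1, 1) to a width-bit two's-complement integer, saturating."""
    scale = 1 << (width - 1)
    q = int(round(value * scale))
    return max(-scale, min(scale - 1, q))


def data_bus_toggles(x_q: Sequence[int], order: Sequence[int], width: int) -> int:
    """Bit toggles on the MAC data input when taps are visited in `order`
    (assumed to be a permutation of range(len(order)))."""
    mask = (1 << width) - 1
    n_taps = len(order)
    prev = None
    toggles = 0
    for n in range(n_taps - 1, len(x_q)):  # one output sample per iteration
        for k in order:                     # one MAC step per tap
            word = x_q[n - k] & mask        # two's-complement bit pattern
            if prev is not None:
                toggles += bin(prev ^ word).count("1")
            prev = word
    return toggles
```

For instance, quantizing `0.25 * v` for each `v` in `ar1_signal(5000, 0.95)` to 8 bits and comparing the natural order `list(range(N))` with a permuted order shows the data-side cost of that permutation under this model.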
It should be mentioned that although these power-related challenges are considered in the context of channel estimators, the results of several parts of this work are not limited to channel estimators. The results concerning switching activity reduction in MAC-based channel estimators apply generally to FIR filters, and the study of input signal switching activity is valid for signals input to any DSP module.
On Power Consumption Issues in FIR Filters with Application to Communication Receivers: Complexity, Word length, and Switching Activity.
Doctoral thesis at NTNU, 2009:201, Asghar Havashki
Design of Low-Power Reduction-Trees in Parallel Multipliers
Multiplications occur frequently in digital signal processing systems, communication systems, and other application-specific integrated circuits. Multipliers, being relatively complex units, are decisive for the overall speed, area, and power consumption of digital computers. The diversity of application areas and the ubiquity of multiplication in digital systems impose a variety of requirements on speed, area, power consumption, and other specifications. Traditionally, speed, area, and hardware resources have been the major design concerns in digital design. However, the design paradigm shift over the past decade has brought dynamic and static power into play as well.
In many situations, the overall performance of a system is determined by the speed of its multiplier. This thesis addresses parallel multipliers because of their superior speed. Parallel multipliers are combinational circuits and can be subjected to any standard combinational logic optimization. However, the complex structure of multipliers poses a number of difficulties for electronic design automation (EDA) tools, which cannot treat the multiplier as a whole and therefore limit their logic optimizations to small portions of the circuit. On the other hand, multipliers are arithmetic circuits, and exploiting arithmetic relations in their structure can be extremely useful and yield better optimization results. Structures obtained from different arithmetically equivalent solutions have the same functionality but exhibit different temporal and physical behavior. Such arithmetic equivalencies have previously been used mainly to optimize for area, speed, and hardware resources.
This thesis proposes a design methodology for reducing dynamic and static power dissipation in the partial product reduction tree of parallel multipliers. Using information about the input patterns that will be applied to the multiplier (such as static probabilities and spatiotemporal correlations), the reduction tree is optimized by selecting power-efficient configurations from among the permutations of partial products in each reduction stage. Probabilistic methods are introduced for estimating leakage and dynamic power, and these estimates guide the optimizers toward minimum power consumption. Optimization methods that exploit the arithmetic equivalencies in partial product reduction trees are proposed to reduce the dynamic power, the static power, or the total power (the combination of both). The energy saving is achieved without any noticeable area or speed overhead compared to random reduction trees. The optimization algorithms are extended to include spatiotemporal correlations between primary inputs. As a further extension, the cost function is taken as a weighted sum of dynamic and static power; this can be extended further to include speed merits and interconnect power. The effectiveness of the optimization methods is shown through a number of experiments: the average number of transitions obtained from simulation is reduced significantly (by up to 35% in some cases).
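The core idea, choosing which partial-product bits share a full adder, can be illustrated with a deliberately simplified model. The sketch below assumes spatially and temporally independent input bits with known signal probabilities, uses 2p(1 − p) as the switching activity of a node with signal probability p, and exhaustively searches the ways of grouping one column of bits into full adders, scoring each grouping by the activity of the sum and carry outputs. The thesis additionally handles spatiotemporal correlations, leakage, and multiple stages; the names here are hypothetical.

```python
from itertools import combinations
from typing import Iterator, List, Sequence, Tuple

Group = Tuple[int, int, int]


def activity(p: float) -> float:
    """Switching activity of a temporally independent node with P(1) = p."""
    return 2.0 * p * (1.0 - p)


def full_adder_probs(a: float, b: float, c: float) -> Tuple[float, float]:
    """P(1) of the sum and carry outputs for independent inputs."""
    ab = a * (1 - b) + b * (1 - a)                  # P(a xor b)
    s = ab * (1 - c) + c * (1 - ab)                 # P(a xor b xor c)
    carry = a * b + a * c + b * c - 2 * a * b * c   # P(at least two inputs are 1)
    return s, carry


def triple_groupings(items: List[int]) -> Iterator[List[Group]]:
    """All ways of splitting items (length divisible by 3) into unordered triples."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for pair in combinations(rest, 2):
        remaining = [i for i in rest if i not in pair]
        for tail in triple_groupings(remaining):
            yield [(first, *pair)] + tail


def grouping_cost(probs: Sequence[float], groups: List[Group]) -> float:
    """Total output activity of the full adders in one grouping."""
    cost = 0.0
    for i, j, k in groups:
        s, c = full_adder_probs(probs[i], probs[j], probs[k])
        cost += activity(s) + activity(c)
    return cost


def best_grouping(probs: Sequence[float]) -> Tuple[float, List[Group]]:
    """Exhaustive search for the lowest-activity grouping of one column."""
    if len(probs) % 3:
        raise ValueError("this sketch only handles columns of 3*m bits")
    candidates = triple_groupings(list(range(len(probs))))
    return min(((grouping_cost(probs, g), g) for g in candidates),
               key=lambda t: t[0])
```

The number of groupings grows quickly with column height (10 for 6 bits, 280 for 9), so a practical optimizer would prune or restrict the search; the sketch stays exhaustive for clarity.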
The proposed methods are in general applicable to arbitrary multi-operand adder trees. As an example, the optimization is applied to the summation tree of a class of elementary function generators implemented as a summation of weighted bit-products. Accurate transistor-level power estimations show up to 25% reduction in dynamic power compared to the original designs.
Power estimation is an important step of the optimization algorithm. A probabilistic gate-level power estimator is developed that uses a novel set of simple waveforms as its kernel and estimates the transition density of each circuit node. This estimator makes it possible to use a global glitch filtering technique that models the removal of glitches in more detail. It produces error-free estimates for tree-structured circuits. For circuits with reconvergent fanout, experimental results on the ISCAS’85 benchmarks show that the method generally provides significantly better estimates of the transition density than previous techniques.
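For comparison, the classic transition-density propagation of Najm, a standard baseline for this kind of estimator, can be sketched in a few lines. Each node carries its signal probability and its transition density (expected transitions per clock cycle); for a gate y = f(a, b) with independent inputs, D(y) = P(∂y/∂a)·D(a) + P(∂y/∂b)·D(b), where ∂y/∂a is the Boolean difference. This is not the thesis's waveform-based method; the netlist format and names are hypothetical.

```python
from typing import Dict, List, Tuple

Signal = Tuple[float, float]      # (P(node = 1), transition density)
Gate = Tuple[str, str, str, str]  # (output, kind, input_a, input_b)


def propagate_gate(kind: str, a: Signal, b: Signal) -> Signal:
    """Signal probability and transition density at a 2-input gate output."""
    pa, da = a
    pb, db = b
    if kind == "AND":  # dy/da = b, dy/db = a
        return pa * pb, pb * da + pa * db
    if kind == "OR":   # dy/da = not b, dy/db = not a
        return 1 - (1 - pa) * (1 - pb), (1 - pb) * da + (1 - pa) * db
    if kind == "XOR":  # dy/da = dy/db = 1
        return pa + pb - 2 * pa * pb, da + db
    raise ValueError(f"unsupported gate {kind!r}")


def propagate(netlist: List[Gate], inputs: Dict[str, Signal]) -> Dict[str, Signal]:
    """Evaluate a topologically ordered netlist, assuming independent gate inputs."""
    nodes = dict(inputs)
    for out, kind, a, b in netlist:
        nodes[out] = propagate_gate(kind, nodes[a], nodes[b])
    return nodes


FULL_ADDER: List[Gate] = [
    ("t", "XOR", "a", "b"),
    ("s", "XOR", "t", "cin"),
    ("g", "AND", "a", "b"),
    ("p", "AND", "t", "cin"),
    ("cout", "OR", "g", "p"),
]
```

With all three full-adder inputs at P = 0.5 and D = 0.5, this gives P(cout) = 0.4375, whereas the exact value is 0.5: the nodes g and p both depend on a and b through reconvergent fanout and can never be 1 at the same time, which the independence assumption ignores. Reconvergent fanout is precisely the case in which the abstract reports significantly better estimates than previous techniques.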
Design of Low-Power Reduction-Trees in Parallel Multipliers.
Doctoral thesis at NTNU, 2008:61, Saeeid Tahmasbi Oskuii
Implementation of synthesis filter bank for subband coding of images
A uniform filter bank structure is developed that retains the high coding gain of subband coders while having a complexity close to that of the discrete cosine transform (DCT). The reduced complexity is obtained by replacing the six upper channels with an 8-point DCT. By using longer filters in the two lower channels, blocking effects, which are disturbing artifacts in transform coders, are eliminated.
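For reference, the 8-point DCT can be written directly from its definition. The sketch below is a plain orthonormal DCT-II, X[k] = c_k Σₙ x[n] cos(π(2n + 1)k/16) with c₀ = √(1/8) and c_k = √(2/8) otherwise, plus its inverse. How the DCT outputs are combined with the two lower subband channels is not described in the abstract and is not modelled here; a hardware realization would also use a fast factorization rather than this direct O(N²) form. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def _c(k: int) -> float:
    """Orthonormal DCT scale factor for coefficient k."""
    return math.sqrt(1 / 8) if k == 0 else math.sqrt(2 / 8)


def dct8(x: Sequence[float]) -> List[float]:
    """Orthonormal 8-point DCT-II computed directly from the definition."""
    if len(x) != 8:
        raise ValueError("dct8 expects exactly 8 samples")
    return [_c(k) * sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / 16)
                        for n in range(8))
            for k in range(8)]


def idct8(X: Sequence[float]) -> List[float]:
    """Inverse of dct8 (orthonormal DCT-III)."""
    if len(X) != 8:
        raise ValueError("idct8 expects exactly 8 coefficients")
    return [sum(_c(k) * X[k] * math.cos(math.pi * (2 * n + 1) * k / 16)
                for k in range(8))
            for n in range(8)]
```

A constant block maps entirely to the k = 0 coefficient, which is why smooth image regions compress well but block edges become visible at low rates.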
The filter bank is required to handle HDTV sample rates at minimum area cost and with acceptable power consumption. Different algorithms and architectures for the filter bank structure are evaluated in order to satisfy these requirements. The optimal word lengths of coefficients and internal signals are found through simulation and analysis. The memory required between vertical and horizontal filtering is minimized so that a two-dimensional filter bank can be implemented on one chip. The DCT part has been processed in 0.8 µm CMOS technology and functional chips have been received. The one-dimensional filter bank will be processed shortly.
Analysis and VLSI design of synthesis filter bank for image subband coding. Ph.D. Thesis, Fys.El.-rapport 1997:33, Ingil Sundsbø
VLSI solutions for speech recognition
This work concentrates on high-speed digital signal processing in CMOS, both in general and in applications to continuous, speaker-independent, large-vocabulary speech recognition.
Specifically, design automation with the TSPC (True Single Phase Clock) and CDPD circuit techniques has been studied, and methods for it have been developed. A general standard cell library in 0.8 µm CMOS suitable for synthesis of DSP algorithms has been developed and tested on fabricated test designs with satisfactory results. The library contains TSPC flip-flops capable of a maximum clock frequency of 700 MHz, as well as a matching 1 ns full adder primitive, and is fully compatible with commercial logic synthesis tools.
Two design examples demonstrate the performance improvements possible with the proposed library and design approach. First, a third-order wave digital filter implementation achieves a typical sample rate of 300 MHz, an improvement by a factor of more than two compared to previous work on the same filter in the same process.
Second, a probability density function (pdf) co-processor for speech recognition is capable of performing 160 million subtract-square-multiply-accumulate operations per second, which is comparable to the performance of a Cray supercomputer on the same problem.
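The subtract-square-multiply-accumulate pattern is what appears when evaluating a Gaussian density with a diagonal covariance, a common choice in HMM-based recognizers: for each feature component, subtract the mean, square the difference, multiply by the precomputed inverse variance, and accumulate. The sketch below assumes such diagonal-covariance Gaussians; the co-processor's actual word lengths and data formats are not given in the abstract, and the names are hypothetical.

```python
import math
from typing import Sequence


def weighted_sq_distance(x: Sequence[float], mean: Sequence[float],
                         inv_var: Sequence[float]) -> float:
    """Sum of (x_i - mean_i)^2 * inv_var_i, one subtract-square-multiply-
    accumulate step per feature component."""
    if not (len(x) == len(mean) == len(inv_var)):
        raise ValueError("length mismatch")
    acc = 0.0
    for xi, mi, wi in zip(x, mean, inv_var):
        d = xi - mi        # subtract
        acc += d * d * wi  # square, multiply, accumulate
    return acc


def log_gaussian_diag(x: Sequence[float], mean: Sequence[float],
                      var: Sequence[float]) -> float:
    """log N(x; mean, diag(var)); inverse variances and the constant term
    would normally be precomputed once per Gaussian."""
    inv_var = [1.0 / v for v in var]
    const = -0.5 * sum(math.log(2.0 * math.pi * v) for v in var)
    return const - 0.5 * weighted_sq_distance(x, mean, inv_var)
```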
Design Automation of High Speed Digital Signal Processing in VLSI with Applications in Speech Recognition Systems Based on Hidden Markov Models. Ph.D. Thesis, Fys.El.-rapport 1996:36, Johnny Pihl
On VLSI Realization of a Low-Power Audio Coder with Low System Delay
This thesis is a contribution to the low-power, low-voltage realization of digital signal processing algorithms in VLSI. The discussion covers algorithms, architectures, and circuit-level designs for a proposed audio encoder in a wireless digital microphone.
At the algorithmic level, a cosine-modulated filter bank has been compared to a parallel FIR filter bank and shown to require only 1/5 of the arithmetic operations. Using signal flow graph transformations, we have identified a suitable processing element, the X-PE, and shown that it can be realized efficiently using distributed arithmetic.
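Distributed arithmetic replaces the multiplications in an inner product y = Σₖ cₖxₖ with table look-ups: bit b of every input xₖ forms an address into a precomputed table of coefficient sums, and the looked-up values are shifted and accumulated, with the sign bit of two's-complement inputs carrying negative weight. The sketch below shows this textbook formulation for integer coefficients; the internal structure of the X-PE is not described in the abstract, so this is not a model of it, and the names are hypothetical.

```python
from typing import List, Sequence


def build_lut(coeffs: Sequence[int]) -> List[int]:
    """LUT[m] = sum of the coefficients whose index bit is set in address m."""
    return [sum(c for k, c in enumerate(coeffs) if (m >> k) & 1)
            for m in range(1 << len(coeffs))]


def da_inner_product(coeffs: Sequence[int], xs: Sequence[int], width: int) -> int:
    """sum(c_k * x_k) for width-bit two's-complement x_k via distributed arithmetic."""
    if len(coeffs) != len(xs):
        raise ValueError("length mismatch")
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    if any(x < lo or x > hi for x in xs):
        raise ValueError("input does not fit in the given width")
    lut = build_lut(coeffs)
    acc = 0
    for b in range(width):  # one bit-slice per clock cycle, LSB first
        address = 0
        for k, x in enumerate(xs):
            address |= ((x >> b) & 1) << k  # bit b of every input forms the address
        term = lut[address] * (1 << b)
        # The MSB of a two's-complement number has negative weight
        acc = acc - term if b == width - 1 else acc + term
    return acc
```

The table has 2^K entries for K coefficients, which is why DA suits processing elements with a small number of fixed coefficients, and processing one bit-slice per cycle maps naturally onto bit-serial hardware.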
At the circuit level, 5 different full adder candidates have been compared through full-custom cell design, fabrication, and measurement of 5 separate test chips. The best full adder, the SRPL2, is 50 % faster but needs only 45 % of the power of a standard cell design at 2.4 V supply voltage. A double-edge-triggered D-type flip-flop, the SRPL-DETFF, has been proposed based on the SRPL technique. Simulations and test chip measurements have shown that, below 2 V, the SRPL-DETFF is twice as fast as a standard CMOS flip-flop while using between 47 % and 80 % of its energy per operation. Based on these findings, a bit-serial adder has been proposed that requires only 25-50 % of the energy per bit-operation while exhibiting higher performance than a standard cell solution.
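Logically, a bit-serial adder is a single full adder whose carry output is fed back through a flip-flop, with both operands presented least-significant bit first. The sketch below models only this logical behaviour (not the SRPL circuit or its energy); results wrap modulo 2^W as in W-bit two's complement, and the helper names are hypothetical.

```python
from typing import List, Sequence


def to_lsb_first(value: int, width: int) -> List[int]:
    """Two's-complement bits of value, least significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


def from_lsb_first(bits: Sequence[int]) -> int:
    """Interpret LSB-first bits as a two's-complement integer."""
    value = sum(bit << i for i, bit in enumerate(bits))
    if bits and bits[-1]:
        value -= 1 << len(bits)
    return value


def bit_serial_add(a_bits: Sequence[int], b_bits: Sequence[int]) -> List[int]:
    """Add two LSB-first bit streams with one full adder and a carry register."""
    carry = 0  # carry flip-flop, cleared at the start of each word
    out = []
    for a, b in zip(a_bits, b_bits):  # one bit per clock cycle
        out.append(a ^ b ^ carry)                    # sum output
        carry = (a & b) | (a & carry) | (b & carry)  # stored for the next cycle
    return out
```

Because only one full adder and one flip-flop are active per bit, the hardware cost is independent of word length, at the price of W clock cycles per addition.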
The simple delay model proposed by Hu has been used and shows very close agreement with detailed simulations and test chip measurements for supply voltages from 5 V down to 1.1 V. Delay variations due to temperature and process variations have been successfully included. Sensitivity analysis and simulations suggest very conservative design margins with respect to timing at low supply voltages.
At the architectural level, we have proposed a bit-serial system architecture based on three distributed arithmetic X-PEs and two bit-serial processing elements. We have shown that the proposed solution will safely operate from a single 1.2 V rechargeable battery cell, scaling the power consumption by 1/17. Compared to a 5 V realization of the competing FIR filter bank, a power reduction by a factor of 5 × 17 = 85 can be obtained. The estimated power consumption of the filter bank is only 0.46 mW, clearly demonstrating that the proposed audio encoder can be realized with no significant power disadvantage.
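The factor of 17 is consistent with the usual quadratic dependence of dynamic CMOS power on supply voltage, P ∝ C·V²·f: at the same switched capacitance and clock frequency, going from 5 V to 1.2 V scales power by (1.2/5)² ≈ 1/17.4. The quick check below assumes that this is how the figure was obtained and that the factor of 5 comes from the 1/5 operation count of the cosine-modulated filter bank; both are interpretations rather than statements from the thesis.

```python
def dynamic_power_ratio(v_new: float, v_old: float) -> float:
    """Ratio P_new / P_old for P proportional to C * V**2 * f, with C and f unchanged."""
    return (v_new / v_old) ** 2


voltage_gain = 1.0 / dynamic_power_ratio(1.2, 5.0)  # about 17.4
complexity_gain = 5.0                                # 1/5 of the arithmetic operations
total_gain = voltage_gain * complexity_gain          # about 87, quoted as 5 x 17 = 85
```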
On VLSI Realization of a Low-Power Audio Coder with Low System Delay. Ph.D. Thesis, Fys.El.-rapport 1996:08, Tormod Njølstad
High Speed Cell Library in CMOS for Bit-Serial Implementation of DSP Algorithms
We have been working with a versatile new cell library developed for a maximum clock frequency of 640 MHz at 5 V in 0.8 µm CMOS. The cell library targets a hardwired bit-serial design style featuring an automatic, standard-cell-based layout approach for implementing DSP functional modules and circuits.
The high speed is obtained by employing an enhanced version of the True Single Phase Clock circuit technique. The selection of bit-serial operators is motivated by an optional link to existing commercial synthesis tools for mapping a behavioural algorithmic description into a bit-serial architecture at the register transfer level.
Four dedicated circuits have been fabricated in a 0.8 µm CMOS process to demonstrate functionality, performance, and applicability. Test results confirm correct operation well above the target frequency of 640 MHz at 5 V.
A High Speed Cell Library in CMOS for Bit-Serial Implementation of DSP Algorithms. Ph.D. Thesis, Fys.El.-rapport 1996:05, Jan Egil Øye
Useful links
Arithmetic Module Generator
Research efforts on structures for high-performance multipliers and adders with short design time have resulted in an Arithmetic Module Generator. The Module Generator was originally written by Master's student Espen Sand, supervised by Johnny Pihl (see the thesis abstract above). The Generator can produce structural VHDL and Verilog code for fast multipliers, adders, and subtractors of arbitrary word length, and offers a large number of structural options, especially for multipliers.
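To illustrate what generating structural code for an arbitrary word length involves, the sketch below emits a structural Verilog ripple-carry adder of any width from a small Python function. It is a hypothetical, deliberately minimal example, not the Arithmetic Module Generator's actual output or interface, which also covers fast multipliers, subtractors, VHDL, and many structural options.

```python
FULL_ADDER_VERILOG = """\
module full_adder (input wire a, input wire b, input wire ci,
                   output wire s, output wire co);
  assign s  = a ^ b ^ ci;
  assign co = (a & b) | (a & ci) | (b & ci);
endmodule
"""


def ripple_carry_adder_verilog(width: int, name: str = "rca") -> str:
    """Return structural Verilog for a width-bit ripple-carry adder."""
    if width < 1:
        raise ValueError("width must be at least 1")
    lines = [
        f"module {name}{width} (",
        f"  input  wire [{width - 1}:0] a,",
        f"  input  wire [{width - 1}:0] b,",
        "  input  wire cin,",
        f"  output wire [{width - 1}:0] s,",
        "  output wire cout",
        ");",
        f"  wire [{width}:0] c;  // internal carry chain",
        "  assign c[0] = cin;",
    ]
    for i in range(width):  # one full-adder instance per bit position
        lines.append(
            f"  full_adder fa{i} (.a(a[{i}]), .b(b[{i}]), .ci(c[{i}]), "
            f".s(s[{i}]), .co(c[{i + 1}]));"
        )
    lines += [f"  assign cout = c[{width}];", "endmodule", ""]
    return FULL_ADDER_VERILOG + "\n" + "\n".join(lines)
```

Calling `ripple_carry_adder_verilog(16)` yields a 16-bit adder; a fuller generator would substitute faster carry structures or multiplier reduction trees at the point where the full-adder instances are emitted.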