CAS Research - Design for Testability
| Home | Research | Courses | Staff | Alumni |
When designing a chip, it is not only essential that the model and prototype are functionally correct; it must also be possible to test the circuit efficiently for faults during production. In this field, our group is currently focusing on Built-In Self-Test (BIST) and detection of path delay faults.
BEST PAPER AWARD AT EWDTS TO MANIKANDAN PALANICHAMY
Research Fellow Manikandan Palanichamy received the best regular paper award at the East-West Design & Test Symposium (EWDTS) in September 2011. The prize was given for the paper "A Programmable BIST with Macro and Micro codes for Embedded SRAMs", co-authored with Bjørn B. Larsen, Einar J. Aas, and Areef Mohammad. Congratulations to all.
THE EUROPEAN TEST SYMPOSIUM (ETS) 2011 IN TRONDHEIM
ETS, the largest event in Europe that is entirely devoted to presenting and discussing trends, emerging results, hot topics, and practical applications in the area of electronic-based circuit and system testing, was held in Trondheim, Norway, on May 23-27, 2011.

Professor Einar Johan Aas acted as General Chair, while Erik Larsson, University of Linköping, was Program Chair. With the help of, among others, Associate Professor Bjørn B. Larsen and a number of our PhD-students, they organized this major event.
The technical program consisted of two plenary keynote addresses, “The Truths and Myths of Embedded Computing”, Shekhar Borkar, Intel Corporation, USA, and “From custom design to high volume design and manufacture – shift in major validation and test challenges”, Frank Berntsen, Nordic Semiconductor, Norway. Furthermore, the program included 42 technical paper presentations, 14 vendor presentations, four embedded tutorials, three poster sessions with 18 poster papers, three panels, a special session featuring a student contest, and student work-in-progress. The full program can be found at: http://www.iet.ntnu.no/workshop/ets2011/. All papers will be available through IEEE Xplore, and some papers will appear in JETTA.
Prior to ETS, Paolo Prinetto and Hans-Joachim Wunderlich organized the Test Spring School (TSS). TSS was held on May 20-23 with lectures from Shekhar Borkar, Peter Maxwell, Rob Aitken, Bernd Becker, Said Hamdioui, and Sybille Hellebrand.
After ETS, Peter Harrod arranged three workshops on May 27-28: Dependability Issues in Deep-submicron Technologies (DDT), the IEEE International Workshop on Processor Verification, Test and Debug (IWPVTD’11), and the 4th IEEE International Workshop on Impact of Low-Power design on Test and Reliability (LPonTR’11).
Altogether, ETS has expanded into a whole test week, with TSS and workshops at each end. The number of participants was approximately 200.
CAS Research - VLSI for Digital Signal Processing
Digital Signal Processing is perhaps the most important enabling technology behind the last few decades' communication and multi-media revolutions. VLSI hardware realization of architectures and applications in this field is vital to enable high performance, low chip area, and/or low power consumption. Through the years, the CAS-group has performed research to optimize all of these, in recent years having low power as the main objective.
The research has been performed in close cooperation with the signal processing group of our department, e.g., through Norwegian Research Council projects such as CUBAN (Co-optimized Ubiquitous Broadband Access Networks) and CROPS (CRoss-layer OPtimization in Short-range wireless sensor networks). We also cooperate regularly with the Electronics Systems division at the Department of Electrical Engineering, University of Linköping, Sweden.
Below you will find abstracts of a number of doctoral theses defended in this research field at the CAS group. They are followed by other useful links related to our research.
On Power Consumption Issues in FIR Filters with Application to Communication Receivers: Complexity, Word length, and Switching Activity
Power consumption in CMOS VLSI circuits has in recent years become a major design constraint. This is in particular important for wireless networks, due to the limited life time of the batteries that wireless nodes are operating on.
Orthogonal Frequency Division Multiplexing (OFDM) is one example of a technique which in recent years has become widely applied in wireless communication systems. However, the performance of OFDM and other spectrally efficient schemes depends, to a large extent, on advanced digital signal processing (DSP) and on the use of efficient and possibly adaptive resource allocation and transmission techniques. These in turn require that accurate estimates of the channel are available in the receiver and transmitter.
However, accurate channel estimation of a time and frequency dispersive wireless fading channel calls for complex estimators, which might lead to significant power dissipation in such devices. Therefore, characterizing and analyzing power consumed by such devices under different channel conditions, and optimizing for power is important to reduce the overall power consumption of the system. In this thesis a certain chosen class of estimators, i.e., a linear FIR estimator, is considered, which is based on finite impulse response (FIR) filters. The work in this thesis considers the power related challenges in such estimators.
The power consumed by such estimators depends, in part, on the complexity of the estimator, i.e., the length of the FIR filter. The filter length is one of the factors affecting the estimation accuracy. An analysis of the relation between the performance of such estimators and the required complexity for these devices under different channel conditions, i.e., in the presence of noise, is performed in this thesis. In this study we show that a small increase in this noise can lead to a considerable increase in the required estimator complexity if a given Normalized Mean Square Error (NMSE) performance for the channel estimation must be upheld, in particular at medium-to-high Channel Signal to Noise Ratios (CSNR).
Furthermore, reducing the power consumption through word-length optimization, when realizing such estimators, is an attractive approach. Due to the characteristics of the input signal to such estimators, a special treatment of channel estimation error due to quantization of estimator filter coefficients is needed. In this thesis we investigate the impact of finite coefficient word length on channel estimator performance. A theoretical analysis of the increase in channel estimation error due to quantization of estimator coefficients is performed, and the behavior of this error in different fading environments and for different filter orders is studied.
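As a rough illustration of the finite word length effect discussed above, the following Python sketch rounds FIR coefficients to a W-bit signed fixed-point grid and reports the worst-case coefficient error. The function names, the assumed number format (one sign bit and W - 1 fractional bits), and the example coefficients are hypothetical and are not taken from the thesis; the sketch only shows coefficient rounding, not the resulting channel estimation error analysed there.

```python
def quantize_coefficients(coeffs, word_length):
    """Round coefficients in [-1, 1) to a signed fixed-point grid.

    Assumed format (hypothetical): 1 sign bit and word_length - 1
    fractional bits, with saturation at the ends of the range.
    """
    scale = 2 ** (word_length - 1)
    quantized = []
    for c in coeffs:
        level = round(c * scale)
        level = max(-scale, min(scale - 1, level))  # saturate
        quantized.append(level / scale)
    return quantized


def max_quantization_error(coeffs, word_length):
    """Largest absolute difference between original and quantized taps."""
    quantized = quantize_coefficients(coeffs, word_length)
    return max(abs(q - c) for q, c in zip(quantized, coeffs))


# Hypothetical symmetric low-pass taps, chosen only for illustration.
example = [0.02, -0.11, 0.31, 0.52, 0.31, -0.11, 0.02]
for w in (4, 8, 12):
    print(w, max_quantization_error(example, w))
```

Without saturation, rounding bounds each coefficient error by 2^-W, so every bit removed from the word length at most doubles the worst-case coefficient error.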
The power consumed in a channel estimator is also influenced by the switching activity in the input signal of the estimator. Characterizing the switching activity in the input signal, including how this activity changes in different environments, e.g., in the presence of noise, is a subject of the work performed in this thesis. In this study we give an expression for direct calculation of the correlation coefficient for the most significant bit in a signal, using the word-level correlation coefficient. We also derive expressions for accurately calculating the variance (σ²) and word-level correlation coefficient (ρ) for a correlated signal, when an additional noise of a given variance is added to the signal. This can be used to estimate the bit-level switching activity in a signal in the presence of noise, based on the Dual Bit Type (DBT) method. The impact the additional noise has on the switching activity of a correlated signal has also been studied. These results make it possible for a designer to model the actual input switching activity in different real life noisy environments, enabling realistic power consumption estimation.
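The effect of additive noise on the word-level statistics can be sketched with textbook relations. The Python example below assumes a zero-mean stationary signal, additive white noise that is uncorrelated with the signal, and (for the sign bit) a jointly Gaussian signal. These are standard results under those assumptions and are not necessarily the expressions derived in the thesis; the function names are hypothetical.

```python
import math


def add_white_noise(signal_var, signal_rho, noise_var):
    """Variance and lag-1 correlation coefficient of x + n.

    Assumes n is zero-mean white noise, uncorrelated with x, so the
    noise adds to the variance but not to the lag-1 covariance.
    """
    total_var = signal_var + noise_var
    total_rho = signal_rho * signal_var / total_var
    return total_var, total_rho


def sign_bit_transition_probability(rho):
    """Probability that the sign bit toggles between consecutive samples.

    Assumes a zero-mean, jointly Gaussian signal (arcsine law).
    """
    return math.acos(rho) / math.pi


var, rho = add_white_noise(signal_var=1.0, signal_rho=0.9, noise_var=0.25)
print(var, rho)
print(sign_bit_transition_probability(0.9))  # clean signal
print(sign_bit_transition_probability(rho))  # noisy signal
```

Under these assumptions, noise lowers the word-level correlation and therefore raises the toggle rate of the most significant bits, which is the region a DBT-style model treats separately from the roughly random least significant bits.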
A study on switching activity reduction in estimator filters using a coefficient reordering method is another part of this thesis. Closed form analytical models for the coefficient and input data switching activity before and after reordering in an estimator filter are developed, and the impact that coefficient reordering has on the input data, and consequently on the total switching activity, is studied. Using our derived models we show that the impact of coefficient reordering on the data input first increases as the input signal correlation, ρz, increases, but decreases again when ρz → 1. This impact is 0 for ρz ≈ 0 and ρz ≈ 1. Our results show that this impact is highest for ρz = 0.7 to ρz = 0.999, and becomes larger for large values of the estimator order N.
Considering a realistic case, we further study the possibility of reducing the switching activity in a MAC-based channel estimator when realized with different orders and word lengths, and operating in different environments. This study shows that if a designer makes the right choices when reordering, a larger reduction in switching activity can be achieved. The decision will also depend on the channel condition in which the system is operating most of the time. The results of this study show that when the word length is reduced, the use of reordering can in some cases, e.g., when the estimator order is increased to N = 50 and beyond, actually lead to an increase in total switching activity if extra care is not taken. It also shows that for large N and input data with medium to high correlation, it is not possible to reduce the switching activity using reordering if the word length is reduced to W = 8 or lower. When the word length is reduced, the optimization in general becomes even more sensitive to the characteristics of the input data. The designer consequently needs to have this information available to achieve a reduction, or even to avoid an increase, in switching activity for small values of W.
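The basic idea of coefficient reordering in a MAC-based filter can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the method or the analytical models of the thesis: switching activity is approximated by the Hamming distance between consecutive words on a bus, the best order is found by exhaustive search (feasible only for very small filters), and the coefficient and data values are hypothetical 8-bit patterns.

```python
from itertools import permutations


def hamming(a, b):
    """Number of differing bits between two non-negative integers."""
    return bin(a ^ b).count("1")


def toggles(sequence):
    """Total bit toggles on a bus that carries the sequence in order."""
    return sum(hamming(x, y) for x, y in zip(sequence, sequence[1:]))


def best_order(coeffs):
    """Exhaustively find the coefficient order with the fewest toggles."""
    return min(permutations(range(len(coeffs))),
               key=lambda order: toggles([coeffs[i] for i in order]))


def mac_fir(coeffs, window, order):
    """One FIR output sample, accumulating products in the given order."""
    return sum(coeffs[i] * window[i] for i in order)


# Hypothetical 8-bit coefficient codes and data window.
coeffs = [0b00010011, 0b11101100, 0b00010111, 0b11101000, 0b00011111]
window = [3, 1, 4, 1, 5]
order = best_order(coeffs)

# Coefficient bus activity before and after reordering.
print(toggles(coeffs), toggles([coeffs[i] for i in order]))
# The data samples must be fetched in the same order, so their
# bus activity changes as well (and may go up).
print(toggles(window), toggles([window[i] for i in order]))
```

The filter output is unchanged because addition is commutative, but the data bus now sees the samples in a permuted order. That side effect on the input data activity is what makes the net gain depend on the input correlation and word length.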
It should be mentioned that although we look at these power related challenges in the context of estimators, the results for several parts of this work are not limited to channel estimators. The results concerning the switching activity reduction in MAC-based channel estimators can be generally applied to FIR filters, and the study on the input signal switching activity is valid for signals input to any digital signal processing (DSP) module.
On Power Consumption Issues in FIR Filters with Application to Communication Receivers: Complexity, Word length, and Switching Activity.
Doctoral thesis at NTNU, 2009:201, Asghar Havashki
Design of Low-Power Reduction-Trees in Parallel Multipliers
Multiplications occur frequently in digital signal processing systems, communication systems, and other application specific integrated circuits. Multipliers, being relatively complex units, are deciding factors for the overall speed, area, and power consumption of digital computers. The diversity of application areas for multipliers and the ubiquity of multiplication in digital systems lead to a variety of requirements for speed, area, power consumption, and other specifications. Traditionally, speed, area, and hardware resources have been the major design factors and concerns in digital design. However, the design paradigm shift over the past decade has brought dynamic power and static power into play as well.
In many situations, the overall performance of a system is decided by the speed of its multiplier. In this thesis, parallel multipliers are addressed because of their speed superiority. Parallel multipliers are combinational circuits and can be subject to any standard combinational logic optimization. However, the complex structure of the multipliers imposes a number of difficulties for the electronic design automation (EDA) tools, as they simply cannot consider the multipliers as a whole; i.e., EDA tools have to limit the optimizations to a small portion of the circuit and perform logic optimizations. On the other hand, multipliers are arithmetic circuits, and considering arithmetic relations in the structure of multipliers can be extremely useful and can result in better optimization results. The different structures obtained using the different arithmetically equivalent solutions have the same functionality but exhibit different temporal and physical behavior. The arithmetic equivalencies have previously been used mainly to optimize for area, speed and hardware resources.
In this thesis a design methodology is proposed for reducing dynamic and static power dissipation in parallel multiplier partial product reduction trees. Basically, the reduction tree is optimized using information about the input pattern that is going to be applied to the multiplier (such as static probabilities and spatiotemporal correlations). The optimization is obtained by selecting the power efficient configurations by searching among the permutations of partial products for each reduction stage. Probabilistic power estimation methods are introduced for leakage and dynamic power estimations. These estimations are used to lead the optimizers to minimum power consumption. Optimization methods, utilizing the arithmetic equivalencies in the partial product reduction trees, are proposed in order to reduce the dynamic power, static power, or total power, which is a combination of dynamic and static power. The energy saving is achieved without any noticeable area or speed overhead compared to random reduction trees. The optimization algorithms are extended to include spatiotemporal correlations between primary inputs. As another extension to the optimization algorithms, the cost function is considered as a weighted sum of dynamic power and static power. This can be extended further to contain speed merits and interconnection power. Through a number of experiments the effectiveness of the optimization methods is shown. The average number of transitions obtained from simulation is reduced significantly (up to 35% in some cases) using the proposed optimizations.
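The principle of choosing among arithmetically equivalent configurations can be illustrated with a deliberately simplified Python sketch. It assumes independent inputs, uses only static signal probabilities, estimates toggle probability as 2p(1 - p), and considers a single column of six partial-product bits feeding two full adders. Glitches, spatiotemporal correlations, and leakage, all of which the thesis takes into account, are ignored here; the function names and probability values are hypothetical.

```python
from itertools import combinations


def full_adder_probabilities(pa, pb, pc):
    """Signal probabilities of sum and carry for independent inputs."""
    p_ab = pa * (1 - pb) + pb * (1 - pa)        # P(a xor b = 1)
    p_sum = p_ab * (1 - pc) + pc * (1 - p_ab)   # P(a xor b xor c = 1)
    p_carry = pa * pb + pa * pc + pb * pc - 2 * pa * pb * pc  # majority
    return p_sum, p_carry


def activity(p):
    """Toggle probability of a node, assuming temporal independence."""
    return 2 * p * (1 - p)


def grouping_cost(probs, group):
    """Estimated output activity of two full adders.

    `group` holds the indices feeding the first adder; the remaining
    three indices feed the second adder.
    """
    rest = [i for i in range(len(probs)) if i not in group]
    cost = 0.0
    for idx in (group, rest):
        p_sum, p_carry = full_adder_probabilities(*(probs[i] for i in idx))
        cost += activity(p_sum) + activity(p_carry)
    return cost


def best_grouping(probs):
    """Try every split of six bits into two triples."""
    return min(combinations(range(6), 3),
               key=lambda g: grouping_cost(probs, g))


column = [0.05, 0.5, 0.1, 0.5, 0.9, 0.5]  # hypothetical static probabilities
naive = (0, 1, 2)
best = best_grouping(column)
print(best, grouping_cost(column, best), grouping_cost(column, naive))
```

Every grouping computes the same arithmetic sum of the column, so the choice is free from a functional point of view and can be made on estimated power alone.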
The proposed methods are in general applicable on arbitrary multi-operand adder trees. As an example, the optimization is applied to the summation tree of a class of elementary function generators which is implemented using summation of weighted bit-products. Accurate transistor-level power estimations show up to 25% reduction in dynamic power compared to the original designs.
Power estimation is an important step of the optimization algorithm. A probabilistic gate-level power estimator is developed which uses a novel set of simple waveforms as its kernel. The transition density of each circuit node is estimated. This power estimator makes it possible to use a global glitch filtering technique that can model the removal of glitches in more detail. It produces error free estimates for tree structured circuits. For circuits with reconvergent fanout, experimental results using the ISCAS’85 benchmarks show that this method generally provides significantly better estimates of the transition density compared to previous techniques.
Design of Low-Power Reduction-Trees in Parallel Multipliers.
Doctoral thesis at NTNU, 2008:61, Saeeid Tahmasbi Oskuii
Implementation of synthesis filter bank for subband coding of images
A uniform filter bank structure is developed which retains the high coding gain of subband coders while having a complexity close to that of the discrete cosine transform (DCT). Reduced complexity is obtained by replacing the six upper channels by an 8 point DCT. By using longer filters in the two lower channels, blocking effects, which are disturbing artifacts in transform coders, are eliminated.
The filter bank is required to handle HDTV sample rates at a minimum area cost and acceptable power consumption. Different algorithms and architectures for the filter bank structure are evaluated in order to satisfy these requirements. Through simulations and analysis the optimal word lengths of coefficients and internal signals are found. The memory requirements between vertical and horizontal filtering are minimized so that a two-dimensional filter bank can be implemented on one chip. The DCT part has been processed in 0.8 um CMOS technology and functional chips have been received. The one-dimensional filter bank will be processed shortly.
Analysis and VLSI design of synthesis filter bank for image subband coding. Ph.D. Thesis, Fys.El.-rapport 1997:33, Ingil Sundsbø
VLSI solutions for speech recognition
This work concentrates on high speed digital signal processing in CMOS, both in general and in applications to continuous, speaker-independent, large vocabulary speech recognition.
Specifically, design automation with the TSPC and CDPD circuit techniques has been studied and methods for this have been developed. A general standard cell library in 0.8um CMOS suitable for synthesis of DSP algorithms has been developed and tested on fabricated test designs with satisfactory results. The library contains TSPC flip-flops capable of a maximum clock frequency of 700 MHz, as well as a 1 ns matching full adder primitive, and is fully compatible with commercial logic synthesis tools.
Two design examples demonstrate the performance improvements possible with the proposed library and design approach. The first is a third order wave digital filter implementation with a typical sample rate of 300 MHz, an improvement by a factor of more than two compared to previous work on the same filter in the same process.
The second is a pdf (probability density function) co-processor for speech recognition capable of performing 160 million subtract-square-multiply-accumulate operations per second, which is comparable to the performance of a Cray supercomputer on the same problem.
Design Automation of High Speed Digital Signal Processing in VLSI with Applications in Speech Recognition Systems Based on Hidden Markov Models. Ph.D. Thesis, Fys.El.-rapport 1996:36, Johnny Pihl
On VLSI Realization of a Low-Power Audio Coder with Low System Delay
This thesis is a contribution to low-power, low-voltage realization of digital signal processing algorithms in VLSI. The discussions are on algorithms, architectures and circuit level designs of a proposed audio encoder in a wireless digital microphone.
At the algorithmic level, a cosine-modulated filter bank has been compared to a parallel FIR filter bank, showing a complexity of only 1/5 in terms of arithmetic operations. By signal flow graph transformations, we have identified a suitable processing element, the X-PE. This X-PE has been shown to be efficiently realized by distributed arithmetic.
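For readers unfamiliar with distributed arithmetic, the following Python sketch shows the general principle: an inner product with fixed coefficients is computed one input bit plane at a time, using a precomputed table of coefficient sums instead of multipliers. It does not reproduce the X-PE of the thesis; the function names, the integer coefficients, and the LSB-first processing order are assumptions made for illustration.

```python
def build_lut(coeffs):
    """Precompute the sum of every subset of the coefficients.

    Entry `addr` holds the sum of coeffs[k] for each bit k set in addr.
    """
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << len(coeffs))]


def da_inner_product(coeffs, samples, bits):
    """Inner product of integer coeffs and signed `bits`-bit samples.

    Processes one bit plane per step (LSB first), as a bit-serial
    datapath would; the sign bit plane carries negative weight.
    """
    lut = build_lut(coeffs)
    mask = (1 << bits) - 1
    words = [s & mask for s in samples]      # two's-complement encoding
    acc = 0
    for b in range(bits):
        addr = 0
        for k, w in enumerate(words):
            addr |= ((w >> b) & 1) << k      # bit b of sample k
        term = lut[addr] * (1 << b)          # weight 2**b
        acc += -term if b == bits - 1 else term
    return acc


coeffs = [3, -5, 7, 2]       # hypothetical fixed coefficients
samples = [-8, 7, -1, 4]     # signed 4-bit input samples
print(da_inner_product(coeffs, samples, bits=4))
print(sum(c * s for c, s in zip(coeffs, samples)))
```

The table grows as 2^K for K coefficients, so distributed arithmetic suits small processing elements with few inputs, where it replaces multipliers with a memory lookup and a shift-accumulate step per input bit.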
At the circuit level, 5 different full adder candidates have been compared, by full custom cell design, fabrication and measurement of 5 separate test chips. The best full adder, the SRPL2, is 50 % faster, but needs only 45 % of the power compared to a standard cell design at 2.4 V supply voltage. A double-edge triggered D-type flip-flop, the SRPL-DETFF, has been proposed, based on the SRPL technique. Simulation and test chip measurements have shown that the SRPL-DETFF is twice as fast, while the energy per operation is between 47 % and 80 %, as compared to a standard CMOS flip-flop below 2 V. A bit-serial adder has been proposed based on these findings, requiring only 25-50 % of the energy per bit-operation, while exhibiting higher performance than a standard cell solution.
The simple delay model proposed by Hu has been exploited, exhibiting very close correspondence to detailed simulations and test chip measurements for supply voltages from 5 V to 1.1 V. The delay variations due to temperature and process variations have been successfully included. Sensitivity analysis and simulations suggest very conservative design margins with respect to timing at low supply voltages.
At the architectural level, we have proposed a bit-serial system architecture based on three distributed arithmetic X-PEs and two bit-serial processing elements. We have made evident that the proposed solution will safely operate from a single 1.2 V rechargeable battery cell, yielding a power consumption scaling of 1/17. Compared to a 5V realization of the FIR contestant, a power reduction by 5 x 17 = 85 can be obtained. The estimate of the power consumption for the filter bank is only 0.46 mW, clearly demonstrating that the proposed audio encoder may be realized, with no significant power disadvantage.
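The scaling factor of 1/17 is consistent with the usual first-order model in which dynamic CMOS power is proportional to the square of the supply voltage. That model, with switched capacitance and clock frequency held fixed, is an assumption here and is not stated in the abstract. The factor of 5 is the arithmetic complexity ratio between the parallel FIR filter bank and the cosine-modulated filter bank quoted above.

```latex
\[
P_{\mathrm{dyn}} \propto C\,V_{DD}^{2}\,f
\quad\Longrightarrow\quad
\frac{P(5\,\mathrm{V})}{P(1.2\,\mathrm{V})}
  = \left(\frac{5}{1.2}\right)^{2} \approx 17.4,
\qquad
\underbrace{5}_{\text{algorithm}} \times \underbrace{17}_{\text{supply voltage}} = 85 .
\]
```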
On VLSI Realization of a Low-Power Audio Coder with Low System Delay. Ph.D. Thesis, Fys.El.-rapport 1996:08, Tormod Njølstad
High Speed Cell Library in CMOS for Bit-Serial Implementation of DSP Algorithms
We have been working with a versatile new cell library developed for a maximum clock frequency of 640MHz@5V/0.8um. The cell library is targeted towards a hardwired bit-serial design style featuring an automatic standard cell based layout approach for implementation of DSP functional modules and circuits.
The high speed is obtained by employing an enhanced version of the True Single Phase Clock circuit technique. The selection of bit-serial operators is motivated by an optional link to existing commercial synthesis tools for mapping a behavioural algorithmic description into a bit-serial architecture at the register transfer level.
Four dedicated circuits have been fabricated in a 0.8um CMOS process to demonstrate functionality, performance and applicability. Test results confirm correct operation well above the target frequency of 640 MHz@5V.
A High Speed Cell Library in CMOS for Bit-Serial Implementation of DSP Algorithms. Ph.D. Thesis, Fys.El.-rapport 1996:05, Jan Egil Øye
Useful links
Arithmetic Module Generator
Research efforts in structures for high performance multipliers and adders with short design time have resulted in an Arithmetic Module Generator. The Module Generator was originally coded by a Master's student, Espen Sand, for whom Johnny Pihl (see thesis abstract above) was the supervisor. The Generator is capable of generating structural VHDL and Verilog code for fast multipliers, adders and subtractors of arbitrary word length. It also features a large number of structural options, especially for multipliers.
CAS Research - Hardware/Software Codesign and System Level Design
Modern electronic systems grow in complexity, and are combinations of different types of components: microprocessors, signal processors, RAM/ROM, digital logic and analogue circuits. At the same time, designers meet demands for shorter development time and lower development cost. This motivates the use of higher abstraction levels, so-called system level design. System level design requires system level tools that simultaneously handle both hardware and software, for modeling, partitioning, verification and synthesis of the complete system. Hardware/Software Codesign covers all of these topics, and as such draws on the experience from many different technological communities.
Data Transfer and Storage Exploration
Many integrated circuit systems, particularly in the multi-media and telecom domains, are inherently data dominant. For this class of applications, data transfer and storage largely determine cost and performance parameters. This is the case for chip size, since large memories are usually needed, performance, since accessing the memories may very well be the main bottleneck, and power consumption, since the memories and buses consume large quantities of energy. Even for systems with caches, the overall storage requirement has vital impact on the performance and power consumption, since it greatly influences the number of slow and power expensive cache misses. For the system development process, the designer must hence concentrate first on exploring the data transfer and storage to achieve a cost optimized end product.
Multi-processor system-on-chip
The next generation of embedded systems will be dominated by industrial and consumer devices, which are able to deliver communications and rich, scalable multimedia content anytime, anywhere. These wireless communication and multimedia applications (e.g. 3D games, advanced medical observation and supervision systems) will lead to the creation of extremely complex and dynamic code with huge resource requirements. Such systems can typically not be handled by a single micro-processor. Instead, a System on Chip with multiple processing elements, an MPSoC, is required. The processing elements will be heterogeneous in nature, and can be anything from a standard processor to more specialized or even field programmable hardware units. The key characteristics of the applications will be the intensive computation, the large data transfer and storage requirement, and the need for efficient resource management. In such systems it is essential to optimize the utilization of computational resources both statically at design time, and dynamically at run-time.
We cooperate closely with imec in Leuven, Belgium, and Eindhoven, The Netherlands, on research in this domain. We are also a member of HiPEAC, the European Network of Excellence on High Performance and Embedded Architecture and Compilation. More information about the topic and the results of our work can be found at the following places:
- Home Page of Professor Per Gunnar Kjeldsberg
- Home Page of PhD student Elena Hammari
- imec Home Page
- HiPEAC Home Page
Below you will find abstracts of a number of doctoral theses defended in this research field at the CAS group.
Hierarchical Memory Size Estimation for Loop Transformation and Data Memory Platform Exploration
In today's embedded systems, the memory hierarchy is rapidly becoming a major bottleneck in terms of power, performance and area, due to the very large amount of (memory related) data that needs to be transferred and stored (temporarily). This is especially the case for portable multi-media application systems. These applications are characterized by deep loop nests and multi-dimensional arrays at the high level. Due to the dramatically increasing size and complexity of system-on-a-chip (SoC) designs and stringent time-to-market requirements, the methodology and tools for chip design must be raised to the system level. Early analysis tools are particularly critical in enabling SoC designers to take full advantage of the many architectural options available. For memory optimization, the early high level techniques aim either to design an optimal memory platform for a given application or to optimize the application code in order to take advantage of the memory platform features, or even both. Loop transformation is one such important high level optimization technique. It modifies the execution order of loops and statements without changing the application functionality. Existing loop transformation algorithms all steer the selection of loop transformations based either on reduction of data access lifetime or on improvement in data locality and regularity. These are, however, very abstract cost functions which do not represent the exact memory size requirement of the arrays and how the data will be mapped onto the memory platform later on. Existing algorithms all result in one final loop transformation solution. As different loop transformations may result in optimal utilization for different memory platform instances, ad-hoc decisions at this stage without estimating their impact on the actual hierarchy utilization can lead to a sub-optimal final solution. An evaluation of the effort required in later design stages is hence needed.
On the other hand, there usually exists a huge number of loop transformation possibilities. The estimation must therefore be performed repeatedly, and the computation time of the estimation technique becomes critical if it is to be useful during the loop transformation search space exploration.
This dissertation proposes a memory footprint estimation methodology. An intra-array memory footprint estimation is performed first, followed by an inter-array estimation. In order to achieve an estimate fast enough to be used repeatedly during the early high level search space exploration, several techniques have been introduced. A fast intra-array memory footprint estimation is performed at the iteration domain based on the maximal lifetime of data accesses, which is defined by the maximal dependency vector. Two approaches, an ILP formulation and a vertex-based approach, have been introduced for achieving a fast maximal dependency vector calculation. The fast inter-array estimation has been achieved based on several Hanoi tower based approaches.
A hierarchical memory size estimation methodology has also been proposed in this dissertation. It estimates the influence of any given sequence of loop transformation instances on the mapping of application data onto a hierarchical memory platform. As the exact memory platform instantiation is often not yet defined at this high level design stage, a platform independent estimation is introduced with a Pareto curve output for each loop transformation instance. It can steer the designer or an automatic steering tool to select all the interesting loop transformation instances that might later lead to low power data mapping for any of the many possible memory hierarchy instances. This is useful when the memory platform is not defined yet, or for a given memory hierarchy instance. It also makes it possible to find the most appropriate low power memory hierarchy instance by performing an early power estimation of different memory hierarchy instances. Initially the source code is used as input for estimation, resulting in an initial approach. However, performing the estimation repeatedly from the source code is too slow for the large loop transformation search space exploration. An incremental approach, based on local updating of the previous result, is thus introduced to handle sequences of different loop transformations. Several advanced techniques have also been used in these two approaches in order to perform a fast estimation, such as bounding box geometrical model based data reuse analysis, platform independent memory hierarchy layer assignment estimation, and fast intra- and inter-array memory footprint estimation.
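A Pareto curve keeps only those candidates that are not beaten on every cost axis by some other candidate. The Python sketch below filters a list of candidates down to such a curve. The two axes used here (a memory size and an energy estimate, both lower is better), the function name, and the numbers are hypothetical; the abstract does not state which quantities the thesis places on the axes.

```python
def pareto_front(points):
    """Return the non-dominated (size, energy) points; lower is better.

    After sorting by size, a point belongs to the front only if its
    energy is strictly lower than that of every smaller candidate.
    """
    front = []
    for p in sorted(set(points)):
        if not front or p[1] < front[-1][1]:
            front.append(p)
    return front


# Hypothetical estimates for six loop transformation instances.
candidates = [(64, 9.0), (128, 6.5), (128, 7.0),
              (256, 6.5), (512, 4.0), (96, 9.5)]
print(pareto_front(candidates))
```

Keeping the whole curve instead of a single best point defers the choice until the memory platform is known, which matches the platform independent intent described above.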
The feasibility and usefulness of the methodologies are substantiated using several representative real-life application demonstrators. They show, for instance, that the fast memory footprint estimation can be two orders of magnitude faster than the compared techniques while still achieving fairly accurate estimation results. For the hierarchical memory size estimation methodology, the initial approach is two orders of magnitude faster than the compared technique, and the incremental approach, which can take just a few milliseconds, is another two orders of magnitude faster than the initial approach. The fast computation time of the incremental approach makes it feasible to use it repeatedly during the loop transformation exploration over a very large number of possibilities. Furthermore, prototype CAD tools have been developed that include most parts of the methodologies.
Hierarchical Memory Size Estimation for Loop Transformation and Data Memory Platform Exploration. Doctoral thesis at NTNU, 2007:63, Qubo Hu
Storage Requirement Estimation and Optimization for Data Intensive Applications
Many integrated circuit systems, especially in the multi-media and telecom domains, are inherently data dominant. For this class of applications, data transfer and storage largely determine cost and performance parameters. This is the case for chip size, since large memories are usually needed. It is however also the situation for performance. Accessing the memories is in more and more cases the main bottleneck and the cache behavior is for a significant part determined by “capacity misses” which are related to size issues. Finally, it is the case for power consumption, since the memories and buses consume large quantities of energy and the “effective data size” influences their capacitive loading in case of RAMs and their miss-related activity in case of caches. In both cases, the power consumption per access is also heavily affected. During the system development process, the designer must hence concentrate first on exploring the data transfer and storage to produce a cost-optimized end product. At the system level, no detailed information is available regarding the size of the memories required for storing data in the alternative realizations of an application. To guide the designer and help in selecting the best solution, estimation techniques for the storage requirements are needed, very early in the system design trajectory. The simplest estimates use the maximal size of the intermediate array data as declared in the application code. This is however not representative for the effective size required for their storage during the actual execution since arrays and parts of arrays may not be alive simultaneously. An array element is alive from the moment it is written, or produced, and until it is read for the last time. This last read is said to consume the element. To achieve accurate estimates, the so-called in-place mapping opportunity generated by these non-overlapping lifetimes must be taken into account. 
For scalars, a relatively simple lifetime analysis suffices, but for arrays, this is extremely complex due to the huge number of signals and the often very complex interdependencies between them.
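To make the lifetime notion concrete, the following Python sketch counts how many array elements are alive at the same time for a small, made-up access pattern. The function names `peak_live_elements` and `make_trace`, the trace format, and the example loop are illustrative assumptions and are not taken from the thesis; the sketch also assumes that each element is written only once.

```python
def peak_live_elements(trace):
    """Return the largest number of simultaneously alive elements.

    trace: list of (op, element) pairs in execution order, where op is
    "w" (write/produce) or "r" (read). Assumes single assignment.
    """
    # First pass: find the time of the last read (the consuming read).
    last_read = {}
    for t, (op, elem) in enumerate(trace):
        if op == "r":
            last_read[elem] = t

    # Second pass: an element is alive from its write until its last read.
    live = set()
    peak = 0
    for t, (op, elem) in enumerate(trace):
        if op == "w" and last_read.get(elem, -1) > t:
            live.add(elem)
        peak = max(peak, len(live))
        if op == "r" and last_read.get(elem) == t:
            live.discard(elem)
    return peak


def make_trace(n):
    """Hypothetical loop: A[i] = f(i); B[i] = A[i] + A[i-1]."""
    trace = []
    for i in range(n):
        trace.append(("w", i))          # produce A[i]
        if i >= 1:
            trace.append(("r", i - 1))  # last read of A[i-1]
        trace.append(("r", i))          # read A[i]
    return trace


if __name__ == "__main__":
    n = 100
    print("declared size:", n)
    print("peak alive elements:", peak_live_elements(make_trace(n)))
```

For `n = 100` the array is declared with 100 elements, yet at most two are alive at any time in this example, which is the kind of gap between declared and effective size that in-place mapping exploits.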
For our target classes of data dominant applications, the high-level description is typically characterized by large multi-dimensional nested loops and arrays. Within the loop nests, statements access the arrays using read and write operations. At the beginning of the design process, no information about the execution order of these loops is available, except what is given by the data dependencies between the statements in the code. As the process progresses, the designer makes decisions that gradually fix the ordering, until the full execution ordering is known. This execution ordering determines the order in which array elements are accessed and hence the lifetimes of the array elements. Since these lifetimes in turn influence the in-place mapping opportunity, the storage requirements of the arrays within the loop nests are largely determined by the execution ordering. To guide the designer, it is therefore essential to have storage requirement estimates that can take the available partially fixed execution ordering into account during the exploration of the implementation solution space. Previous work has either not taken execution ordering into account at all, resulting in large overestimates, or required a fully specified ordering. In the latter case, a time-consuming full exploration of all possible alternative orderings of the unfixed loop dimensions is needed. This is infeasible during the early system design steps, where fast feedback is needed to explore the huge solution space.
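The influence of execution ordering on lifetimes can be illustrated with a toy two-dimensional loop nest. The Python sketch below is an assumption-laden illustration, not the estimation technique of the thesis: it fully enumerates one fixed ordering at a time, which is exactly the brute-force approach the methodology aims to avoid. The loop body, the names `peak_live` and `trace_for`, and the sizes are hypothetical.

```python
def peak_live(trace):
    """Largest number of simultaneously alive elements in a trace of
    (op, element) pairs, op being "w" or "r" (single assignment assumed)."""
    last_read = {}
    for t, (op, elem) in enumerate(trace):
        if op == "r":
            last_read[elem] = t
    live, peak = set(), 0
    for t, (op, elem) in enumerate(trace):
        if op == "w" and last_read.get(elem, -1) > t:
            live.add(elem)
        peak = max(peak, len(live))
        if op == "r" and last_read.get(elem) == t:
            live.discard(elem)
    return peak


def trace_for(order, n, m):
    """Access trace for the hypothetical loop body
        A[i][j] = ...;  B[i][j] = A[i][j] + A[i-1][j]
    with either i as the outer loop ("ij") or j as the outer loop ("ji").
    Both orderings respect the dependency from (i-1, j) to (i, j)."""
    if order == "ij":
        points = [(i, j) for i in range(n) for j in range(m)]
    elif order == "ji":
        points = [(i, j) for j in range(m) for i in range(n)]
    else:
        raise ValueError("order must be 'ij' or 'ji'")
    trace = []
    for i, j in points:
        trace.append(("w", (i, j)))
        if i >= 1:
            trace.append(("r", (i - 1, j)))
        trace.append(("r", (i, j)))
    return trace


if __name__ == "__main__":
    n, m = 4, 8
    print("declared size:", n * m)
    print("i outer:", peak_live(trace_for("ij", n, m)))
    print("j outer:", peak_live(trace_for("ji", n, m)))
```

With `n = 4` and `m = 8`, the declared array has 32 elements; with `i` as the outer loop a full row plus one element (9 elements) must be kept, while interchanging the loops reduces this to 2 elements.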
The storage requirement estimation methodology proposed in this doctoral thesis solves these important design problems. The methodology is divided into four steps. In the first step, a data-flow graph is generated that reflects the data dependencies in the application code. The array accesses and the dependencies between them are described using a polyhedral model. The second step places the polyhedral descriptions of the array accesses and their dependencies in a so-called common iteration space. A worst-case and a best-case placement may be performed, both taking the available execution ordering into account during the placement. The third step estimates the upper and lower bounds on the storage requirement of individual data dependencies in the code, taking into account the available execution ordering. As the execution ordering is gradually fixed, the upper and lower bounds on the data dependencies converge. This is a very useful and unique property of the methodology. Finally, in the fourth step, simultaneously alive data dependencies are detected. Their combined maximal size at any time during execution equals the total storage requirement of the application. An important part of the estimation technique utilizes loop ordering guidance to estimate upper and lower bounds on dependency sizes. These guiding principles and the proof of their validity are, together with the general estimation methodology, important contributions of this thesis. The guiding principles can be used for high-level synthesis independently of the storage requirement estimation methodology.
The feasibility and usefulness of the methodology are substantiated using several representative application demonstrators. It is for instance shown how the designer is guided into reducing the memory size of the major arrays in the MPEG-4 Motion Estimation Kernel from 262400 to 257 memory locations. Similar results are achieved for a Cavity Detection algorithm. When the methodology is applied to an Updating Singular Value Decomposition algorithm, it is also demonstrated how estimation feedback during global loop reorganization can approximately halve the application's storage requirement. Furthermore, a prototype CAD tool has been developed that includes major parts of the storage requirement estimation and optimization methodology. Using manually generated design examples, the tool demonstrated the feasibility of the techniques and in particular showed that run times are short, on the order of seconds even for substantial applications.
Storage Requirement Estimation and Optimization for Data Intensive Applications. Fys.el.-rapport:2001:24, Per Gunnar Kjeldsberg
Best Poster Award for Jawad Rasool
We congratulate PhD student Jawad Rasool on receiving the "Best Poster Award" at the VERDIKT Conference in Oslo.
Title: “Improving the Throughput Guarantees Offered in Wireless Networks.”
Authors: Jawad Rasool and Geir E. Øien
Conference: The VERDIKT Conference 2010
Time and place: November 1-2, 2010, in Oslo
About the Poster Presentation:
With real-time traffic transmitted over wireless networks, the need for more exact quality of service (QoS) measures is in the interests of both network operators and customers. A QoS measure well suited to quantify QoS guarantees exactly is a throughput guarantee, i.e., how many bits a user is guaranteed to transmit/receive within a time-window. This poster presentation focuses on two main issues. 1) How can we quantify the throughput guarantees for a certain scheduling algorithm without conducting experimental investigations? 2) How can we maximize the throughput guarantees offered in a wireless network?
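As a rough illustration of the definition above, the Python sketch below computes the smallest number of bits a user received in any window of a given length from a recorded per-slot trace. This is only an empirical lower bound over one trace, not the analytical quantification the poster addresses; the function name `min_window_throughput` and the sample numbers are hypothetical.

```python
def min_window_throughput(bits_per_slot, window):
    """Smallest total number of bits delivered in any run of `window`
    consecutive time slots of the recorded trace."""
    if window <= 0 or window > len(bits_per_slot):
        raise ValueError("window must be between 1 and the trace length")
    # Sliding-window sum: add the newest slot, drop the oldest one.
    current = sum(bits_per_slot[:window])
    smallest = current
    for k in range(window, len(bits_per_slot)):
        current += bits_per_slot[k] - bits_per_slot[k - window]
        smallest = min(smallest, current)
    return smallest


if __name__ == "__main__":
    # Made-up trace: bits delivered to one user in each time slot.
    trace = [4, 0, 3, 5, 0, 0, 6, 2]
    print(min_window_throughput(trace, window=3))
```

For the made-up trace, the window sums are 7, 8, 8, 5, 6 and 8, so the user received at least 5 bits in every window of three slots.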
About the VERDIKT Conference:
VERDIKT's vision is that Norwegian ICT research will put Norway at the forefront of the development and application of knowledge to enhance interaction, innovation and value creation in the ICT-based network community. The VERDIKT conference serves as a meeting place where researchers from academia and industry can present their projects and results, and meet new colleagues.
33rd Scandinavian Symposium on Physical Acoustics

Geilo Hotel, 7–10 February 2010
Purpose
The purpose of these meetings is primarily to stimulate contacts and exchange information between different Scandinavian teams working in this area. Although the symposium is Scandinavian, foreign participants are most welcome, and the meeting language will be English. As usual, we expect a rather informal tone, the main goal being to create contacts, not only during sessions, but also by social activities, indoors and outdoors (cross country and downhill skiing).
These yearly meetings are arranged by the Norwegian Physical Society, and this year's meeting was organized by Professor Ulf Kristiansen from NTNU. This year's symposium had 54 participants from six different countries, who gave a total of 28 presentations over three days. A full list of the participants can be downloaded at the bottom of this page.
Participants at the conference. (The image is spliced from two original photos which can be found here.)
Program / Proceedings
All talks held during the five sessions of the conference are listed below. Articles which have been submitted for the conference proceedings can be found by following the links in the program.
Monday 8 February, morning session
08:30 Małgorzata Godlewska1, Atle Rustadbakken2, Thrond Haugen2, Lech Doroszczyk1, Bronisław Długoszewski1; 1Stanisław Sakowicz Inland Fisheries Institute, 2Norwegian Institute for Water Research
What we can learn from hydroacoustics about ecosystem quality - an example of WEL catchments
08:50 Helge Balk, Torfinn Lindem, Jan Kubecka; University of Oslo
Detection, tracking and sizing of fish using data from DIDSON multibeam sonars
09:10 John Gavin Macaulay; Institute of Marine Research
Species identification of large deepwater aggregations – an example from two New Zealand seamounts
09:30 Coffee break
10:00 Halvor Hobæk, Anja Heggen; University of Bergen
On the correction of depth dependence for normal incidence echo sounders
10:20 Andreas Austeng; University of Oslo
Adaptive beamforming in active sonar imaging
10:40 Fabrice Prieur, Peter Näsholm, Sverre Holm, Andreas Austeng; University of Oslo
Non-linear propagation, and use of harmonics in sonar imaging
Monday 8 February, afternoon session
16:00 Marianne Solberg, Anders Groth Helland, Kjell Tonning; Tecwel AS
Ultrasound used in well diagnostics
16:20 Stian Stavland; Christian Michelsen Research AS
Acoustic Pipe Deposit Detection
16:40 Lanbo Liu1, 2, Hefeng Dong2, Guosong Zhang2, Zhengliang Cao3, 2, Peter Gerstoft4, Jens Hovem5; 1University of Connecticut, 2Norwegian University of Science and Technology, 3Hangzhou Applied Acoustics Research Institute, 4Scripps Institution of Oceanography, 5SINTEF
Characterization of seafloor sediments with a passive vertical array
17:00 Coffee break
17:30 Hefeng Dong1, Ross Chapman2; 1Norwegian University of Science and Technology, 2University of Victoria
Inversion of seismic velocities from seabed reflection data
17:50 Ganpan Ke; Norwegian University of Science and Technology
Surface wave modelling in layered transversely isotropic medium
18:10 Sverre Holm; University of Oslo
A unifying fractional wave equation for compressional and shear waves
Tuesday 9 February, morning session
08:30 Erlend Magnus Viggen; Norwegian University of Science and Technology
The Lattice Boltzmann Method in Acoustics
08:50 Gunnar Taraldsen; SINTEF
The ABC of the Delany-Bazley model
09:10 Zhengliang Cao1, 2, Hefeng Dong2, Lanbo Liu2, 3, Guosong Zhang2; 1Hangzhou Applied Acoustics Research Institute, 2Norwegian University of Science and Technology, 3University of Connecticut
Study of scattering from two spheres: modelling and experiment
09:30 Coffee break
10:00 Kestutis Staliunas; Polytechnic University of Catalonia
Sound beam shaping by sonic crystals
10:20 Victor J. Sanchez-Morcillo, Pedro Alonso, Juan A. Martínez-Mora, Isabel Pérez-Arjona, Victor Espinosa; Polytechnic University of Valencia
Spatio-temporal dynamics of sound beams in acoustic resonators
10:40 Sven Ivansson; Swedish Defence Research Agency
Characterizing phononic crystal slabs with nearly optimal properties
Tuesday 9 February, afternoon session
16:00 Jan Kocbach1, Remi Kippersund1, Per Lunde2; 1Christian Michelsen Research, 2University of Bergen
Time domain finite element modeling of SH-mode propagation in elastic plate deposits
16:20 Magne Aanes1, Jan Kocbach2, Magne Vestrheim1; 1University of Bergen, 2Christian Michelsen Research AS
Modal and direct harmonic solution methods in FE modeling of piezoceramic disks
16:40 Espen Storheim, Magne Aanes, Magne Vestrheim, Per Lunde; University of Bergen
Ultrasonic piezoceramic transducers for air - finite element analysis and measurements
17:00 Coffee break
17:30 Tonni Johansen; Norwegian University of Science and Technology
Characterization of ultrasound transducers
17:50 Tung Manh; Vestfold University College
Design of a high frequency ultrasound transducer using silicon micromachining
18:10 Torstein Olsmo Sæbø; Norwegian Defence Research Establishment
Pipeline inspection with synthetic aperture sonar
Wednesday 10 February, morning session
09:00 Ulf Kristiansen1, Trond Jenserud2, Marie Darrieus1,3; 1Norwegian University of Science and Technology, 2Norwegian Defence Research Establishment, 3ENSTA (ParisTech)
Sound propagation in regions of range dependent oceanography
09:20 Trond Jenserud, Paul van Walree; Norwegian Defence Research Establishment
Characterization of acoustic channels in Oslofjorden
09:40 Coffee break
10:10 Guosong Zhang, Hefeng Dong, Jens Hovem; Norwegian University of Science and Technology
Underwater acoustic communication experiments in the Trondheim fjord
10:30 Karl Thomas Hjelmervik; Norwegian Defence Research Establishment
Target depth estimation using ray backpropagation on mid-frequency active sonar data
Articles
All articles in the proceedings can be downloaded individually through the links above, or together in this .zip archive.
The oxide electronics lab - Publications 2009
Structure and properties of multiferroic oxygen hyperstoichiometric BiFe1-xMnxO3+δ,
S.M. Selbach, T. Tybell, M.-A. Einarsrud, and T. Grande, Chem. Mater. 21, 5176 (2009)
Polarization direction and stability in ferroelectric lead titanate thin films,
Ø. Dahl, J.K. Grepstad, and T. Tybell, J. Appl. Phys. 106, 084104 (2009)
High-temperature semiconducting cubic phase of BiFe0.7Mn0.3O3+δ,
S.M. Selbach, T. Tybell, M.-A. Einarsrud, and T. Grande, Phys. Rev. B 79, 214113 (2009)
Epilayer control of photodeposited materials during UV photocatalysis,
R. Takahashi, M. Katayama, Ø. Dahl, J. K. Grepstad, Y. Matsumoto, and T. Tybell, Appl. Phys. Lett. 94, 232901 (2009)
The fabrication and characterization of PbTiO3 nanomesas realized on nanostructured SrRuO3/SrTiO3 templates,
C.C. You, R. Takahashi, A. Borg, J.K. Grepstad and T. Tybell, Nanotechnology 20, 255705 (2009)
Sputter-deposited (Pb,La)(Zr,Ti)O3 thin films: Effect of substrate and optical properties,
Ø. Nordseth, T. Tybell, J. K. Grepstad, and A. Røyset, J. Vac. Sci. and Tech. 27, 548 (2009)
Epitaxial (Pb,La)(Zr,Ti)O3 thin films on buffered Si(100) by on-axis rf magnetron sputtering,
Ø. Nordseth, T. Tybell, and J.K. Grepstad, Thin Solid Films 517, 2623 (2009)
The oxide electronics lab - Publications 2008
Comparison of TEM specimen preparation of perovskite thin films by tripod polishing and conventional ion milling,
E. Eberg, Å.F. Monsen, T. Tybell, A.T.J. van Helvoort and R. Holmestad, Journal of Electron Microscopy 57, 6, 175-179 (2008)
The Ferroic Phase Transitions of BiFeO3,
S.M. Selbach, T. Tybell, M.-A. Einarsrud and T. Grande, Adv. Mat. 20, 3692 (2008)
Ferroelectric stripe domains in PbTiO3 thin films: Depolarization field and domain randomness,
R. Takahashi, Ø. Dahl, E. Eberg, J.K. Grepstad and T. Tybell, J. Appl. Phys. 104, 064109 (2008)
Crystalline and dielectric properties of sputter deposited PbTiO3 thin films,
Ø. Dahl, J.K. Grepstad and T. Tybell, J. Appl. Phys. 103, 114112 (2008)
PbTiO3 nanorod arrays grown by self-assembly of nanocrystals,
P.M. Rørvik, Å. Almli, A.T.J. van Helvoort, R. Holmestad, T. Tybell, T. Grande and M. Einarsrud, Nanotechnology 19, 225605 (2008)
Photochemical switching of ultrathin PbTiO3 films,
R. Takahashi, J.K. Grepstad, T. Tybell and Y. Matsumoto, Appl. Phys. Lett. 92, 112901 (2008)
The oxide electronics lab - Publications 2007
Size dependent properties of nanocrystalline BiFeO3 particles,
S.M. Selbach, T. Tybell, M.-A. Einarsrud, and T. Grande, Chem. Mater. 19, 6478 (2007)
Synthesis of BiFeO3 by wet chemical methods,
S.M. Selbach, M.-A. Einarsrud, T. Tybell and T. Grande, J. Am. Ceram. Soc. 90, 3430 (2007)
Nanoscale structuring of SrRuO3 thin film surfaces by scanning tunneling microscopy,
C.C. You, N-V. Rystad, A. Borg and T. Tybell, Applied Surface Science 253, 4704 (2007)
Formation and electronic properties of oxygen annealed Au/Ni and Pt/Ni contacts to p-type GaN,
S. V. Pettersen, A. P. Grande, T. Tybell, H. Riechert, R. Averbeck and J. K. Grepstad, Semicond. Sci. Technol., 22, 186-193 (2007)
The oxide electronics lab - Publications 2006
Nanoscale studies of domain wall motion in epitaxial ferroelectric thin films,
P. Paruch, T. Giamarchi, T. Tybell and J.-M. Triscone, J. Appl. Phys. 100, 051608 (2006)
Thickness-dependent properties of (110)-oriented La1.2Sr1.8Mn2O7 thin films,
Y. Takamura, R.V. Chopdekar, J. Grepstad, Y. Suzuki, A.F. Marshall, A. Vailionis, H. Zheng, and J.F. Mitchell, J. Appl. Phys. 99, 085902 (2006)
