Nonlinear Image Processing
Academic Press Series in Communications, Networking, and Multimedia
EDITOR-IN-CHIEF
Jerry D. Gibson, Southern Methodist University
This series has been established to bring together a variety of publications that represent the latest in cutting-edge research, theory, and applications of modern communication systems. All traditional and modern aspects of communications as well as all methods of computer communications are to be included. The series will include professional handbooks, books on communication methods and standards, and research books for engineers and managers in the worldwide communications industry.
Books in the Series:
Handbook of Image and Video Processing, Al Bovik, editor
The E-Commerce Book, Steffano Korper and Juanita Ellis
Multimedia Communications, Jerry Gibson, editor
Nonlinear Image Processing, Sanjit K. Mitra and Giovanni L. Sicuranza, editors
Nonlinear Image Processing
EDITORS
SANJIT K. MITRA University of California Santa Barbara, California, USA
GIOVANNI L. SICURANZA University of Trieste Trieste, Italy
ACADEMIC PRESS A Harcourt Science and Technology Company
SAN DIEGO / SAN FRANCISCO / NEW YORK / BOSTON / LONDON / SYDNEY / TOKYO
This book is printed on acid-free paper. Copyright © 2001 by Academic Press. All rights reserved.
No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher. Requests for permission to make copies of any part of the work should be mailed to the following address: Permissions Department, Harcourt, Inc., 6277 Sea Harbor Drive, Orlando, Florida, 32887-6777.
ACADEMIC PRESS
A Harcourt Science and Technology Company
525 B Street, Suite 1900, San Diego, CA 92101-4495, USA
http://www.academicpress.com
Academic Press
Harcourt Place, 32 Jamestown Road, London, NW1 7BY, UK
Library of Congress Catalog Number: 00-104376
ISBN: 0-12-500451-6
Printed in the United States of America
00 01 02 03 04 05 HP 9 8 7 6 5 4 3 2 1
Contents
Preface ix

1 Analysis and Optimization of Weighted Order Statistic and Stack Filters 1
  S. Peltonen, P. Kuosmanen, K. Egiazarian, M. Gabbouj, and J. Astola
  1.1 Introduction 1
  1.2 Median and Order Statistic Filters 1
  1.3 Stack Filters 2
  1.4 Image Processing Applications 21
  1.5 Summary 22

2 Image Enhancement and Analysis with Weighted Medians 27
  G. Arce and J. Paredes
  2.1 Introduction 27
  2.2 Weighted Median Smoothers and Filters 28
  2.3 Image Denoising 45
  2.4 Image Zooming 49
  2.5 Image Sharpening 52
  2.6 Optimal Frequency Selection WM Filtering 58
  2.7 Edge Detection 62
  2.8 Conclusion 65

3 Spatial-Rank Order Selection Filters 69
  K. Barner and R. Hardie
  3.1 Introduction 69
  3.2 Selection Filters and Spatial-Rank Ordering 72
  3.3 Spatial-Rank Order Selection Filters 81
  3.4 Optimization 94
  3.5 Applications 96
  3.6 Future Directions 105

4 Signal-Dependent Rank-Ordered-Mean (SDROM) Filter 111
  E. Abreu
  4.1 Introduction 111
  4.2 Impulse Noise Model 112
  4.3 Definitions 113
  4.4 The SDROM Filter 114
  4.5 Generalized SDROM Method 116
  4.6 Experimental Results 121
  4.7 Restoration of Images Corrupted by Streaks 126
  4.8 Concluding Remarks 131

5 Nonlinear Mean Filters and Their Applications in Image Filtering and Edge Detection 135
  C. Kotropoulos, M. Pappas, and I. Pitas
  5.1 Introduction 135
  5.2 Nonlinear Mean Filters 136
  5.3 Signal-Dependent Noise Filtering by Nonlinear Means 140
  5.4 Edge Detectors Based on Nonlinear Means 141
  5.5 Grayscale Morphology Using Lp Mean Filters 142
  5.6 Ultrasonic Image Processing Using Lp Mean Filters 147
  5.7 Sorting Networks Using Lp Mean Comparators 157
  5.8 Edge Preserving Filtering by Combining Nonlinear Means and Order Statistics 160
  5.9 Summary 163

6 Two-Dimensional Teager Filters 167
  S. Thurnhofer
  6.1 Introduction 167
  6.2 Discrete Volterra Series and Properties 167
  6.3 Interpretation of Frequency Responses 171
  6.4 The Teager Algorithm and One-Dimensional Extensions 172
  6.5 Spectrum of the Output Signal 177
  6.6 Mean-Weighted Highpass Filters 179
  6.7 Least-Squares Design of Edge Extracting Filters 185
  6.8 Summary 194
  6.9 Appendix 195

7 Polynomial and Rational Operators for Image Processing and Analysis 203
  G. Ramponi
  7.1 Introduction 203
  7.2 Theoretical Survey of Polynomial and Rational Filters 204
  7.3 Applications of Polynomial Filters 208
  7.4 Applications of Rational Filters 214
  7.5 Conclusions and Remaining Issues 221

8 Nonlinear Partial Differential Equations in Image Processing 225
  G. Sapiro
  8.1 Introduction 225
  8.2 Segmentation of Scalar and Multivalued Images 228
  8.3 Nonlinear PDEs in General Manifolds: Harmonic Maps and Direction Diffusion 235

9 Region-Based Filtering of Images and Video Sequences: A Morphological Viewpoint 249
  P. Salembier
  9.1 Introduction 249
  9.2 Classical Filtering Approaches 251
  9.3 Connected Operators 254
  9.4 Connected Operators Based on Reconstruction Processes 256
  9.5 Connected Operators Based on Region-Tree Pruning 264
  9.6 Conclusions 283

10 Differential Morphology 289
  P. Maragos
  10.1 Introduction 289
  10.2 2D Morphological Systems and Slope Transforms 294
  10.3 PDEs for Morphological Image Analysis 300
  10.4 Curve Evolution 308
  10.5 Distance Transforms 310
  10.6 Eikonal PDE and Distance Propagation 318
  10.7 Conclusions 323

11 Coordinate Logic Filters: Theory and Applications in Image Analysis 331
  B. Mertzios and K. Tsirikolias
  11.1 Introduction 331
  11.2 Coordinate Logic Operations on Digital Signals 333
  11.3 Derivation of the Coordinate Logic Filters 337
  11.4 Properties of Coordinate Logic Filters 339
  11.5 Morphological Filtering Using Coordinate Logic Operations on Quantized Images 340
  11.6 Image Analysis and Pattern Recognition Applications 342
  11.7 Concluding Remarks 352

12 Nonlinear Filters Based on Fuzzy Models 355
  F. Russo
  12.1 Introduction 355
  12.2 Fuzzy Models 356
  12.3 Fuzzy Weighted Mean (FWM) Filters 359
  12.4 FIRE Filters 363
  12.5 Evolutionary Neural Fuzzy Filters: A Case Study 366
  12.6 Concluding Remarks and Future Trends 372

13 Digital Halftoning 375
  D. Lau and G. Arce
  13.1 Introduction 375
  13.2 Halftone Statistics 381
  13.3 Blue-Noise Dithering 385
  13.4 Green-Noise Dithering 390
  13.5 Conclusions 398

14 Intrinsic Dimensionality: Nonlinear Image Operators and Higher-Order Statistics 403
  C. Zetzsche and G. Krieger
  14.1 Introduction 403
  14.2 Transdisciplinary Relevance of Intrinsic Dimensionality 406
  14.3 i2D-Selective Nonlinear Operators 413
  14.4 Frequency Design Methods for i2D Operators 421
  14.5 i2D Operators and Higher-Order Statistics 432
  14.6 Discussion 437

Index 449
Preface
In recent years, nonlinear methods and techniques have emerged as intensive research topics in the fields of signal and image processing. The increased interest in nonlinear methods of image processing is mainly due to the following observations. First, the human visual system (HVS) includes some nonlinear effects that need to be considered in order to develop effective image processing algorithms. Therefore, to comply with the characteristics of the HVS and, thus, obtain better visual results, nonlinear algorithms are necessary. Moreover, the nonlinear behavior of optical imaging systems and their related image formation systems must be taken into account. Finally, images are signals that in general do not satisfy the widely used hypotheses of Gaussianity and stationarity that are usually assumed to validate linear models and filtering techniques. In this respect, it is well known, for example, that linear filters are not able to remove impulsive noise superimposed on an image without blurring its edges and small details. Other situations in which linear filters perform poorly are those cases where signal-dependent or multiplicative noise is present in the images. Although linear filters continue to play an important role in signal processing because they are inherently simple to implement, the advances of computers and digital signal processors, in terms of speed, size, and cost, make the implementation of more sophisticated algorithms practical and effective. These considerations are the basis for the increased interest in the development of new nonlinear techniques for image processing, with particular emphasis on the applications that benefit greatly from a nonlinear approach, such as edge-preserving smoothing, edge enhancement, noise filtering, image segmentation, and feature extraction.
An interesting aspect of the recent studies on nonlinear image processing is the fact that an attempt has been made to organize the previously scattered contributions in a few homogeneous sectors. While a common framework is far from being derived (or it is simply out of reach since nonlinearity is defined as the lack
of a property, that is, linearity), suitable classes of nonlinear operators have been introduced. A (not exhaustive) list of these classes includes:

• Homomorphic filters, relying on a generalized superposition principle;
• Nonlinear mean filters, using nonlinear definitions of means;
• Morphological filters, based on geometrical rather than analytical properties;
• Order statistic filters, based on ordering properties of the input samples;
• Polynomial filters, using polynomial expressions in the input and output samples;
• Fuzzy filters, applying fuzzy reasoning to model the uncertainty that is typical of some image processing issues; and
• Nonlinear operators modeled in terms of nonlinear partial differential equations (PDEs).

All of these filter families are considered in this book, but with a different stress according to their popularity and impact on image processing tasks. Another relevant aspect that constitutes at present a trend in the area of nonlinear filters is the search for relationships and hierarchies among the above-mentioned classes. For example, interrelations have been pointed out between order statistic filters and PDE models, and between these two classes and morphological filters; some forms of polynomial filters can be expressed as PDEs; and so on. Moreover, the generalization efforts permit well-known filters to be considered members of broader classes. In this respect, homomorphic filters can be viewed as belonging to the class of nonlinear mean filters, the ubiquitous median filter can be described as an element of more general categories of order statistic nonlinear filters, and so on. Such aspects are considered in the appropriate chapters of this book and the relevant interrelations and hierarchies are referenced and illustrated. Finally, an emerging research line in the field of nonlinear filtering is the joint exploitation of different information and features typical of different filter classes.
The aim of the approaches based on this methodology is clearly to exploit the advantages offered by the various classes of nonlinear operators while reducing their drawbacks. This result can be achieved by combining different information to define new filter classes, as shown for example in some of the contributions contained in this book by the joint use of spatial and rank ordering information. An alternative approach, especially useful for the solution of well-defined image processing tasks, is based on the successive use of different kinds of filters depending on the specific application considered. In our opinion the material presented in this book aids this goal and thus permits the realization of actual application-oriented algorithms and systems.
The first three chapters of the book deal with variations of order statistic filters. Chapter 1 introduces stack and weighted order statistic filters. The interrelations between different kinds of filters that can be viewed as cases of the general threshold decomposition and Boolean logic treatment are shown. The most important tools available for the analysis and optimization of these filters are presented. Both deterministic and statistical properties are considered in order to give a comprehensive understanding of the reasons why these filters work so well in certain applications. Chapter 2 deals with an extended class of nonlinear filters derived from the median operator, that is, the local weighted median filter. After reviewing the principles of weighted medians, smoothers allowing positive as well as negative weights are introduced. These nonlinear tools are applied to image enhancement and analysis, with specific applications to image denoising and sharpening, zooming, and edge detection. Methods for designing optimal frequency-selective weighted median filters are also described. Chapter 3 explores the joint use of spatial and rank ordering information on the input samples in the framework of the so-called selection filters. In such filters, spatial ordering is used to exploit correlations between neighboring samples while rank order is used to isolate outliers and ensure robust behavior. The chapter theoretically motivates selection filters and develops several class subsets and extensions that utilize partial/full/extended spatial and rank ordering information. The developed filters are applied to various image processing tasks, such as noise smoothing, interpolation, and image restoration. Chapter 4 contains another example of a combination of different kinds of operations to derive new sets of nonlinear filters.
In fact, the signal-dependent rank-ordered-mean filters exploit the rank ordering properties of the input samples together with operations such as means and differences acting on the input samples ordered by rank. In particular, the rank-ordered differences provide information about the likelihood of corruption for the current pixel. The resulting nonlinear algorithms are particularly efficient at removing impulse noise from highly corrupted images while preserving details and features. Nonlinear mean filters, described in Chapter 5, can be considered as another alternative to median filters and their extensions for removing impulse noise effectively, especially when the impulses occur with a high probability. They have a very simple structure and thus are suitable for real-time processing applications. From a statistical point of view, they rely on the nonlinear means that are well-known location estimators. This approach produces a general filter structure that encompasses homomorphic, order statistic, and morphological filters. Effective edge detectors and edge-preserving filters are demonstrated, together with soft grayscale morphological filters, which are shown to be useful for removal of both Rayleigh and signal-dependent Gaussian speckle noise that usually affects ultrasonic images.
Chapters 6 and 7 deal with polynomial filters and their application to image processing tasks. The interest in such filters is mainly due to the fact that they can be considered to be the most natural extension of linear operators. In Chapter 6, the Teager filter is described in the framework of the general class of quadratic Volterra filters. This filter has the property that sinusoidal inputs generate constant outputs that are approximately proportional to the square of the input frequency. Its properties are presented and appropriate two-dimensional versions are derived. Efficient design techniques are proposed and applications in image enhancement are demonstrated. Chapter 7 provides an overview of polynomial filters based on the discrete Volterra series and of their extensions to two dimensions. Then, rational filters are introduced. These nonlinear filters, whose input-output relationship is given in the form of a ratio of two polynomials in the input samples, are universal approximators, as are polynomial functions, but they can achieve the desired level of accuracy with lower complexity and better extrapolation capabilities. Applications of polynomial filters for contrast enhancement, texture segmentation, and edge extraction are considered. Applications of rational filters to detail-preserving noise smoothing and interpolation with accurate edge reproduction are presented, together with a contrast enhancement technique that provides results comparable to those obtained with the previously described polynomial technique. Using partial differential equations (PDEs) and curve/surface flows leads to modeling images in a continuous domain. The understanding of discrete local nonlinear filters is facilitated when one lets the grid mesh tend to zero and thus rewrites the discrete filter, thanks to an asymptotic expansion, as a partial differential operator.
An advantage of such an approach is the possibility of achieving high accuracy and stability according to the extensive available research on numerical analysis. This emerging research area is considered in Chapter 8. After a general presentation, the first part of the chapter deals with the use of PDEs for image segmentation, while the second part discusses the use of PDEs to process multivalued data defined on nonflat manifolds, for example, directional data. In Chapter 9 the basic concepts of morphological filtering and the corresponding operators are introduced and described by examples. Then, the basic notions related to a recent set of morphological filtering tools, called connected operators, are presented. Connected operators are essentially region-based filtering tools since they do not modify individual pixel values, but instead act directly on the connected components of the space where the image is constant. The two most successful strategies to define connected operators, based on reconstruction processes and tree representations, are discussed. The interest in morphological region-based tools is related to the recent developments in the new area of multimedia applications and services, where content-based compression and indexing of image and video signals are typical examples of situations where new modeling strategies are necessary. Differential morphology is the topic presented in Chapter 10. Morphological image processing has traditionally been based on modeling images as sets or as
points in a complete lattice of functions and viewing morphological image transformations as set or lattice operations. In parallel, there is a recently growing part of morphological image processing that is based on ideas from differential calculus and dynamic systems. Therefore, the unifying theme that defines differential morphology is a collection of nonlinear differential/difference equations modeling the scale or space dynamics of morphological systems. In this chapter a unified view of the various interrelated ideas in this area is presented. Some system analysis tools in both space and transform domains are developed. Moreover, the connections between nonlinear PDEs and multiscale morphological filtering are fully discussed. Chapter 11 presents the fundamental definitions and properties of the coordinate logic filters, which constitute a tool for processing gray-level images as a set of binary images. In fact, these filters coincide with morphological filters for binary images, while maintaining a similar functionality for gray-level images. The remarkable advantage of coordinate logic filters is that their simplicity allows for very fast implementations because sorting operations are not required. Typical applications for image enhancement and analysis are presented. Another relevant property of these filters is their direct relation with fractal structures. In the last part of the chapter, examples and simple rules for designing fractal forms and cellular automata are given. Since its introduction in 1965 as a mathematical tool able to model the concept of partial membership, the theory of fuzzy sets has been used in many fields of engineering. Recently, this approach has been extended to cover image processing applications as well, because fuzzy filters are well suited to address the uncertainty that typically occurs when opposite needs have to be guaranteed, for example, noise cancellation and detail preservation.
After a brief introduction to fuzzy models, the principal families of nonlinear filters based on fuzzy systems are described in detail in Chapter 12. Both indirect approaches, which typically adopt the basic structure of a weighted mean filter and use fuzzy models to evaluate the corresponding weights, and direct approaches, which adopt special fuzzy systems for directly yielding the output values, are presented. Relevant applications to noise removal are also shown. Chapter 13 deals with digital halftoning, that is, the procedure that allows the reproduction of original continuous-tone photographs with binary patterns. After a brief review of the major techniques used in the past, current solutions for high print resolution and accurate color reproduction are described in detail. Several metrics for the characterization of stochastic dither patterns in both the spatial and spectral domains are introduced. Two approaches, based on blue-noise and green-noise models, are discussed. The blue-noise model, which is just the high-frequency component of a white noise, constitutes at present the basis of techniques widely applied in the printing industry. In contrast, the green-noise model, which is essentially the mid-frequency component of a white noise, represents a new approach to stochastic halftoning that provides higher resolutions.
Finally, in Chapter 14 the concept of intrinsic dimensionality is introduced as a relevant property of images. In fact, most local areas of natural images are nearly constant, and thus are classified as intrinsically zero-dimensional structures, while some other areas, such as straight lines and edges, are intrinsically one-dimensional, and only a minority of zones, such as junctions and corners, are intrinsically two-dimensional. Chapter 14 shows that, while the separation of intrinsically zero-dimensional signals from other signals requires only some kind of linear filtering and a subsequent threshold operation, the selective processing of intrinsically two-dimensional signals requires specific Volterra operators. The derivation of the necessary and sufficient conditions for the definition of suitable quadratic operators is provided, together with actual examples of different types of such operators. Some further extensions related to higher-order statistics and an analysis of the relations of local intrinsic dimensionality to basic neurophysiological and psychophysical aspects of biological image processing conclude the chapter. As might be clear from the discussion of its contents, our objective in editing this book has been to present both an overview of the state of the art and an exposition of some recent advances in the area of nonlinear image processing. We have attempted to present a comprehensive description of the most relevant classes of nonlinear filters, even though some other contributions could have been included. An example of these contributions is the evolutionary and learning-based nonlinear operators, including models that exploit training methods and algorithms based on machine learning paradigms, often copied from biological structures, such as neural networks and intelligent agents.
In consideration of the vast number of contributions in this area that are still evolving, and with interest mainly oriented toward the methodologies rather than specific image processing tasks, we decided not to include a report on these approaches in this book. Given the choice of topics included here and the style of their presentation, this book is suitable, in our opinion, both as an introductory text to nonlinear image processing and as an updating report for a few specific, advanced areas. In fact, tutorial aspects have been preferred in some chapters or in some sections, while more specific techniques and applications have been considered in other chapters or sections. For this reason parts of the book can be usefully adopted as textbook material for graduate studies, whereas other parts can be used as an up-to-date reference for practicing engineers. We believe the field of nonlinear image processing has matured sufficiently to justify bringing out a more up-to-date book on the subject. Because of the diversity of topics in the field, it would be difficult for one or two authors to write such a book. This is the reason for publishing an edited book. We would like to point out that the authors contributing chapters to this book are leading experts in their respective fields. We would also like to express to each of them our gratitude for their timely contributions of high-quality texts. We have made every attempt to ensure the accuracy of all materials in this book. However, we would very much appreciate readers bringing to our attention any errors that may have appeared in the book due to reasons beyond our
control and that of the publisher. These errors and any other comments can be communicated to either of us by email addressed to mitra@ece.ucsb.edu or sicuranza@gnbts.univ.trieste.it. We thank Dr. Jayanta Mukhopadhyay of the Indian Institute of Technology, Kharagpur, India, for his critical review of all chapters. We also thank Patricia Monohon for her assistance in the preparation of the LaTeX files of this book.
SANJIT K. MITRA GIOVANNI L. SICURANZA
Analysis and Optimization of Weighted Order Statistic and Stack Filters

SARI PELTONEN, PAULI KUOSMANEN, KAREN EGIAZARIAN, MONCEF GABBOUJ, AND JAAKKO ASTOLA
Department of Information Technology, Tampere University of Technology, Tampere, Finland
1.1
Introduction
In this chapter we consider stack and weighted order statistic filters and the most important tools available for their analysis and optimization. Both deterministic and statistical properties are covered to give a comprehensive understanding of why these filters work so well in certain applications.
1.2
Median and Order Statistic Filters
The median filter was introduced in the 1970s by Tukey under the name "running median" for smoothing of discrete data [Tuk74]. Since median filters attenuate impulsive noise effectively and preserve signal edges well, these filters have been studied and used widely in the field of signal processing (e.g., [Ast97]). Edge preservation is especially essential in image processing due to the nature of visual perception.
The references in this chapter follow the literature published in English, but the filters considered were also studied extensively at the same time in the former Soviet Union (see [Gil76] and references therein).

At time instant n let the samples X(n − k), X(n − k + 1), ..., X(n + k) be in the filter window. For simplicity we denote these samples by X1 = X(n − k), X2 = X(n − k + 1), ..., XN = X(n + k). Let X(1), X(2), ..., X(N) be the samples in increasing order; we call element X(t) the tth order statistic. Now, the output of the median filter is the sample X(k+1), i.e., the middle sample. If instead of the median sample the tth order statistic of the values inside the window is chosen to be the output of the filter, the filter is called the tth order statistic filter or ranked order filter.

The median filter, although offering clear advantages over linear filters, also has its shortcomings, such as streaking [Bov87b], edge jittering [Bov87a], and loss of small details from the images [Arc89, Nie87]. The main reason for the loss of details is that the median filter uses only rank order information, discarding the temporal order (spatial order) information. Thus, a natural extension of the median filter is a weighted median (WM) filter, where more emphasis can be given to the samples that are assumed to be more reliable, i.e., the samples near the center sample of the window. The filtering procedure is similar to the one for median filtering, with the exception that each sample is duplicated to the number given by the corresponding weight. The WM filters are considered in the next chapter and have been surveyed thoroughly elsewhere [Yli91, Yin96].

In the same way that the median filter has a weighted version, the order statistic filter also has one, called the weighted order statistic (WOS) filter. We have adopted the notation of Yli-Harja et al. [Yli91], where the weights, separated by commas, and the threshold, separated by a semicolon, are listed in angle brackets, that is, ⟨w1, w2, ..., wN; T⟩. The output Y(x) of this WOS filter with input vector x = (X1, X2, ..., XN) is given by

    Y(x) = Tth largest value of the multiset {w1 ◊ X1, w2 ◊ X2, ..., wN ◊ XN},

where ◊ denotes the repetition (duplication) operation, that is, r ◊ x = x, x, ..., x (r times).

The recursive version of the median filter is defined as

    Y(x) = MEDIAN{Y1, Y2, ..., Yk, Xk+1, Xk+2, ..., XN},

where Y1, Y2, ..., Yk are outputs already computed. The recursive counterpart of every filter can be obtained in the same way.
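The WOS definition above translates directly into code. Below is a minimal sketch for a single window (naive duplication of samples rather than an efficient implementation; the function name is my own):

```python
def wos_output(x, weights, t):
    """Output of the WOS filter <w1, ..., wN; T> for one window x.

    Each sample X_i is repeated w_i times, and the T-th largest element
    of the resulting multiset is the output.
    """
    multiset = []
    for w, xi in zip(weights, x):
        multiset.extend([xi] * w)  # the repetition operation w ◊ X_i
    return sorted(multiset, reverse=True)[t - 1]
```

With unit weights, ⟨1, 1, 1; 2⟩ reduces to the three-point median; with weights ⟨1, 3, 1; 3⟩ the heavily weighted center sample dominates the ordering, e.g. `wos_output([1, 5, 3], [1, 3, 1], 3)` returns 5, the center value.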
1.3
Stack Filters
Motivated by the success of the median filter, Wendt et al. [Wen86] developed a new filter class that shares two important properties of the median filter, namely, the threshold decomposition and stacking properties. The former is a limited superposition property giving a new filter architecture, and the latter is an ordering property.
x:   ...00243140203200...  ──── Median filter ────►  ...00233312022200...
      │ Threshold decomposition                        ▲ Addition
      ▼                                                │
x⁴:  ...00010010000000...  ─ Binary median filter ─►  ...00000000000000...
x³:  ...00011010001000...  ─ Binary median filter ─►  ...00011100000000...
x²:  ...00111010101100...  ─ Binary median filter ─►  ...00111101011100...
x¹:  ...00111110101100...  ─ Binary median filter ─►  ...00111111011100...
Figure 1.1: Illustration of the stack filtering operation using threshold decomposition. The broad arrows show the overall filtering operation. The slender arrows show the same operation in the threshold decomposition architecture. The Boolean function used in the illustration is f(x) = x1x2 + x1x3 + x2x3, which corresponds to the three-point median filter.
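The commuting property illustrated in Fig. 1.1 can be checked numerically. The sketch below (zero padding at the window boundaries is my own assumption) decomposes the figure's signal, filters each binary level with the three-point binary median f(x) = x1x2 + x1x3 + x2x3, and adds the levels back:

```python
def threshold_decompose(x, M):
    """Slice an M-valued signal into the M - 1 binary signals of Eq. (1.1)."""
    return [[1 if v >= m else 0 for v in x] for m in range(1, M)]


def binary_median3(b):
    """Three-point binary median: output 1 iff at least two of the three
    window samples are 1 (zero padding at the signal ends)."""
    p = [0] + list(b) + [0]
    return [1 if p[i] + p[i + 1] + p[i + 2] >= 2 else 0
            for i in range(len(b))]


x = [0, 0, 2, 4, 3, 1, 4, 0, 2, 0, 3, 2, 0, 0]   # the signal of Fig. 1.1
slices = threshold_decompose(x, M=5)              # x^1, x^2, x^3, x^4
filtered = [binary_median3(s) for s in slices]    # filter each level
# Adding the filtered binary levels reproduces the multilevel median output:
y = [sum(col) for col in zip(*filtered)]
# y == [0, 0, 2, 3, 3, 3, 1, 2, 0, 2, 2, 2, 0, 0]
```

Summing the unfiltered slices recovers x itself, confirming that thresholding loses no information, while summing the filtered slices yields exactly the median-filtered signal of the figure.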
1.3.1
Definition
Consider an M-valued vector x = [X1, X2, ..., XN], where Xi ∈ {0, 1, ..., M − 1}. The threshold decomposition of x means the slicing of x into M − 1 binary vectors x^1, x^2, ..., x^(M−1), obtained by the following thresholding rule:

    x_n^m = T^m(X_n) = 1 if X_n ≥ m, and 0 otherwise.    (1.1)

In other words, an element x_n^k of the binary vector x^k is equal to 1 whenever the element X_n of the input signal is greater than or equal to k, but its value is zero otherwise. Thresholding does not lose any information about the signal; it only changes this information into a simpler binary form. The original multivalued signal can be reconstructed from its thresholded binary vectors simply by adding them together:

    X_n = Σ_{m=1}^{M−1} x_n^m.
What happens when we filter each binary slice x^m separately by the median filter? First, the ordering operation for the binary signals reduces to additions, and second, the filtered binary slices form the threshold decomposition of the output of the median filter applied to the original signal, as shown in Fig. 1.1. If, instead of the binary median function, the filtering of the binary slices is done by any binary function that possesses the property of commuting with threshold decomposition, we obtain a stack filter. Now we should find out which binary functions possess this property in order to formally define the stack filter. First some definitions are needed. Let x and y be binary vectors (signals) of fixed length. Define x ≥ y to mean x_i ≥ y_i for all i; note that the binary signals x^1, x^2, ..., x^{M−1} form a nonincreasing sequence. A Boolean function f(·) is called a positive Boolean function (PBF) if it can be written as a Boolean expression that contains only uncomplemented input variables. For a PBF f(·) it holds that

f(x) ≥ f(y)  if x ≥ y.    (1.3)
The property given by Eq. (1.3) is called the stacking property. In practice, the stacking property means that when the binary output signals are piled on top of each other, as in Fig. 1.1, there can be only zeros on top of a zero. So in the reconstruction phase the binary signals do not have to be added together; a simple binary search can be used to find the levels just before the transitions from 1 to 0 take place. Now we can define the stack filter.

Definition 1.1. A stack filter S_f(·) is defined by a positive Boolean function f(·) as follows:

S_f(x) = Σ_{m=1}^{M−1} f(x^m).    (1.4)
Thus, filtering a vector x with a stack filter S_f(·) based on the PBF f(·) is equivalent to decomposing x into binary vectors x^m, 1 ≤ m ≤ M − 1, by thresholding, filtering each threshold level with the binary filter f(·), and reconstructing the output vector as the sum in Eq. (1.4). By Eq. (1.4), stack filters are completely characterized by their operation on binary vectors. Thus, all of their properties can be deduced from their action on binary signals. A stack filter defined by a PBF is a WOS filter if and only if the PBF is linearly separable, that is, it can be represented in the form

f(x_1, x_2, ..., x_N) = 1 if Σ_{i=1}^{N} w_i x_i ≥ T, and 0 otherwise,
where the x_i are binary variables and the weights w_i and threshold T are constants. It should be noted that different weights and thresholds can lead to the same PBF, and thus the weights and threshold defining a given stack filter are not unique.

Definition 1.2. The dual f^D(x) of a Boolean function f(x) is defined by f^D(x) = \bar{f}(\bar{x}), where \bar{x} is the complement of x. A Boolean function f(x) is self-dual if and only if f(x) = f^D(x).

If in addition to being linearly separable the PBF is also self-dual, the stack filter defined by this PBF is a WM filter.

Remark 1.1. For real-valued signals the (continuous-amplitude) stack filter defined by a positive Boolean function f(x_1, x_2, ..., x_N) with input vector x = [X_1, X_2, ...,
CHAPTER 1: WEIGHTED ORDER STATISTIC AND STACK FILTERS
X_N] can be defined as, for example, [Yli91],

S_f(x) = max{ β ∈ ℝ : f(T_β(X_1), T_β(X_2), ..., T_β(X_N)) = 1 },

where the thresholding function is defined by Eq. (1.1), or it can be defined by the following connection with the PBF. The PBF is

f(x_1, x_2, ..., x_N) = Σ_{i=1}^{K} ∏_{j∈P_i} x_j,    (1.5)

where the P_i are subsets of {1, 2, ..., N}, if and only if the stack filter S_f(·) corresponding to f(x_1, x_2, ..., x_N) is

S_f(X) = max{ min{X_j : j ∈ P_1}, min{X_j : j ∈ P_2}, ..., min{X_j : j ∈ P_K} }.    (1.6)
The first formulation reflects the threshold expression of the original definition of the discrete stack filter, and the second tells us that the real-domain stack filter corresponding to a PBF can be expressed by replacing "and" and "or" with "min" and "max," respectively. For example, the three-point median filter over real variables X_1, X_2, and X_3 (see also Fig. 1.1) is a stack filter defined by the PBF f(x_1, x_2, x_3) = x_1x_2 + x_1x_3 + x_2x_3, that is,

med{X_1, X_2, X_3} = max{ min{X_1, X_2}, min{X_1, X_3}, min{X_2, X_3} }.

There is a close connection between stack filters and morphological filtering. For binary signals one can view morphological erosion as a stack filter defined by a single monomial. Thus, stack filters are essentially a union of erosions with flat structuring elements [Dou87].
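That the median commutes with threshold decomposition can be verified directly for the three-point case (a small sketch; the function names are ours):

```python
def median3(a, b, c):
    # The PBF f(x1,x2,x3) = x1x2 + x1x3 + x2x3 evaluated with "and" -> min
    # and "or" -> max, as in Eq. (1.6); on binary inputs this is the
    # Boolean median itself.
    return max(min(a, b), min(a, c), min(b, c))

def stack_filter_via_decomposition(window, M):
    # Filter each binary slice with the Boolean median, then sum, Eq. (1.4).
    return sum(median3(*(int(x >= m) for x in window)) for m in range(1, M))

window = (2, 0, 3)
# Filtering the M-valued window directly and filtering slice by slice agree.
assert median3(*window) == stack_filter_via_decomposition(window, M=4) == 2
```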
1.3.2 Impulse and Step Responses

Usually when nonlinear filters are considered, their impulse removal and edge preservation capabilities are mentioned as important properties; whether a filter has these properties can be found by studying its impulse and step responses. The impulse response of a filter is studied by filtering an input signal in which one sample is an impulse having the value −1 or 1 and the rest of the signal values are zero. When can this positive or negative impulse be the output of a stack filter in the case of a nontrivial Boolean function, that is, a function not identically equal to zero or one? This question can be answered easily by examining the subsets P_i of the max-min representation of Eq. (1.6) of a stack filter. For a positive impulse to be the output, at least one of the minima must be equal to the value of the impulse; this can happen only if there is j such that |P_j| = 1. If there is a negative impulse in the input, all of the minima must be equal to this value for it to be the maximum of them, that is,

∩_{j=1}^{K} P_j ≠ ∅,
SARI PELTONEN et al.
where ∅ denotes the null set. For a WOS filter the impulses are completely removed if each weight

w_i < min{ T, Σ_{j=1}^{N} w_j − T + 1 },

and for the median filter the impulse response is zero. The class of stack filters includes filters with varying detail and edge preservation properties. With weighting, the detail preservation properties of median and order statistic filters can be improved, but only at the expense of lower noise suppression. Weights also can be chosen in such a way that certain structures, for example, lines, in the signals are preserved. A step signal has two constant areas of different values, between which there is an edge. Stack filters can only translate binary edges, and because of threshold decomposition, edges can be translated but not blurred.
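The impulse behavior is easy to check numerically, assuming the usual multiset definition of a weighted order statistic (replicate each sample according to its weight and take the rank-th largest value; the names below are ours):

```python
import numpy as np

def wos(window, weights, rank):
    """Weighted order statistic: rank-th largest of the replicated multiset."""
    replicated = np.repeat(window, weights)
    return np.sort(replicated)[::-1][rank - 1]

impulse = [0, 1, 0]                      # positive impulse at the window center
# The plain 3-point median (unit weights, rank 2) removes the impulse ...
assert wos(impulse, [1, 1, 1], rank=2) == 0
# ... while a center weight of 3 (a CWM with large center weight) lets it through.
assert wos(impulse, [1, 3, 1], rank=3) == 1
```

This illustrates the trade-off stated above: larger weights preserve detail (here, the impulse) at the cost of weaker noise suppression.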
1.3.3 Root Signals
Analysis of root signals is important for an understanding of the operation of stack filters. Root signals, which pass through the filter unaltered, give valuable information about it, so filters can be designed such that certain image patterns are root signals and thus are not disturbed by the filtering operation. To be able to filter the outermost input samples of a finite signal when parts of the filter window fall outside the input signal, we need to append samples to the ends of the signal. A common appending strategy is to replicate the outermost input samples as many times as needed. The roots of a stack filter are all of the appended signals that are invariant under filtering. The median filter has the very nice property of converging to a root in a finite number of passes of the filter. This and the structure of the root signals are important for determining which filtering problems can be solved by the median filter. In a similar manner the root signals and convergence of stack filters have been analyzed to better understand which filtering problems are solvable by stack filters. We state here a few simple roots of stack filters [Wen86] but encourage the interested reader to consult a more profound presentation of the convergence and root signals of stack filters [Gab92]. We denote by 0_m and 1_m a 0 and a 1 repeated m times. A stack filter defined by a nontrivial PBF preserves all constant signals. An increasing signal is preserved by a stack filter defined by a nontrivial PBF with window width N = 2k + 1 if and only if the output of the PBF with input 0_k 1_{k+1} is equal to 1 and with input 0_{k+1} 1_k is equal to 0. By interchanging 0 and 1 we obtain a similar result for decreasing signals.
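The convergence of the median filter to a root can be demonstrated with a short sketch (a minimal implementation of the replication appending strategy described above; the names are ours):

```python
import numpy as np

def median_pass(signal, k=1):
    """One pass of a (2k+1)-point median filter; the outermost input
    samples are replicated k times (the appending strategy above)."""
    s = np.concatenate(([signal[0]] * k, signal, [signal[-1]] * k))
    return np.array([int(np.median(s[i:i + 2 * k + 1]))
                     for i in range(len(signal))])

x = np.array([0, 0, 5, 0, 1, 1, 0, 0])     # constant areas plus an impulse
cur = x
while True:                                # iterate until a root is reached
    nxt = median_pass(cur)
    if np.array_equal(nxt, cur):
        break
    cur = nxt
# cur is now a root signal: one more pass leaves it unchanged
assert np.array_equal(median_pass(cur), cur)
```

For this input a single pass already removes the impulse and yields the root [0, 0, 0, 1, 1, 1, 0, 0], which the filter then preserves.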
1.3.4 Output Distributions and Moments of Stack Filters
Since nonlinear filters are all of those filters that are not linear filters, there is a wide variety of different nonlinear filters and no common theory for such a heterogeneous filter class. However, within this filter class stack filters form a very specific subclass of signal smoothers with the possibility to derive analytical results for their statistical properties. A basic statistical descriptor that can be used to study the noise attenuation properties of stack filters is the output distribution. It can also be used for determining biasedness or unbiasedness of the estimator and in the optimization of the filter. We give here the output distribution of a stack filter only for the case of independent and identically distributed (i.i.d.) input values; it can be generalized to the case of nonidentically distributed samples [Yli91].

Proposition 1.1. Let the input values X_1, X_2, ..., X_N in the window of the stack filter
S_f(·) defined by a positive Boolean function f(·) be i.i.d. random variables having a common distribution function Φ(x). The distribution function of the output Ψ(x) of the stack filter S_f(·) is

Ψ(x) = Σ_{i=0}^{N} A_i [1 − Φ(x)]^i Φ(x)^{N−i},    (1.7)
where the numbers A_i are defined by

A_i = |{x : f(x) = 0, w_H(x) = i}|,    (1.8)

with w_H(x) denoting the number of 1s in x, that is, its Hamming weight.

Example 1.1. Let the input values X_1, X_2, and X_3 in the window B of a stack filter S_f(·) defined by the positive Boolean function f(x_1, x_2, x_3) = x_1x_2 + x_1x_3 be i.i.d. random variables having a common distribution function Φ(x). From Proposition 1.1 the output distribution function Ψ(x) of the stack filter S_f(·) is

Ψ(x) = Φ³(x) + 3Φ²(x)[1 − Φ(x)] + Φ(x)[1 − Φ(x)]²
     = Φ(x) + Φ²(x) − Φ³(x).
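The coefficients A_i and the resulting output distribution of Example 1.1 can be checked directly (a small sketch; the enumeration over binary vectors follows Eq. (1.8)):

```python
import itertools

def f(x1, x2, x3):
    # The PBF of Example 1.1: f = x1 x2 + x1 x3 (Boolean "and"/"or").
    return x1 & x2 | x1 & x3

N = 3
# A_i = number of binary vectors with Hamming weight i on which f = 0, Eq. (1.8)
A = [0] * (N + 1)
for x in itertools.product((0, 1), repeat=N):
    if f(*x) == 0:
        A[sum(x)] += 1
print(A)   # -> [1, 3, 1, 0]

# Output distribution at a point where Phi(x) = p, via Eq. (1.7):
p = 0.3
psi = sum(A[i] * (1 - p) ** i * p ** (N - i) for i in range(N + 1))
assert abs(psi - (p + p**2 - p**3)) < 1e-12   # matches Phi + Phi^2 - Phi^3
```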
For the numbers A_i we have the limits 0 ≤ A_i ≤ \binom{N}{i}.

The sensitivity curve of an estimator T_N, N > 1, at sample (X_1, X_2, ..., X_{N−1}) is a plot of
T_N(X_1, X_2, ..., X_{N−1}, x)

as a function of x. The SC [Tuk77] is defined as

SC_N(x) = N [ T_N(X_1, X_2, ..., X_{N−1}, x) − T_{N−1}(X_1, X_2, ..., X_{N−1}) ],

or, when the estimator is a functional, that is, T_N(X_1, X_2, ..., X_N) = T(Φ_N), as

SC_N(x) = [ T( (1 − 1/N)Φ_{N−1} + (1/N)Δ_x ) − T(Φ_{N−1}) ] / (1/N),
where Φ_{N−1} is the empirical distribution function of (X_1, X_2, ..., X_{N−1}). The third version is based on the ith jackknifed pseudovalue [Que56]:

T*_i = N T_N(X_1, X_2, ..., X_N) − (N − 1) T_{N−1}(X_1, X_2, ..., X_{i−1}, X_{i+1}, ..., X_N).
Now the finite-sample IF using jackknifing is defined as T*_i − T_N(X_1, X_2, ..., X_N).

1.3.9 Output Distributional Influence Function
For the finite-sample influence functions either a real sample (X_1, X_2, ..., X_{N−1}) or an artificial sample generated from the distribution Φ of the input samples is needed, and this sample itself, or the way it is derived from the distribution Φ, affects the result. What we would like to have is a general method that uses the distribution function Φ of the input sample itself and not any artificial sample derived from Φ. In the case where the output distribution of a filter can be expressed in a closed form as a function of the distribution functions of the input samples, the output distributional influence function (ODIF) has been introduced [Pel99a] for analyzing the robustness of finite-length filters. We assume here that the input samples are i.i.d. random variables. First we need a way to denote the output distribution function of a filter when a fraction ε of the input samples has a different distribution than the rest of the samples. We denote by Ψ_{(1−ε)Φ+εG_y}(·) the output distribution Ψ(·) of the filter, where every occurrence of the common distribution function Φ of the input samples is replaced by (1 − ε)Φ + εG_y, with G_y being a distribution function with mean y. The following definition was given for the ODIF for the distribution function [Pel99a]:

Definition 1.6. Let the output distribution function of a filter be Ψ(·), the common distribution function of the input samples be Φ(·), and G_y(·) be a distribution function having mean y. Then the ODIF for the distribution function Ψ(·) is

Ψ̃(x, y) = lim_{ε→0+} [ Ψ_{(1−ε)Φ+εG_y}(x) − Ψ(x) ] / ε

for those x and y where this limit exists.
Figure 1.4: The ODIFs for the expectation of the CWM filter of length 7 having center weights 1 (long dashes), 3 (medium dashes), 5 (short dashes), and 7 (solid line) at the standard normal distribution and G_y = Δ_y.

In the same way as for the distribution function in Definition 1.6, the ODIF was defined for the density function and the moments [Pel99a]. The following proposition gives the ODIF for the distribution function of a stack filter by using the coefficients A_i [Pel99b].

Proposition 1.7. Let the distribution function of the input samples be Φ(·) and let G_y(·) be a distribution function having mean y. Then the ODIF for the distribution function of the stack filter of length N is given by

Ψ̃(x, y) = Σ_{i=0}^{N−1} A_i (1 − …

… > 0 for (N + 1)/2 ≤ i ≤ N − 1. This means that we must make A_i as large as possible for 1 ≤ i ≤ (N − 1)/2 and as small as possible for (N + 1)/2 ≤ i ≤ N − 1. This obviously happens if we choose A_i = \binom{N}{i} for 1 ≤ i ≤ (N − 1)/2 and A_i = 0 for (N + 1)/2 ≤ i ≤ N − 1. This choice gives the median filter! In a more meaningful situation we have additional constraints on the coefficients A_i that arise, for example, from requirements that the filter have a certain degree of robustness and also be able to preserve details of a prescribed type. The constraints that give detail preservation can be given, for example, by fixing predetermined values of the defining Boolean function. The coefficients A_i of a stack filter S_f(·) of window size N and the rank selection vector r = [r_1, r_2, ..., r_N] satisfy [Kuo94]
r_j = A_{N−j} / \binom{N}{j} − A_{N−j+1} / \binom{N}{j−1},    j = 1, 2, ..., N.    (1.22)
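Equation (1.22) is easy to evaluate once the A_i are known. The sketch below (helper names are ours) recovers the rank selection probabilities of the three-point median, which selects the sample median with probability 1:

```python
import itertools
from math import comb

def rank_selection_probabilities(f, N):
    """Compute r_j from the coefficients A_i via Eq. (1.22)."""
    # A_i = number of binary vectors of Hamming weight i with f = 0, Eq. (1.8)
    A = [0] * (N + 1)
    for x in itertools.product((0, 1), repeat=N):
        if f(*x) == 0:
            A[sum(x)] += 1
    return [A[N - j] / comb(N, j) - A[N - j + 1] / comb(N, j - 1)
            for j in range(1, N + 1)]

# The three-point median PBF: f = x1 x2 + x1 x3 + x2 x3
median3 = lambda x1, x2, x3: x1 & x2 | x1 & x3 | x2 & x3
print(rank_selection_probabilities(median3, 3))   # -> [0.0, 1.0, 0.0]
```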
The rank selection probabilities give an intuitively appealing way of constraining a stack filter. For instance, a certain amount of robustness is guaranteed if
we require that r_1 = r_2 = ··· = r_k = 0 and r_l = r_{l+1} = ··· = r_N = 0. This will give a stack filter that is "trimmed" in the same way as an L-filter with the coefficients corresponding to a number of the largest and smallest samples equal to zero. Constraints on the rank selection probabilities translate immediately into constraints on A_i because of the above relation. From the breakdown points we obtain new constraints for the coefficients A_i. Consider, for example, the design of a stack filter for which ε* ≥ a for some a ∈ [0, 1]. Then
A_{N−1} = A_{N−2} = ··· = A_{N−⌊aN⌋} = 0  and  A_j = \binom{N}{j},  j = 1, 2, ..., ⌊aN⌋.    (1.23)
Similarly, we can have breakdown probability constraints. For example, if we wish to have 𝒥_N(p, q) ≤ a for some a ∈ [0, 1], then

Σ_{i=0}^{N−1} A_i ( p^{N−i}(1 − p)^i − q^i(1 − q)^{N−i} ) ≤ a − 1.
The relations given above make it possible to optimize stack filters in the mean square sense without performing a full search over all stack filters, which otherwise would be impossible, except for small window sizes, because of the very large number of stack filters. For instance, for the window size N the number of different stack filters is greater than 2^{2^N/N} [Aga95, Shm95]. The optimization consists of finding a solution of the integer linear programming task

minimize Σ_{i=0}^{N−1} A_i U(Φ, ·, N, i),    (1.24)

under the constraints for A_i and then determining a stack filter with the above coefficients A_i if it exists. It must be emphasized that usually there is no guarantee that such a stack filter exists. However, once we have the target coefficients A_i the search for the optimal stack filter is simpler. We also can take a stack filter that has coefficients A_i close to the solution of the optimization problem Eq. (1.24) and then check to see if its filtering behavior is satisfactory. In the above, we considered the rank selection probabilities and used them to constrain the filter in the statistical sense. In many image processing problems it is not enough to know the "average" behavior of the filter; we need to be sure that it will handle certain signal segments in a prescribed way. This can be achieved using so-called structural constraints, the goal of which is to preserve some desired signal details, for example, pulses in 1D signals or lines in images, and to remove undesired signal patterns. The structural constraints consist of a list of different structures to be preserved, deleted, or modified. Since stack filters obey the threshold decomposition, the structural constraints need to be considered only in the context of binary signals. That is, they can be specified by a set of binary vectors and their outputs. The binary vectors are divided into two subsets, type 1 constraints and type 0 constraints [Yin95].
A binary vector that is specified by the structural constraints is called a type 1 constraint if its output is 1; otherwise, it is called a type 0 constraint. Denote the set of all type 1 constraints by F_1 = {x_1, x_2, ..., x_p} and the set of all type 0 constraints by F_0 = {y_1, y_2, ..., y_q}. Structural constraints induce two new constraints for the coefficients A_i: Let (1) the number of vectors x ∈ F_1 with w_H(x) = i be γ_i for all 1 ≤ i ≤ N − 1. Then
W_i > 0 and ◊ is the replication operator defined as W_i ◊ x_i = x_i, x_i, ..., x_i (W_i times).

Weighted median smoothers were introduced in the signal processing literature by Brownrigg in 1984 and have since received considerable attention [Bro84, Ko91, Yin96]. The WM smoothing operation can be schematically described as in Fig. 2.2. Weighted medians admitting only positive weights are low-pass filters by nature, and consequently these signal processing structures are referred to here as "smoothers."
GONZALO R. ARCE AND JOSE L. PAREDES
The computation of weighted median smoothers is simple. Consider the WM smoother of window size 5 defined by the symmetric weight vector W = [1, 2, 3, 2, 1]. For the observation x(n) = [12, 6, 4, 1, 9], the weighted median smoother output is found as

y(n) = MEDIAN[1 ◊ 12, 2 ◊ 6, 3 ◊ 4, 2 ◊ 1, 1 ◊ 9]
     = MEDIAN[12, 6, 6, 4, 4, 4, 1, 1, 9]
     = MEDIAN[1, 1, 4, 4, 4, 6, 6, 9, 12]
     = 4.    (2.9)
The large weighting on the center input sample results in this sample being taken as the output. As a comparison, the standard median output for the given input is y (n) = 6. More on the computation of WM smoothers will be described later.
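The replication-based computation above can be sketched as follows (a minimal NumPy version; the function name is ours):

```python
import numpy as np

def wm_smoother(window, weights):
    """Weighted median smoother with positive integer weights:
    replicate each sample weights[i] times and take the ordinary median."""
    replicated = np.repeat(window, weights)
    return int(np.median(replicated))

# The example above: W = [1, 2, 3, 2, 1], x(n) = [12, 6, 4, 1, 9]
assert wm_smoother([12, 6, 4, 1, 9], [1, 2, 3, 2, 1]) == 4
# For comparison, the standard median (unit weights) of the same window is 6.
assert wm_smoother([12, 6, 4, 1, 9], [1, 1, 1, 1, 1]) == 6
```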
The Center Weighted Median Smoother

The weighting mechanism of WM smoothers allows great flexibility in emphasizing or deemphasizing specific input samples. In most applications, not all samples are equally important. Due to the symmetric nature of the observation window, the sample most correlated with the desired estimate is, in general, the center observation sample. This observation leads to the center weighted median (CWM) smoother, a relatively simple subset of WM smoothers that has proven useful in many applications [Ko91]. The CWM smoother is realized by allowing only the center observation sample to be weighted. Thus, the output of the CWM smoother is given by

y(n) = MEDIAN[x_1, ..., x_{c−1}, W_c ◊ x_c, x_{c+1}, ..., x_N],    (2.10)

where W_c is an odd positive integer and c = (N + 1)/2 is the index of the center sample. When W_c = 1, the operator is a median smoother, and for W_c ≥ N, the CWM reduces to an identity operation. The effect of varying the center sample weight is perhaps best seen by way of an example. Consider a segment of recorded speech. The voiced waveform "a" is shown at the top of Fig. 2.3. This speech signal is taken as the input of a CWM smoother of size 9. The outputs of the CWM, as the weight parameter W_c = 2w + 1 is varied for w = 0, ..., 3, are shown in the figure. Clearly, as W_c is increased less smoothing occurs.
The CWM smoother has an intuitive interpretation. It turns out that the output of a CWM smoother is equivalent to computing

y(n) = MEDIAN[X_(k), x_c, X_(N−k+1)],    (2.11)
where k = (N + 2 − W_c)/2 for 1 ≤ W_c ≤ N.

Consider a WM filter of window size 5 with the real-valued weight vector W = ⟨0.1, −0.2, 0.3, −0.2, 0.1⟩. The output for this filter operating on the observation set [x_1, x_2, x_3, x_4, x_5] = [−2, −2, −1, 3, 6] is found as follows: Summing the absolute weights gives the threshold T_0 = (1/2) Σ_{i=1}^{5} |W_i| = 0.45. The observation samples, their corresponding weights, the sorted "signed" observation samples, their corresponding absolute weights, and the partial sums of weights (from each ordered sample to the maximum) are

observation samples:      −2,   −2,   −1,    3,    6
corresponding weights:    0.1, −0.2,  0.3, −0.2,  0.1

sorted signed observation samples:  −3,   −2,   −1,    2,    6
corresponding absolute weights:     0.2,  0.1,  0.3,  0.2,  0.1
partial weight sums:                0.9,  0.7,  0.6,  0.3,  0.1

Thus, the output is −1 since, when starting from the right (maximum sample) and summing the weights, the threshold T_0 = 0.45 is not reached until the weight associated with −1 is added. The underlined sum value indicates that this is the
first sum that meets or exceeds the threshold. To warrant highpass or bandpass characteristics, the WM filter output would be modified so as to compute the average between −1 and −2, leading to −1.5 as the output value. It should be noted that as a result of the negative weights, the computation of the weighted median filter is not shift invariant. Consider the previous example and add a shift of 2 to the samples of x such that x'_i = x_i + 2. The weighted median filtering of x' = [4, −4, 11, 3, 15] with the weight vector W = ⟨1, −2, 3, −2, 1⟩ leads to the output y'(n) = 4, which does not equal the previous output in Eq. (2.17) of 6 plus the appropriate shift.
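The threshold procedure for real-valued, possibly negative, weights can be sketched as follows (a minimal version that returns the selected sample without the averaging modification mentioned above; the names are ours, and the weight signs follow our reading of the worked example, whose table fixes them):

```python
import numpy as np

def wm_filter(x, w):
    """Weighted median with real-valued weights: sort the 'signed' samples
    sign(w_i)*x_i in decreasing order and return the first one at which the
    running sum of |w_i| reaches T0 = 0.5 * sum(|w_i|)."""
    x, w = np.asarray(x, float), np.asarray(w, float)
    signed = np.sign(w) * x
    order = np.argsort(signed)[::-1]        # indices of decreasing signed samples
    t0 = 0.5 * np.abs(w).sum()
    running = 0.0
    for i in order:
        running += abs(w[i])
        if running >= t0:
            return signed[i]

# The worked example: output -1 at the sample where the partial sum hits 0.6
assert wm_filter([-2, -2, -1, 3, 6], [0.1, -0.2, 0.3, -0.2, 0.1]) == -1.0
```

The bandpass-oriented variant would instead average the selected signed sample with the next one in the ordering, giving −1.5 here.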
Permutation Weighted Median Filters The principle behind the CWM smoother lies in its ability to emphasize or deemphasize the center sample of the window by adjusting the center weight while keeping the weight values of all other samples at unity. In essence, the value given to the center weight indicates the "reliability" of the center sample. If that sample does not contain an impulse (high reliability), it would be desirable to make the center weight large such that no smoothing takes place (identity filter). On the other hand, if an impulse were present in the center of the window (low reliability), no emphasis should be given to the center sample (impulse), and the center weight should be given the smallest possible weight, that is, Wc = 1, reducing the CWM smoother structure to a simple median. Notably, this adaptation of the center weight can be easily achieved by considering the center sample's rank among all pixels in the window [Arc95, Har94]. More precisely, denoting the rank of the center sample of the window at a given location as Rc (n), then the simplest permutation WM smoother is defined by the following modification of the CWM smoothing operation:
W_c(n) = N for T_L ≤ R_c(n) ≤ T_U, and 1 otherwise.

…

W = ( −1  −1  −1
      −1   8  −1
      −1  −1  −1 ).    (2.51)
Due to the weight coefficients in Eq. (2.51), for each position of the moving window the output is proportional to the difference between the center pixel and the smallest pixel around it. Thus, the filter output takes relatively large values for prominent edges in an image, but small values in regions that are fairly smooth, being zero only in regions that have a constant gray level. Although this filter can effectively extract the edges contained in an image, the effect that this filtering operation has on negative-slope edges is different from that obtained for positive-slope edges.² Since the filter output is proportional to the difference between the center pixel and the smallest pixel around the center, for negative-slope edges the center pixel takes small values, producing small values at the filter output. Moreover, the filter output is zero if the center pixel and the smallest pixel around it have the same value. This implies that negative-slope edges are not extracted in the same way as positive-slope edges. To overcome this limitation we must modify the basic image sharpening structure shown in Fig. 2.12 such that

²A change from one gray level to a lower gray level is referred to as a negative-slope edge, whereas a change to a higher gray level is referred to as a positive-slope edge.
positive-slope edges as well as negative-slope edges are highlighted in the same proportion. A simple way to accomplish that is as follows:

1. Extract the positive-slope edges by filtering the original image with the filter mask described above.

2. Extract the negative-slope edges by first preprocessing the original image such that the negative-slope edges become positive slopes, and then filtering the preprocessed image with the filter described above.

3. Combine appropriately the original image and the filtered versions of the original image and the preprocessed image to form the sharpened image.

Thus, both positive- and negative-slope edges are equally highlighted. This procedure is illustrated in Fig. 2.13, in which the top branch extracts the positive-slope edges and the middle branch extracts the negative ones.

Figure 2.13: Image sharpening based on the weighted median filter.

To understand the effects of edge sharpening, we plot a row of a test image in Fig. 2.14, together with the row from the sharpened image when only the positive-slope (2.14a) and negative-slope (2.14b) edges are highlighted and when both are jointly highlighted (2.14c). The λ_1 and λ_2 in Fig. 2.13 are tuning parameters that control the amount of sharpness desired in the positive- and negative-slope directions, respectively. Their values are generally selected to be equal. The output of the prefiltering operation is defined as

x'(m, n) = M − x(m, n),    (2.52)

with M equal to the maximum pixel value of the original image. This prefiltering operation can be thought of as a flipping and shifting operation of the values of the original image such that the negative-slope edges are converted to positive-slope edges. Since the original and the prefiltered images are filtered by the same WM filter, the positive- and negative-slope edges are sharpened in the same way. In Fig.
2.15, the performance of the WM filter image sharpening is compared with that of traditional image sharpening based on linear FIR filters. For the linear sharpener, the scheme shown in Fig. 2.12 was used and the parameter λ was set
Figure 2.14: Original row of a test image (solid lines) and row sharpened (dotted lines) with (a) only positive-slope edges, (b) only negative-slope edges, and (c) both positive- and negative-slope edges.
to 1. For the WM sharpener, the scheme of Fig. 2.13 was used with λ_1 = λ_2 = 2. The filter mask given by Eq. (2.51) was used in median image sharpening, whereas the filter mask for the linear image sharpening is (1/3)W, where W is given by Eq. (2.51). Sharpening with WM filters does not introduce as much noise amplification as sharpeners equipped with FIR filters do.

Figure 2.15: (a) Original image, sharpened with (b) the FIR sharpener and (c) the WM sharpener.
2.5.2 Sharpening with Permutation WM Filters
Linear highpass filters are inadequate for unsharp masking whenever background noise is present. Although WM highpass filters ameliorate the problem, the goal is to improve their performance by allowing the WM filter weights to take on rank-dependent values.
The unsharp WM filter structure shown in Fig. 2.13 is used with the exception that permutation WM filters are now used to synthesize the highpassfilter operation. The weight mask for the permutation WM highpass filter is
I W 
WI(R1,Rr W4 (R4,Rc ) W7(u7,Uc)
W2(R2,Rc) Wc (Re) W8(R8,Uc)
W3(R3,Rc) I W6( R6 ,Re ) , W9(u9,Uc)
(2.53)
where Wi(Ri,Rc) depends only on the rank of the ith sample and the rank of the center sample. Wi(ui,Rc) =   1 , for i = 1 , . . . , 9 , i * 5, Re = 1 , . . . , 9 , with the following exceptions: The center weight is given the value according to 8
Wc(Rc)
=

for Rc = 2, 3 , . . . , 8, 1 otherwise.
(2.54)
That is, the value of the center weight is 8 if the center sample is not the smallest or largest in the observation window. If it happens to be the smallest or largest, its reliability is low, and the weighting strategy must be altered such that the center weight is set to −1 and the weight of 8 is given to the sample closest in rank to the center sample, leading to

W_{e(8)}(8, 9) = 8 for x_c = x_(9), and −1 otherwise;
W_{e(2)}(2, 1) = 8 for x_c = x_(1), and −1 otherwise.    (2.55)

Here e(i) refers to the location of the ith smallest sample in the observation window and W_{e(i)} refers to its weight. This weighting strategy can be extended to the case where the L smallest and L largest samples in the window are considered unreliable, and the weighting strategy applied in Eq. (2.55) then applies to the weights W_{e(L+1)}(L+1, L) and W_{e(N−L)}(N−L, N−L+1).

Figure 2.16 illustrates the image sharpening performance when permutation WM filters are used. A Saturn image with added Gaussian background noise is shown in Fig. 2.16a. The other images show this image sharpened with (b) a Lower-Upper-Middle (LUM) sharpener [Har93], (c) a linear FIR filter sharpener, (d) the WM filter sharpener, and the permutation WM filter sharpener with (e) L = 1 and (f) L = 2. The λ parameters were given a value of 1.5 for all weighted-median-type sharpeners and a value of 1 for the linear sharpener. The linear sharpener introduces background noise amplification. The LUM sharpener does not amplify the background noise; however, it introduces severe edge distortion artifacts. The WM filter sharpener ameliorates the noise amplification and does not introduce edge artifacts. The permutation WM filter sharpeners perform best, with higher robustness attributes as L increases.
Figure 2.16: (a) Image with background noise, sharpened with (b) the LUM sharpener, (c) the FIR sharpener, (d) the WM sharpener, and the permutation WM sharpener with (e) L = 1 and (f) L = 2.
2.6 Optimal Frequency Selection WM Filtering

We now consider the design of a robust bandpass recursive WM filter using the LMA adaptive optimization algorithm. The performance of the optimal recursive WM filter is compared with the performances of a linear FIR filter, a linear IIR filter, and a nonrecursive WM filter, all designed for the same task. Moreover, to show the noise attenuation capability of the recursive WM filter and compare it with those of the other filters, we used an impulse-noise-corrupted test signal. Examples are shown for one-dimensional signals for illustration purposes, but the extension to two-dimensional signals is straightforward.
The application at hand is the design of a 62-tap bandpass RWM filter with passband 0.075 < ω < 0.125 (normalized Nyquist frequency = 1). We used white Gaussian noise with zero mean and unit variance as the input training signal. The desired signal was provided by the output of a large FIR filter (a 122-tap linear FIR filter) designed by MATLAB's fir1 function. The 31 feedback filter coefficients were initialized to small random numbers (on the order of 10⁻³). The feedforward filter coefficients were initialized to the values output by MATLAB's fir1 with 31 taps and the same passband of interest. A variable step size μ(n) was used in both adaptive optimizations, where the step size changes according to μ(n) = μ₀ e^{−n/100} with μ₀ = 10⁻². A signal that spanned the range of frequencies of interest was used as a test signal. Figure 2.17a depicts a linear swept-frequency signal spanning instantaneous frequencies from 0 to 400 Hz, with a sampling rate of 2 kHz. Figure 2.17b shows the chirp signal filtered by the 122-tap linear FIR filter used to produce the desired signal during the training stage. Figure 2.17c shows the output of a 62-tap linear FIR filter used for comparison purposes. The adaptive optimization algorithm described in Section 2.2 was used to optimize a 62-tap nonrecursive WM filter admitting negative weights; the filtered signal attained is shown in Fig. 2.17d. Note that the nonrecursive WM filter tracks the frequencies of interest but fails to attenuate completely the frequencies outside the desired passband. MATLAB's yulewalk function was used to design a 62-tap linear IIR filter with passband 0.075 < ω < 0.125; Fig. 2.17e depicts its output. Finally, Fig. 2.17f shows the output of the optimal recursive WM filter determined by the LMA training algorithm described in Sec. 2.2.2. Note that the frequency components of the test signal that are not in the passband are attenuated completely.
GONZALO R. ARCE AND JOSE L. PAREDES

Figure 2.17: Bandpass filter design: (a) input test signal, (b) desired signal, (c) linear FIR filter output, (d) nonrecursive WM filter output, (e) linear IIR filter output, and (f) RWM filter output. (Reproduced with permission from [Arc00]. © 2000 IEEE.)

Moreover, the RWM filter generalizes very well on signals that were not used during the training stage. Comparing the different filtered signals in Fig. 2.17, we see that the recursive filtering operation performs much better than its nonrecursive counterpart having the same number of coefficients. Likewise, to achieve a specified level of performance, a recursive WM filter generally requires considerably fewer filter coefficients than the corresponding nonrecursive WM filter. To test the robustness of the different filters, we next contaminated the test signal with additive α-stable noise (Fig. 2.18a); the impulse noise was generated using the parameter α set to 1.4. (Fig. 2.18a is truncated so that the same scale is used in all plots.) Figures 2.18b and 2.18d show the filter outputs of the linear FIR and IIR filters, respectively; both outputs are severely affected by the noise. On the other hand, the nonrecursive and recursive WM filters' outputs, Figs. 2.18c and 2.18e, remain practically unaltered. Figure 2.18 clearly depicts the robust characteristics of median-based filters.

To better evaluate the frequency response of the various filters, we performed a frequency-domain analysis. Because of the nonlinearity inherent in the median operation, traditional linear tools, such as transfer-function-based analysis, cannot be applied. However, if the nonlinear filters are treated as single-input, single-output systems, the magnitude of the frequency response can be obtained experimentally as follows: a single-tone sinusoidal signal sin(2πft) was given as the input to each filter, with f spanning the complete range of possible frequencies; a sufficiently large number of frequencies spanning the interval [0, 1] was chosen. For each frequency value, the mean power of each filter's output was computed. Figure 2.19a shows a plot of the normalized mean power versus frequency attained by the different filters. Upon closer examination of Fig. 2.19a, it can be seen that the recursive WM filter yields the flattest response in the passband of interest. A similar conclusion can be drawn from the time-domain plots shown in Fig. 2.17.
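The single-tone probing procedure just described can be sketched in a few lines. The helper below is our own illustrative code (a running-median smoother stands in for the chapter's median-based filters); it records the normalized mean output power at each probe frequency.

```python
import numpy as np

def empirical_response(filt, freqs, n=2048):
    """Probe a single-input, single-output (possibly nonlinear) filter with
    unit-amplitude sinusoids; freqs are normalized so that Nyquist = 1."""
    powers = []
    for f in freqs:
        t = np.arange(n)
        x = np.sin(np.pi * f * t)        # angular frequency pi*f rad/sample
        powers.append(np.mean(filt(x) ** 2))
    powers = np.asarray(powers)
    return powers / powers.max()         # normalized mean power

def median_smoother(x, w=9):
    """Running median: a crude nonlinear lowpass used here only as an example."""
    pad = np.pad(x, w // 2, mode="edge")
    return np.array([np.median(pad[i:i + w]) for i in range(len(x))])

freqs = np.linspace(0.01, 0.99, 25)
resp = empirical_response(median_smoother, freqs)
```

Plotting `resp` against `freqs` yields the kind of empirical magnitude-response curve shown in Fig. 2.19a; for the running median, low frequencies pass while high frequencies are attenuated.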
CHAPTER 2: IMAGE ENHANCEMENT AND ANALYSIS
Figure 2.18: Performance of the bandpass filter in noise: (a) chirp test signal in α-stable noise, (b) linear FIR filter output, (c) nonrecursive WM filter output, (d) linear IIR filter output, and (e) RWM filter output. (Reproduced with permission from [Arc00]. © 2000 IEEE.)

To see the effects that impulse noise has on the magnitude of the frequency response, we input to each filter a contaminated sinusoidal signal sin(2πft) + η, where η is α-stable noise with parameter α = 1.4. Following the same procedure described above, the mean power versus frequency diagram was obtained and is shown in Fig. 2.19b. As expected, the magnitudes of the frequency responses for the linear filters are highly distorted, whereas those for the median-based filters do not change significantly with noise.
Figure 2.19: Frequency response to (a) a noiseless and (b) a noisy sinusoidal signal: (solid lines) RWM filter, (dotted-dashed lines) nonrecursive WM filter, (thin dashes) linear FIR filter, and (thick dashes) linear IIR filter. (Reproduced with permission from [Arc00]. © 2000 IEEE.)
Original Image → Highpass Filter → T Applied as Threshold → Edge Thinning → Edge Map

Figure 2.20: The process of edge detection.
2.7 Edge Detection
Edge detection is an important tool in image analysis and is necessary for applications of computer vision in which objects need to be recognized by their outlines. An edge-detection algorithm should show the locations of major edges in the image while ignoring false edges caused by noise. The most common approach used for edge detection is illustrated in Fig. 2.20. A highpass filter is applied to the image to obtain the amount of change present in the image at every pixel. The output of the filter is thresholded to determine those pixels that have a rate of change high enough to be considered as lying on an edge; that is, all pixels with filter output greater than some value T are taken as edge pixels. The value of T can be adjusted to give the best visual results. High thresholds lose some of the real edges, while low values may result in many false edges; thus, a trade-off is needed to get the best results. Other techniques, such as edge thinning, are often applied to further pinpoint the location of the edges in an image. The most common linear filter used for the initial highpass filtering is the Sobel operator, which uses the following 3×3 masks:
-1 -2 -1         -1  0  1
 0  0  0   and   -2  0  2
 1  2  1         -1  0  1
These two masks are convolved with the image separately to measure the strength of the horizontal and vertical edges, respectively, present at each pixel. Thus, if the amount to which a horizontal edge is present at the pixel in the ith row and jth column is represented as E^h_{i,j}, and if the vertical edge indicator is E^v_{i,j}, then the values are

E^h_{i,j} = -x_{i-1,j-1} - 2x_{i-1,j} - x_{i-1,j+1} + x_{i+1,j-1} + 2x_{i+1,j} + x_{i+1,j+1},
E^v_{i,j} = -x_{i-1,j-1} - 2x_{i,j-1} - x_{i+1,j-1} + x_{i-1,j+1} + 2x_{i,j+1} + x_{i+1,j+1},

where x_{i,j} is the pixel located at the ith row and jth column. The two strengths are combined to find the total amount to which any edge exists at a pixel:

E^{total}_{i,j} = ((E^h_{i,j})^2 + (E^v_{i,j})^2)^{1/2}.

This value is then compared to the threshold T to determine the existence of an edge. In place of linear highpass filters, WM filters with the weights from the Sobel masks can be used. The Sobel linear highpass filters take a weighted difference between the pixels on either side of x_{i,j}. On the other hand, if the same weights are used in a weighted median filter, the value returned is the difference between the lowest-valued pixels on either side of x_{i,j}. If the pixel values are then flipped about some middle value, the difference between the highest-valued pixels on either side can also be obtained. The flipping can be achieved by finding some maximum pixel value M and using x̄_{i,j} = M - x_{i,j} as the "flipped" value of x_{i,j}, thus causing the highest values to become the lowest. The lower of the two differences across the pixel can then be used as the indicator of the presence of an edge. If a true edge is present, then both differences should be high in magnitude, while if noise causes one of the differences to be too high, the other difference is not necessarily affected. Thus, the horizontal and vertical edge indicators are

E^h_{i,j} = min{ MEDIAN[-1◇x_{i-1,j-1}, -2◇x_{i-1,j}, -1◇x_{i-1,j+1}, 1◇x_{i+1,j-1}, 2◇x_{i+1,j}, 1◇x_{i+1,j+1}],
                 MEDIAN[-1◇x̄_{i-1,j-1}, -2◇x̄_{i-1,j}, -1◇x̄_{i-1,j+1}, 1◇x̄_{i+1,j-1}, 2◇x̄_{i+1,j}, 1◇x̄_{i+1,j+1}] },

E^v_{i,j} = min{ MEDIAN[-1◇x_{i-1,j-1}, -2◇x_{i,j-1}, -1◇x_{i+1,j-1}, 1◇x_{i-1,j+1}, 2◇x_{i,j+1}, 1◇x_{i+1,j+1}],
                 MEDIAN[-1◇x̄_{i-1,j-1}, -2◇x̄_{i,j-1}, -1◇x̄_{i+1,j-1}, 1◇x̄_{i-1,j+1}, 2◇x̄_{i,j+1}, 1◇x̄_{i+1,j+1}] },

where ◇ denotes the replication operator,
and the strength of the horizontal and vertical edges E^{h,v}_{i,j} is determined in the same way as in the linear case:

E^{h,v}_{i,j} = ((E^h_{i,j})^2 + (E^v_{i,j})^2)^{1/2}.
Horizontal and vertical indicators are not sufficient to register diagonal edges, so the following two masks must also be used as weights for the WM filter:

-2 -1  0          0  1  2
-1  0  1   and   -1  0  1
 0  1  2         -2 -1  0
Thus the strengths of the two types of diagonal edges in an image are E^{d1}_{i,j} for those going from the bottom left to the top right (left mask) and E^{d2}_{i,j} for those from the top left to the bottom right (right mask). The values are given by

E^{d1}_{i,j} = min{ MEDIAN[-2◇x_{i-1,j-1}, -1◇x_{i-1,j}, -1◇x_{i,j-1}, 1◇x_{i,j+1}, 1◇x_{i+1,j}, 2◇x_{i+1,j+1}],
                    MEDIAN[-2◇x̄_{i-1,j-1}, -1◇x̄_{i-1,j}, -1◇x̄_{i,j-1}, 1◇x̄_{i,j+1}, 1◇x̄_{i+1,j}, 2◇x̄_{i+1,j+1}] },

E^{d2}_{i,j} = min{ MEDIAN[1◇x_{i-1,j}, 2◇x_{i-1,j+1}, -1◇x_{i,j-1}, 1◇x_{i,j+1}, -2◇x_{i+1,j-1}, -1◇x_{i+1,j}],
                    MEDIAN[1◇x̄_{i-1,j}, 2◇x̄_{i-1,j+1}, -1◇x̄_{i,j-1}, 1◇x̄_{i,j+1}, -2◇x̄_{i+1,j-1}, -1◇x̄_{i+1,j}] }.

A diagonal edge strength is determined in the same way as the horizontal and vertical edge strengths above:

E^{d1,d2}_{i,j} = ((E^{d1}_{i,j})^2 + (E^{d2}_{i,j})^2)^{1/2}.
The indicator of all edges in any direction is the maximum of the two strengths E^{h,v}_{i,j} and E^{d1,d2}_{i,j}:

E^{total}_{i,j} = max(E^{h,v}_{i,j}, E^{d1,d2}_{i,j}).
As in the linear case, this value is compared to the threshold T to determine whether a pixel lies on an edge. Figure 2.21 shows the results of calculating E^{total}_{i,j} for an image. The results of the median edge detection are similar to the results of using the Sobel linear operator. Other approaches to edge detection based on the median filter can be found elsewhere [Bov86, Pit86].
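A minimal sketch of the median-based horizontal indicator follows. This is our own reading of the construction, not the chapter's code: the handling of negative weights (the sign is applied to the replicated sample) and the use of absolute values before taking the minimum are assumptions on our part.

```python
import numpy as np

def wm(signed_weights, samples):
    """Weighted median admitting negative weights: a weight w contributes
    |w| copies of sign(w)*x to a multiset whose ordinary median is returned
    (an assumed realization of the negative-weight WM)."""
    reps = []
    for w, x in zip(signed_weights, samples):
        reps.extend([np.sign(w) * x] * int(abs(w)))
    return float(np.median(reps))

def wm_horizontal_indicator(win, M=255.0):
    """Horizontal edge indicator for a 3x3 window: the lower in magnitude of
    the weighted-median responses on the original samples and on the flipped
    samples x_bar = M - x, with Sobel row weights -1,-2,-1, 1,2,1."""
    w = [-1, -2, -1, 1, 2, 1]
    plain = list(win[0, :]) + list(win[2, :])    # rows i-1 and i+1
    flipped = [M - v for v in plain]
    return min(abs(wm(w, plain)), abs(wm(w, flipped)))

flat = np.full((3, 3), 50.0)
edge = np.array([[0.0] * 3, [0.0] * 3, [100.0] * 3])
```

On the flat window the indicator is zero, while on the step-edge window it is large, matching the qualitative behavior described above.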
Figure 2.21: (a) Original image, and edge detection using (b) the linear method and (c) the median method.
2.8 Conclusion

The principles behind WM smoothers and WM filters have been presented in this chapter, as well as some of the applications of these nonlinear methods to image processing. It should be apparent to the reader that many similarities exist between linear and median filters. As illustrated here, there are several applications in image enhancement where WM filters provide significant advantages over traditional methods using linear filters. The methods presented here, and other image enhancement methods that can easily be developed using WM filters, are computationally simple and provide significant advantages. Consequently, they can be used in emerging consumer electronic products, PC and Internet imaging tools, and medical and biomedical imaging systems, as well as in military applications.
Acknowledgments

This research has been supported through collaborative participation in the Advanced Telecommunications/Information Distribution Research Program (ATIRP) Consortium sponsored by the U.S. Army Research Laboratory under the Federated Laboratory Program, Cooperative Agreement DAAL01-96-2-0002, and by the NSF under grants MIP-9530923 and CDA-9703088.
References

[Arc86] G. R. Arce. Statistical threshold decomposition for recursive and nonrecursive median filters. IEEE Trans. Inf. Theory IT-32(2), 243-253 (March 1986).
[Arc88] G. R. Arce and N. C. Gallagher. Stochastic analysis of the recursive median filter process. IEEE Trans. Inf. Theory IT-34(4), 669-679 (July 1988).
[Arc95] G. R. Arce, T. A. Hall, and K. E. Barner. Permutation weighted order statistic filters. IEEE Trans. Image Process. 4, 1070-1083 (August 1995).
[Arc98] G. R. Arce. A general weighted median filter structure admitting negative weights. IEEE Trans. Signal Process. SP-46(12), 3195-3205 (December 1998).
[Arc00] G. R. Arce and J. L. Paredes. Recursive weighted median filters admitting negative weights and their optimization. IEEE Trans. Signal Process. 48(3), 768-799 (2000).
[Arn92] B. C. Arnold, N. Balakrishnan, and H. N. Nagaraja. A First Course in Order Statistics. Wiley, New York (1992).
[Ast97] J. Astola and P. Kuosmanen. Fundamentals of Nonlinear Digital Filtering. CRC Press, Boca Raton, FL (1997).
[Bin92] Z. Bing and A. N. Venetsanopoulos. Comparative study of several nonlinear image interpolation schemes. Proc. SPIE, pp. 21-29 (November 1992).
[Bov83] A. C. Bovik, T. S. Huang, and D. C. Munson, Jr. A generalization of median filtering using linear combinations of order statistics. IEEE Trans. Acoust. Speech Signal Process. ASSP-31(6), 1342-1350 (December 1983).
[Bov86] A. C. Bovik and D. C. Munson. Edge detection using median comparisons. Comput. Vision Graph. Image Process. 33(3), 377-389 (March 1986).
[Bro84] D. R. K. Brownrigg. The weighted median filter. Commun. Assoc. Comput. Machin. 27(8), 807-818 (August 1984).
[Dav82] H. A. David. Order Statistics. Wiley Interscience, New York (1982).
[Edg87] F. Y. Edgeworth. A new method of reducing observations relating to several quantities. Philos. Mag. (Fifth Series) 24, 222-223 (1887).
[Har93] R. C. Hardie and C. G. Boncelet, Jr. LUM filters: A class of rank-order-based filters for smoothing and sharpening. IEEE Trans. Signal Process. 41(3), 1061-1076 (March 1993).
[Har94] R. C. Hardie and K. E. Barner. Rank conditioned rank selection filters for signal restoration. IEEE Trans. Image Process. 3(2), 192-206 (March 1994).
[Jai89] A. K. Jain. Fundamentals of Digital Image Processing. Prentice Hall, Englewood Cliffs, NJ (1989).
[Ko91] S. J. Ko and Y. H. Lee. Center weighted median filters and their applications to image enhancement. IEEE Trans. Circ. Syst. 38(9), 984-993 (September 1991).
[Lee85] Y. H. Lee and S. A. Kassam. Generalized median filtering and related nonlinear filtering techniques. IEEE Trans. Acoust. Speech Signal Process. ASSP-33(3), 672-683 (June 1985).
[Leh83] E. L. Lehmann. Theory of Point Estimation. Wiley, New York (1983).
[Mit00] S. K. Mitra. Digital Signal Processing: A Computer-Based Approach, 2nd ed. McGraw-Hill, Burr Ridge, IL (2000).
[Par99] J. L. Paredes and G. R. Arce. Stack filters, stack smoothers, and mirrored threshold decomposition. IEEE Trans. Signal Process. 47(10), 2757-2767 (October 1999).
[Pit86] I. Pitas and A. N. Venetsanopoulos. Nonlinear order statistic filters for image filtering and edge detection. Signal Process. 10(4), 395-413 (April 1986).
[Pit90] I. Pitas and A. N. Venetsanopoulos. Nonlinear Digital Filters: Principles and Applications. Kluwer, Boston (1990).
[Que95] R. Queiroz, D. Florencio, and R. Schafer. Nonexpansive pyramid for image coding using a nonlinear filterbank. IEEE Trans. Image Process. 7(2), 246-252 (February 1995).
[Shy89] J. Shynk. Adaptive IIR filtering. IEEE ASSP Mag. 6(2), 4-21 (April 1989).
[Tuk74] J. W. Tukey. Nonlinear (nonsuperimposable) methods for smoothing data. In Conf. Rec. Eascon, p. 673 (1974).
[Yin96] L. Yin, R. Yang, M. Gabbouj, and Y. Neuvo. Weighted median filters: a tutorial. IEEE Trans. Circ. Syst. II 43(3), 157-192 (March 1996).
Spatial-Rank Order Selection Filters

KENNETH E. BARNER
Department of Electrical and Computer Engineering, University of Delaware, Newark, Delaware

RUSSELL C. HARDIE
Department of Electrical and Computer Engineering, University of Dayton, Dayton, Ohio

3.1 Introduction
Many image processing applications demand the use of nonlinear methods. Image nonstationarities, in the form of edges, and commonly occurring heavy-tailed noise result in image statistics that are decidedly non-Gaussian. Linear methods often perform poorly on non-Gaussian signals and tend to excessively smooth visually important image cues, such as edges and fine detail. Through the consideration of appropriate statistical signal and interference models, more effective image processing algorithms can be developed. Indeed, an analysis based on maximum likelihood estimation, carried out in Sec. 3.2, indicates that rank-order-based processing of signals such as images is more appropriate than linear processing. Strict rank order methods, however, are spatially blind and cannot exploit the rich spatial correlations generally present in images.
This chapter explores the joint use of spatial and rank (SR) ordering information in a selection filter framework and applies the methods developed to several common image processing problems. Each of the marginal orderings of observed samples yields information that can be used in the design of filtering algorithms: spatial ordering can be used to exploit correlations between neighboring samples, while rank order can be used to isolate outliers and ensure robust behavior. By operating jointly on the SR orderings, sophisticated algorithms can be developed that exploit spatial correlations while producing robust outputs that appropriately process abrupt signal transitions (edges) and are immune to sample outliers. Numerous filtering algorithms have been developed that exploit, in some fashion, spatial and rank order information in the filtering operation. A large class of such filters can be categorized as selection type, in that their output is always one of the samples from the local observation window. Restricting a filter to be selection type is rarely limiting and, as shown in Sec. 3.2, often holds advantages in image processing applications. Thus, we focus on developing the broad class of SR selection filters that operates on the SR ordering information of observed samples. To illustrate the advantages of SR selection filters over traditional linear schemes, consider the smoothing of a noisy image. Figure 3.1 shows the results of processing a noisy image with a weighted sum filter that operates strictly on spatial order, along with the results of a selection filter that operates jointly on SR ordering information. These results indicate that weighted sum filters tend to smooth edges and obliterate fine detail. In contrast, the selection filter, by operating jointly on the SR ordering information, is able to suppress the noise while simultaneously preserving edges and fine detail. 
The remainder of this chapter theoretically motivates SR selection filters, develops several filter class subsets and extensions that utilize partial, full, or extended SR ordering information, and applies the filters developed to several image processing problems. Section 3.2 begins with a theoretical discussion of maximum likelihood (ML) estimation that motivates the use of rank order in the processing of signals with heavy-tailed distributions. The ML development leads naturally to several rank order selection filters, including the median filter, which are then extended to the general class of selection filters. Additionally, a general framework for relating the spatial and rank orderings of samples is introduced in the section. The broad class of SR selection filters is discussed in Sec. 3.3, beginning with permutation filters, which utilize the full SR ordering information. The factorial growth (with window size) in the number of SR orderings limits the size of permutation filter window that can be utilized in practice. To efficiently utilize partial SR information in the filtering process, we develop M permutation and colored permutation filters. These methods utilize the rank order information of specific spatial samples and allow ordering equivalences to be established in order to efficiently utilize the most important SR information in a given application. We extend the SR selection filtering framework to include augmented observation sets that may include functions of the observed samples.
Figure 3.1: Image Aerial broken into four quadrants: original (upper left), noisy (upper right), output of weighted sum filter operating on the sample spatial order (lower left), and output of selection filter operating jointly on the sample SR orderings (lower right).

Each of the filtering methods discussed operates under the same basic principle, selecting an input sample to be the output based on partial, full, or extended SR ordering information. A unified optimization procedure is therefore developed in Sec. 3.4. Results of applying the developed SR selection filters to several image processing problems, including noise removal in single frames and video sequences, edge sharpening, and interpolation, are presented in Sec. 3.5. Finally, extensions based on fuzzy logic, which lead to fuzzy SR relations and fuzzy order statistics, are given in Sec. 3.6, and possible future research directions are discussed.
3.2 Selection Filters and Spatial-Rank Ordering
3.2.1 ML Estimation

To motivate the development of theoretically sound signal processing methods, consider first the modeling of observation samples. In all but trivial cases, nondeterministic methods must be used. Since most signals have random components, probability-based models form a powerful set of modeling methods. Accordingly, signal processing methods have deep roots in statistical estimation theory. Consider a set of N observation samples. In most image processing applications, these are the pixel values observed from a moving window centered at some position n = [n_1, n_2] in the image. Such samples will be denoted as x(n) = [x_1(n), x_2(n), ..., x_N(n)]^T. For notational convenience, we will drop the index n unless necessary for clarity. Assume now that we model these samples as independent and identically distributed (i.i.d.) random variables. Each observation sample is then characterized by the common probability density function (pdf) f_β(x), where β is the mean, or location, of the distribution. Often β is information carrying and unknown, and thus must be estimated. The maximum likelihood estimate of the location is achieved by maximizing, with respect to β, the probability of observing x_1, x_2, ..., x_N. For i.i.d. samples, this results in

β̂ = arg max_β ∏_{i=1}^N f_β(x_i).    (3.1)
Thus, the value of β that maximizes the product of the pdfs constitutes the ML estimate. The degree to which the ML estimate accurately represents the location is dependent, to a large extent, on how accurately the model distribution represents the true distribution of the observation process. To allow for a wide range of sample distributions, we can generalize the commonly assumed Gaussian distribution by allowing the exponential rate of tail decay to be a free parameter. This results in the generalized Gaussian density function,

f_β(x) = c e^{-(|x-β|/σ)^p},    (3.2)

where p governs the rate of tail decay, c = p/[2σΓ(1/p)], and Γ(·) is the gamma function. This includes the standard Gaussian distribution as a special case (p = 2). For p < 2, the tails decay more slowly than in the Gaussian case, resulting in a heavier-tailed distribution. Of particular interest is the case p = 1, which yields the double exponential, or Laplacian, distribution,

f_β(x) = (1/2σ) e^{-|x-β|/σ}.    (3.3)
To illustrate the effect of p, consider the modeling of image samples within a local window. Figure 3.2 shows the distribution of samples about the 3×3 neighborhood mean for the image Lena (Fig. 3.4a), along with the Gaussian (p = 2) and Laplacian (p = 1) approximations. As the figure shows, the Laplacian distribution models the image samples more accurately than the Gaussian distribution. Moreover, the heavy tails of the Laplacian distribution are well suited to modeling the impulse noise often observed in images.

Figure 3.2: Distribution of local samples in the image Lena (Fig. 3.4a) and the generalized Gaussian distribution models for p = 2 (standard Gaussian distribution) and p = 1 (Laplacian distribution).

The ML criteria can be applied to optimally estimate the location of a set of N samples distributed according to the generalized Gaussian distribution, yielding

β̂ = arg max_β ∏_{i=1}^N c e^{-(|x_i-β|/σ)^p} = arg min_β ∑_{i=1}^N |x_i - β|^p.    (3.4)

Determining the ML estimate is thus equivalent to minimizing

G_p(β) = ∑_{i=1}^N |x_i - β|^p    (3.5)

with respect to β. For the Gaussian case (p = 2), this reduces to the sample mean, or average:

β̂ = arg min_β G_2(β) = (1/N) ∑_{i=1}^N x_i.    (3.6)
A much more robust estimator is realized if the underlying sample distribution is taken to be the heavy-tailed Laplacian distribution (p = 1). In this case, the ML estimator of location is given by the value β̂ that minimizes the sum of least absolute deviations,

G_1(β) = ∑_{i=1}^N |x_i - β|,    (3.7)

which can easily be shown to be the sample median:

β̂ = arg min_β G_1(β) = median[x_1, x_2, ..., x_N].    (3.8)
The sample mean and median thus play analogous roles in location estimation: while the mean is associated with the Gaussian distribution, the median is related to the Laplacian distribution, which has heavier tails and provides a better image and impulse process model. Although the median is a robust estimator that possesses many optimality properties, the performance of the median filter is limited by the fact that it is spatially blind. That is, all observation samples are treated equally regardless of their location within the observation window. This limitation is a direct result of the i.i.d. assumption made in the filter development. A much richer class of filters is realized if this assumption is relaxed to the case of independent but not identically distributed samples. Consider the generalized Gaussian distribution case in which the observation samples have a common location parameter β but where each x_i has a (possibly) unique scale parameter σ_i. Incorporating the unique scale parameters into the ML criteria yields a location estimate given by the value of β that minimizes

G_p(β) = ∑_{i=1}^N (1/σ_i^p) |x_i - β|^p.    (3.9)

In the special case of the standard Gaussian distribution (p = 2), the ML estimate reduces to the normalized weighted average

β̂ = arg min_β ∑_{i=1}^N (1/σ_i^2)(x_i - β)^2 = (∑_{i=1}^N w_i x_i) / (∑_{i=1}^N w_i),    (3.10)

where w_i = 1/σ_i^2 > 0. In the heavier-tailed Laplacian distribution special case (p = 1), the ML estimate reduces to the weighted median (WM), originally introduced over a hundred years ago by Edgeworth [Edg87] and defined as

β̂ = arg min_β ∑_{i=1}^N (1/σ_i) |x_i - β| = MEDIAN[w_1◇x_1, w_2◇x_2, ..., w_N◇x_N],    (3.11)

where w_i = 1/σ_i > 0 and the diamond ◇ is the replication operator, defined as

w_i◇x_i = x_i, x_i, ..., x_i    (w_i times).
More complete discussions of weighted median filters and the related class of weighted order statistic filters are given in Chapters 1 and 2. Two important observations can be made about the median filter and the more general weighted median filter:

1. Selection type: The cost functions leading to the median and WM filters are piecewise linear and convex. Their output is thus guaranteed to be one of the observation samples x_1, x_2, ..., x_N. That is, the filters are selection type, choosing one of the observed samples as the output.

2. Partial spatial-rank order use: The median filter is spatial order blind. The WM filter, in contrast, utilizes partial spatial order information by weighting samples based on their spatial location in a process that can be interpreted as attempting to exploit spatial correlations between samples. Both filters utilize partial rank order information by selecting the central ranked sample from the observation set, or from the expanded observation set in the case of the WM filter.

These concepts are extended in the following subsections to yield a general class of filters that can be employed in a wide array of applications. Specifically, we define the general class of selection filters. This is an extremely broad class of filters, whose only restriction is that the filter output at each instant be one of the observed samples. The decision as to which input sample to select as the output is generally based on some feature that is a function of the observation samples. We show that the spatial and rank ordering information of the observed samples is a particularly useful feature that can be used in the selection rule. This leads to the broad class of spatial-rank ordering selection filters, which are the focus of this chapter.
3.2.2 Selection Filters

Selection filters constitute a large filtering class in which the only restriction placed upon a filter is that its output, at each pixel location, be one of the current observation samples. A selection filter F can thus be thought of as a mapping from the observation set to an output that belongs to the observation set. Let the observation samples from a moving window be denoted, as defined previously, by x. The selection filter function is then given by

y = F(x) = x_{S(z)},    (3.12)

where z is a feature vector, derived from the observation set and lying in the feature space Ω, and S(z) is the selection function. The selection function determines which sample to select as the output at each window location, based strictly on z ∈ Ω. Thus, the selection filter function can be expressed as

F: {x_1, x_2, ..., x_N} → y ∈ {x_1, x_2, ..., x_N}    (3.13)

and the selection function can be written as

S: Ω → {1, 2, ..., N}.    (3.14)

Figure 3.3: In the selection filter operation, a feature is derived from the samples in the current observation window and a selection rule operating on the feature selects one of the observed samples to be the current output.
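The definitions in Eqs. (3.12)-(3.14) translate into very little code. In the sketch below (our own illustration), the feature is the window mean and the selection rule picks the sample closest to it; both are illustrative choices, and the structure makes plain that the output is always one of the observed samples.

```python
import numpy as np

def selection_filter(window, feature, select):
    """Generic selection filter: derive a feature z from the observation
    window, then the selection rule S(z) returns the index of the sample
    to output, so the output is always one of the observed samples."""
    z = feature(window)
    return window[select(z, window)]

def mean_feature(window):
    """Feature z: the window mean (an illustrative choice)."""
    return float(np.mean(window))

def closest_to_feature(z, window):
    """Selection rule S(z): index of the sample closest in value to z."""
    return int(np.argmin(np.abs(np.asarray(window) - z)))

win = [10.0, 12.0, 11.0, 200.0, 13.0]   # one impulse-like outlier
y = selection_filter(win, mean_feature, closest_to_feature)
```

Even with the outlier pulling the mean upward, the selection constraint forces the output back to an actual observation (here 13.0) rather than the distorted average.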
The selection rule S effectively partitions the feature space Ω into N regions, with each region being associated with one of the observation samples. If the current feature lies in the ith partition, the filter output at that window location is simply the observation sample x_i. This general selection filter operation is illustrated in Fig. 3.3. Selection filters, although a broad filtering class, do limit the output to be one of the N observation samples. A natural question is, therefore, what effect does this restriction have on the filtering process? Is performance significantly limited by the selection constraint? To gauge the effect the selection constraint has on performance, we consider the following image filtering examples. To investigate an upper bound on performance that the selection constraint imposes, consider the best results that can be achieved by a selection filter in a noise smoothing application. In a simple realization of this case, an observed image is the superposition of a desired underlying image and an additive noise process, X = D + N, where X, D, and N represent the observed image, desired image, and corrupting noise process, respectively. Figures 3.4a and 3.4b show the original 8-bit grayscale image Lena and a corrupted observation, respectively. The corrupting additive noise process in this case follows a contaminated Gaussian distribution. As the figure shows, the corrupted image has a rather low signal-to-noise ratio (SNR). Consider now the optimal selection filtering of the corrupted image. Let the optimal selection filter perform the following operation: at each window location, select as the output the observation sample closest in value to the original image pixel. Figures 3.4c and 3.4d show the results of performing this operation utilizing 3×3 and 5×5 observation windows. The mean squared error (MSE) and mean absolute error (MAE) are indicated in the figure caption. Even for a low SNR observation and relatively small 3×3 observation window, the resulting output is very close to the original. In the 5×5 observation window case, the output is virtually indistinguishable from the original. Although implementation of the optimal selection filter requires knowledge of the original image, the results indicate that filter performance is not significantly limited by the selection constraint.

Figure 3.4: Optimal selection filter examples: (a) original image Lena, (b) observation image corrupted by contaminated Gaussian noise, and optimal selection filter outputs for (c) 3×3 (MSE = 22.5, MAE = 3.4) and (d) 5×5 (MSE = 4.1, MAE = 1.4) observation windows.

The selection constraint, in fact, often improves filter performance. This is particularly true when results are judged subjectively. To illustrate this, we consider the filtering of a color image corrupted by impulses. Here the red, green, and blue tristimulus images are independently corrupted by impulse noise. Figures 3.5a and 3.5b (also see color insert) show the original color image Balloon and an impulse-corrupted observation, respectively. To reduce the power of the noise, simple averaging is often (mistakenly) applied to a corrupted image. The result of applying a vector averaging operation over a 5×5 moving spatial window is shown in Fig. 3.5c. The averaging operation, while reducing the power of the noise, has the disturbing effect of not only smoothing edges but also introducing new colors into the image. The introduction of colors not present in the original or observation images is often perceptually disturbing and should be avoided. Simply by applying the selection constraint to the filtering operation, much of the objectionable color introduction can be avoided. Figure 3.5d shows the result of applying the selection constraint to the averaging operation. That is, for each observation window, the tristimulus observation vector closest in Euclidean distance to the vector mean is selected as the output. The output realized by applying this constraint is free of color insertions and is therefore subjectively more appealing. As this example illustrates, even for extremely simple filter formulations, the selection constraint can play a valuable role.
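The selection-constrained vector average used for Fig. 3.5d can be sketched as follows (our code; the window contents are made-up RGB triples with one impulse).

```python
import numpy as np

def vector_selection_average(vectors):
    """Selection-constrained vector average: compute the vector mean of the
    tristimulus observations, then output the observed vector closest to it
    in Euclidean distance, so no new colors are introduced."""
    v = np.asarray(vectors, dtype=float)
    mean = v.mean(axis=0)
    dist = np.linalg.norm(v - mean, axis=1)
    return v[int(np.argmin(dist))]

# Three similar reddish pixels and one green impulse (hypothetical values).
window = [(250, 10, 10), (245, 12, 8), (248, 9, 11), (0, 255, 0)]
out = vector_selection_average(window)
```

The returned vector is one of the observed pixels, so the output image can only contain colors already present in the observation.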
3.2.3 Spatial-Rank Ordering

Selecting the appropriate decision feature is the key to defining a selection filter that can be applied in numerous applications and produces desirable results across a broad range of problems. The relationship between the spatial ordering and rank ordering of observed samples defines a large feature space that is particularly useful in numerous image processing applications. Each of these natural orderings contains valuable information. For example, rank ordering is particularly valuable in the design of robust operators, while spatial ordering is helpful when spatial correlations are to be exploited. Operating jointly on the ordering information allows for the design of robust operators that effectively exploit spatial correlations. To formally relate the spatial ordering and rank ordering of samples in an image processing application, we consider again the typical case in which an observation window passes over an image in a predefined scanning pattern. At each location n in the image, the observation window covers N samples, which can be indexed according to their spatial location and written in vector form,

x_e(n) = [x_1(n), x_2(n), …, x_N(n)]^T. (3.15)

The subscript e has now been added to indicate that the samples are indexed according to their natural spatial order within the observation image. A second natural ordering of the observed samples is rank order, in which case the order statistics of the observation samples are obtained, x_(1)(n) ≤ x_(2)(n) ≤ ⋯ ≤ x_(N)(n).
For the current pixel x(n), the rank-ordered differences are defined as

d_i(n) = { r_i(n) − x(n),      if x(n) ≤ m(n),
           x(n) − r_{9−i}(n),  if x(n) > m(n), }
for i = 1, …, 4. The rank-ordered differences provide information about the likelihood of corruption for the current pixel. For example, consider the rank-ordered difference d_1(n): if this value is positive, then the current pixel x(n) is either the smallest or largest value in the current window. If d_1(n) is not only positive but also large, then an impulse is very likely. Together, the differences d_1(n) through d_4(n) reveal even more information about the presence of a corrupted pixel, even for the case when multiple impulses are present in the current window. For images corrupted with only positive impulse noise, d_i(n) should be redefined simply as d_i(n) = x(n) − r_{9−i}(n), i = 1, …, 4. Similarly, if only negative impulses are present, d_i(n) is redefined as
d_i(n) = r_i(n) − x(n),   i = 1, …, 4.

4.4 The SDROM Filter
Figure 4.1 shows a block diagram of the SDROM filter structure. The purpose of the impulse noise detector is to determine whether the current pixel is corrupted. If a sample is detected as corrupted, it is replaced with an estimate of the true value based on the order statistics of the remaining pixels in the window; otherwise, it is left unchanged. The filter operates as follows:
Impulse Noise Detection. The algorithm detects x(n) as a noisy sample if any of the following inequalities is true:

d_i(n) > T_i,   i = 1, …, 4, (4.1)

where T_1, T_2, T_3, T_4 are threshold values, with T_1 < T_2 < T_3 < T_4.
CHAPTER 4: SDROM FILTER
Estimation of the True Value. If x(n) is detected as a corrupted sample, it is replaced by the ROM filter output m(n); otherwise, it is left unchanged.
The ROM filter introduced in the method provides improved restoration performance compared to the conventional median filter, particularly for images corrupted with very high percentages of fixed-valued noise such as salt-and-pepper noise. The algorithm, as described above, is conditioned both on the rank-ordered differences and on the threshold values, so we will also refer to it as the "thresholded SDROM." Computer simulations using a large variety of test images have shown that good results are obtained using thresholds selected from the following set of values: T_1 ∈ {4, 8, 12}, T_2 ∈ {15, 25}, T_3 = 40, T_4 = 50. The algorithm performs well even for suboptimally selected thresholds. In fact, the default values T_1 = 8, T_2 = 20, T_3 = 40, T_4 = 50 appear to provide satisfactory results with most natural images corrupted with random-valued impulse noise. In general, the experimental results indicate that there is little need to consider threshold values outside the following intervals: T_1 ≤ 15, 15 ≤ T_2 ≤ 25, 30 ≤ T_3 ≤ 50, 40 ≤ T_4 ≤ 60. We have found that the selection of the thresholds, although done experimentally, is not a laborious process. Due to the robustness of the algorithm, good results are usually obtained in just one to three trials. We note that other switching schemes for impulse noise removal exist in the literature [Flo94, Kun84, Kim86, Mit94, Sun94]. In these approaches, the output is switched between an identity and a median-based filter, and the decision rules are typically based on a single threshold of a locally computed statistic. These strategies tend to work well for large, fixed-valued impulses but poorly for random-valued impulses, or vice versa. In contrast, the SDROM is applicable to any impulse noise type while still employing simple decision rules. In many cases, improved performance can be obtained if the method is implemented in a recursive fashion. In this case, the sliding window is redefined according to w(n) = [y_1(n), …, y_4(n), w_5(n), …, w_8(n)], where y_i(n) corresponds to the filter output for each noisy input pixel w_i(n). Although we have described the algorithm for the case of 2-D signals, the method is general and applies to higher dimensional signals as well as to 1-D signals [Cha98]. Other window sizes and shapes are possible. The procedure in the general case follows similar steps. To detect the impulse noise, we rank order the samples inside a window, excluding the current sample, and compare the differences between the current sample and the ordered samples to thresholds. A corrupted sample is replaced with the ROM value.
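The detection and replacement steps above can be sketched for a single 3×3 window as follows; this is a minimal illustration, assuming the eight neighbors are ranked with the center excluded and the ROM value is taken as m(n) = (r_4 + r_5)/2 (an assumption in this sketch), using the default thresholds quoted in the text:

```python
import numpy as np

def sdrom_pixel(window3x3, T=(8, 20, 40, 50)):
    """Thresholded SDROM for the center pixel of a 3x3 window (a sketch).
    The 8 neighbors are ranked, the rank-ordered differences d_i are formed,
    and the pixel is replaced by the rank-ordered mean (ROM) if any d_i > T_i."""
    w = np.asarray(window3x3, dtype=float).ravel()
    x = w[4]                                   # current pixel
    r = np.sort(np.delete(w, 4))               # r_1..r_8: ranked neighbors
    m = (r[3] + r[4]) / 2.0                    # ROM value m(n) (assumed form)
    if x <= m:
        d = [r[i] - x for i in range(4)]       # d_i = r_i(n) - x(n)
    else:
        d = [x - r[7 - i] for i in range(4)]   # d_i = x(n) - r_{9-i}(n)
    noisy = any(d[i] > T[i] for i in range(4))
    return m if noisy else x
```

An impulse at the center is replaced by the ROM value, while a clean pixel passes through unchanged.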
116
EDUARDO ABREU
Figure 4.2: The generalized SDROM filter structure.
4.5 Generalized SDROM Method
As indicated in Fig. 4.1, the filtered output y(n) is switched between the input x(n) and the ROM value m(n). The switching operation is conditioned on the rank-ordered differences d(n) and fixed threshold values. Using the concept of fuzzy logic, the method can be generalized by redefining y(n) as a linear combination of x(n) and m(n) [Ara95]:

y(n) = α(d(n)) x(n) + β(d(n)) m(n), (4.2)

where the weighting coefficients α(d(n)) and β(d(n)) are constrained so that their sum is normalized to 1, implying that β(d(n)) = 1 − α(d(n)).
The coefficients α(d(n)) and β(d(n)) are conditioned only on the rank-ordered differences (there are no thresholds). We will also refer to this implementation of the SDROM as the "generalized SDROM." A block diagram is presented in Fig. 4.2. Note that the thresholded SDROM output is a special case of Eq. (4.2), in which α(d(n)) can take on only the values 0 and 1. Pictorially, the distinction between the two approaches can be observed by comparing Figs. 4.3 and 4.4 for the simplified case in which d(n) ∈ R², where d(n) = [d_1(n), d_2(n)].³ Note that the region corresponding to d_i(n) > d_j(n), i < j, is outside the domain of α(d(n)). The fuzzy nature of the algorithm allows greater flexibility with regard to the removal of highly nonstationary impulse noise. Unlike the thresholded SDROM, the generalized method can also effectively restore images corrupted with other noise types, such as Gaussian noise and mixed Gaussian and impulse noise. The values of the weighting coefficients α(d(n)) are obtained by performing optimization using training data. The optimized values are stored in memory, and the function α(d(n)) is implemented as a lookup table. One primary difficulty with this approach is that for d(n) ∈ R⁴ and 8-bit images, the number of possible values of d(n) is very large, so the computational complexity and memory storage requirements associated with optimizing and implementing α(d(n)) can be very

³For the example of Fig. 4.4, the coefficients were trained to restore the Lena image corrupted with 20% random-valued impulse noise using the least-squares design methodology presented in Sec. 4.5.1.
Figure 4.3: The function α(d(n)) for the thresholded SDROM with d(n) ∈ R².

Figure 4.4: The function α(d(n)) for the generalized SDROM with d(n) ∈ R².
high. To overcome this problem, we simplify the method by partitioning the R⁴ vector space into M nonoverlapping regions A_i, i = 1, …, M, and for each region we assign a constant value to α(d(n)). Figure 4.5 illustrates this idea for d(n) ∈ R² and M = 16². We observe that the function α(d(n)) shown in Fig. 4.5 is a piecewise-constant approximation to the smooth function of Fig. 4.4. In the thresholded case, M = 2. We restrict all region boundaries to be parallel to the coordinate axes so that each region A_i can be represented as a Cartesian product of four scalar regions. Each scalar region is defined by an interval of the form (q_{i−1}, q_i], where the q_i's are decision levels that define a distinct partition on R. We will denote by α_i and β_i, respectively, the values of α(d(n)) and β(d(n)) associated with the region A_i:

α_i = α(d(n)) |_{d(n) ∈ A_i},   i = 1, …, M,
β_i = β(d(n)) |_{d(n) ∈ A_i},   i = 1, …, M.
Figure 4.5: The simplified function α(d(n)) for the generalized SDROM with d(n) ∈ R².

Table 4.1: Scalar partitions used for the generalized SDROM (M = 6⁴ = 1296). Partitions shown for each scalar dimension of d(n), with T_1 = 8, T_2 = 20, T_3 = 40, and T_4 = 50.

Scalar   Partition
d_1(n)   [−∞, T_1 − 20, T_1 − 5, T_1, T_1 + 5, T_1 + 20, ∞]
d_2(n)   [−∞, T_2 − 20, T_2 − 5, T_2, T_2 + 5, T_2 + 20, ∞]
d_3(n)   [−∞, T_3 − 30, T_3 − 10, T_3, T_3 + 10, T_3 + 30, ∞]
d_4(n)   [−∞, T_4 − 30, T_4 − 10, T_4, T_4 + 10, T_4 + 30, ∞]
Equation (4.2) becomes

y(n) = α_i x(n) + β_i m(n),   i : d(n) ∈ A_i. (4.3)
Experimental results for this partitioning strategy are presented in Sec. 4.6 using the scalar partitions shown in Table 4.1, although other partitions are possible. Interestingly, we have found in practice that the exact locations of the partitions have very little impact on the final restoration results as long as M is sufficiently large. This characteristic contrasts with the thresholded technique, wherein the values of the thresholds {T_k} do impact the final results and must be selected more judiciously. We now address the design of the weighting coefficients {α_i} and {β_i} for both nonrecursive and recursive implementations, using the normalization

β_i = 1 − α_i,   i = 1, …, M. (4.4)
For nonrecursive implementation, the algorithm operates on the pixels of the original noisy image only. In contrast, for the recursive case, pixels from the original image are systematically replaced by the output of previous filtering operations, and consequently, the sliding window w(n) can contain original and "restored" image pixels.
4.5.1 Least-Squares Design

Let v(n), x(n), and y(n) represent the pixel values of the training image, the noise-corrupted training image, and the filtered output image, respectively. Because the determination of d(n) is independent of the filtering action for nonrecursive implementation, it is possible to derive the weighting coefficients {α_i} and {β_i} that minimize the total squared error, given by

E_t = Σ_n [y(n) − v(n)]²,

for a given collection of training data. This equation can be expanded as

E_t = Σ_n {α(d(n)) x(n) + [1 − α(d(n))] m(n) − v(n)}²
    = Σ_{i=1}^{M} Σ_{n: d(n) ∈ A_i} [α_i x(n) + (1 − α_i) m(n) − v(n)]². (4.5)
Note that the inner sum represents the contribution of each particular α_i to the total squared error, so the entire expression can be minimized by independently minimizing each of these terms, for i = 1, …, M. Accordingly, we take the partial derivative of E_t with respect to each α_i and arrive at

∂E_t/∂α_i = Σ_{n: d(n) ∈ A_i} 2 [x(n) − m(n)] [α_i x(n) + (1 − α_i) m(n) − v(n)].
Setting this term equal to zero and solving for α_i leads to a simple expression for the globally optimal weighting coefficients for each i, given by

α_i* = − ( Σ_{n: d(n) ∈ A_i} [x(n) − m(n)] [m(n) − v(n)] ) / ( Σ_{n: d(n) ∈ A_i} [x(n) − m(n)]² ), (4.6)

for i = 1, …, M. If the training data and noise model are representative, the algorithm is likely to perform well on actual noisy images. Moreover, experimental results indicate that the performance of the algorithm is extremely robust with respect to the types of images and percentage of impulse noise on which the algorithm is trained.
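The closed-form solution of Eq. (4.6) can be computed per region from training triples (x(n), m(n), v(n)); a minimal sketch, assuming the region index of each sample has already been determined:

```python
import numpy as np

def ls_alpha(x, m, v, region):
    """Closed-form least-squares weights alpha_i of Eq. (4.6), one per region.
    x: noisy pixels, m: ROM values, v: clean training pixels,
    region: index i of the region A_i that d(n) falls into."""
    x, m, v = (np.asarray(t, dtype=float) for t in (x, m, v))
    region = np.asarray(region)
    alphas = {}
    for i in np.unique(region):
        sel = region == i
        h = x[sel] - m[sel]                    # h(n) = x(n) - m(n)
        denom = np.sum(h ** 2)
        # alpha_i = -sum h*(m - v) / sum h^2 ; degenerate regions get 0
        alphas[i] = -np.sum(h * (m[sel] - v[sel])) / denom if denom > 0 else 0.0
    return alphas
```

A region where the clean pixels equal the ROM values yields α_i = 0 (full replacement), while a region where the clean pixels equal the noisy observations yields α_i = 1 (identity).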
4.5.2 Least-Mean-Square Design

Another possible design strategy is to use a variant of the least-mean-square (LMS) algorithm to compute the weighting coefficients [Ara95, Hay91, Kot92, Pal90, Pit90]. In this case, the objective is to design the weighting coefficients iteratively while continuously imposing the normalization constraint described in Eq. (4.4). First, let

e(n) = v(n) − y(n)
denote the instantaneous error at pixel location n. If we define h(n) = x(n) − m(n), then e²(n) is given by

e²(n) = {v(n) − α(d(n)) x(n) − [1 − α(d(n))] m(n)}²
      = [v(n) − α(d(n)) h(n) − m(n)]².

Taking the expected value given that d(n) ∈ A_i, we arrive at a conditional mean-square error (MSE),

J_i = E[e²(n) | d(n) ∈ A_i],

from which the total MSE J can be computed as

J = Σ_{i=1}^{M} J_i.
Similar to the least-squares design in Eq. (4.5), the overall error can be decomposed into M terms when the algorithm is implemented nonrecursively because the determination of d(n) is disconnected from the filtering operation. Thus, we can easily minimize the total MSE by separately minimizing each J_i using M distinct LMS algorithms. To this end, we take the partial derivative of J_i with respect to α_i:

∇J_i = ∂J_i/∂α_i = E[−2 v(n) h(n) + 2 α_i h(n)² + 2 h(n) m(n) | d(n) ∈ A_i].

As with traditional LMS, a noisy estimate of the gradient can be computed by simply dropping the expectation operator to obtain

∇̂J_i = 2 [α_i h(n)² + h(n) m(n) − v(n) h(n)].

By using the noisy gradient and substituting x(n) − m(n) back into h(n), we arrive at the following update procedure for the weighting coefficients:

α_i^{n+1} = α_i^n + μ_i [x(n) − m(n)] e(n), (4.7)
β_i^{n+1} = 1 − α_i^{n+1}, (4.8)

for i = 1, …, M. In both equations, the vector n defines the movement of the sliding window w(n). Using the traditional assumptions for the derivation of LMS convergence [Hay91], it can easily be shown that simple conditions on the step sizes μ_i are sufficient for steady-state convergence of the algorithm in the mean and mean square.

The L_p erosion is essentially an L_{−p} mean filter [Pit86a]. The block diagram of an L_p dilation or erosion filter is shown in Fig. 5.2a. Filter block A represents the linear part of the filter, which has coefficients â_j = a_j / Σ_{j=1}^{N} a_j. Due to inequalities (5.5), we conclude that the output of the grayscale morphological erosion with a flat structuring element is always smaller than that of the L_p erosion when a_j = 1, j = 1, …, N. Similarly, the output of the morphological dilation with a flat structuring element is always greater than that of the L_p dilation. For very large positive values of p, the outputs of the L_p erosion and L_p dilation converge to the outputs of the minimum and maximum operators, respectively. Thus, L_p erosion and L_p dilation are "soft" morphological operators [Kuo93]. The counterparts of the grayscale opening and closing can be defined using the L_p dilation and the L_p erosion in the following ways:
D_p(E_p(x_i)) = [ Σ_{j=1}^{N} a_j ( Σ_{k=1}^{N} a_k x_{i−j−k}^{−p} )^{−1} ]^{1/p},

y_i = E_p(D_p(x_i)) = [ Σ_{j=1}^{N} a_j ( Σ_{k=1}^{N} a_k x_{i−j−k}^{p} )^{−1} ]^{−1/p}. (5.29)
It should be noted that the effective filter window length used in the opening/closing definitions is not N but N′ = 2N − 1. The L_p operators possess some interesting properties, which are listed below.

Property 1. Duality:

D_{−p}(x_i) = E_p(x_i), (5.30)

D_p(x_i^{−1}) = E_p^{−1}(x_i), (5.31)

E_p(x_i^{−1}) = D_p^{−1}(x_i), (5.32)

D_p(E_p(x_i)) = [E_p(D_p(x_i^{−1}))]^{−1}, (5.33)
CONSTANTINE KOTROPOULOS et al.
E_p(D_p(x_i)) = [D_p(E_p(x_i^{−1}))]^{−1}, (5.34)

where all superscripts denote powers.
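The soft min/max behavior and the duality of Eq. (5.30) can be illustrated numerically; a minimal sketch (not from the chapter), assuming normalized weights a_j summing to 1 and positive-valued samples:

```python
import numpy as np

def lp_dilation(x, a, p):
    """Weighted L_p mean: a soft maximum for large positive p (weights sum to 1)."""
    x, a = np.asarray(x, dtype=float), np.asarray(a, dtype=float)
    return np.sum(a * x ** p) ** (1.0 / p)

def lp_erosion(x, a, p):
    """Soft minimum, via the duality E_p(x) = D_{-p}(x) of Eq. (5.30)."""
    return lp_dilation(x, a, -p)
```

For p = 50 on the samples {1, 2, 10} with uniform weights, the two outputs lie within a few percent of the true maximum and minimum, and the duality of Eq. (5.31), D_p(x^{−1}) = E_p^{−1}(x), can be checked numerically.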
Property 2. If {x_i} and {y_i} are two sequences with positive terms satisfying x_i ≤ y_i, i = 1, …, N, then the corresponding L_p operator outputs preserve this ordering.

Let the observations x_i be Rayleigh distributed with parameter σ_k under hypothesis H_k:

p_{x_i}(X | H_k) = (X / σ_k²) exp(−X² / (2σ_k²)),   X ≥ 0,   i = 1, …, N,   k = 0, 1. (5.50)
Under the null hypothesis H_0 the constant signal is m_0 = σ_0 √(π/2), while under the alternative hypothesis H_1 we have m_1 = σ_1 √(π/2). In the general case, the pixel values in the ultrasound B-mode image are correlated. In the following analysis, we treat the pixel values as independent RVs. The Bayes criterion [Van68] leads to the log-likelihood ratio test (LRT) [Kot92]:

Σ_{i=1}^{N} x_i²  ≷_{H_0}^{H_1}  [2σ_0²σ_1² / (σ_1² − σ_0²)] [ln η − 2N ln(σ_0/σ_1)] = γ,   for σ_1² > σ_0², (5.51)

with the inequality reversed for σ_1² < σ_0².

r > 1, d > 1, (5.57) (5.58)
where γ(u, M) is the incomplete gamma function defined by

γ(u, M) = ∫_0^u (x^M / M!) exp(−x) dx. (5.59)
Let us now focus our attention on the estimation of parameter m in Eq. (5.46) when n is multiplicative noise independent of m, which is distributed as follows:

p_n(N) = (πN/2) exp(−πN²/4),   N ≥ 0. (5.60)
The conditional density function of the observations, assuming m = M, is given by [Pap84]

p_{x|m}(X|M) = (1/M) p_n(X/M),   M > 0. (5.61)

Let us suppose that we have a set of N observations. The maximum likelihood (ML) estimate of M maximizes the log-likelihood function [Kot92]; that is,

m_ML = (√π / 2) [ (1/N) Σ_{i=1}^{N} x_i² ]^{1/2}. (5.62)
It is seen that the ML estimator of the constant signal is the L_2 mean scaled by the factor √π/2. Let π(N) be the following polynomial in N:

π(N) = Γ(N + 1/2) / [√N (N − 1)!], (5.63)

where Γ(·) denotes the gamma function. The expected value and variance of the ML estimator, as well as the mean squared estimation error, are given by [Kot92]

E{m̂} = π(N) M, (5.64)
var{m̂} = [1 − π²(N)] M², (5.65)
E{(m̂ − M)²} = 2 [1 − π(N)] M². (5.66)
CHAPTER 5: NONLINEAR MEAN FILTERS
Moreover, the asymptotic properties of the ML estimate hold [Van68]. In most practical cases, it is unrealistic to consider a constant signal hypothesis. Without any loss of generality, the following binary hypothesis problem is assumed:

H_0: x = n,   H_1: x = m n, (5.67)
where m and n are RVs. Our aim is to perform detection and estimation based on this model. Since m is an RV, the conditional density of the observations under H_1 is given by

p_{x|H_1}(X|H_1) = ∫_{X_m} p_{x|m,H_1}(X|M, H_1) p_{m|H_1}(M|H_1) dM, (5.68)

where X_m is the domain of the RV m. We make the following assumptions:

1. n is a Rayleigh RV having unity expected value and variance (4 − π)/π; that is, its pdf is given by Eq. (5.60).

2. The conditional density of the observations under H_1 and with m known is given by

p_{x|m,H_1}(X|M, H_1) = (1/M) p_n(X/M) = [πX / (2M²)] exp[−πX² / (4M²)],   M > 0. (5.69)
3. The conditional density of m under H_1 is chosen in such a way that it represents a realistic model and offers mathematical tractability. A Maxwell density with parameter λ fulfills both requirements; that is,

p_{m|H_1}(M|H_1) = M² exp(−λM²) / K, (5.70)

where K ensures that Eq. (5.70) is a pdf,

K = √π / (4λ^{3/2}). (5.71)
By substitution of Eqs. (5.70) and (5.71) into Eq. (5.68), we obtain a gamma density:

p_{x|H_1}(X|H_1) = πλ X exp(−√(πλ) X). (5.72)
Such a result is very reasonable, since it is known that speckle can be modeled by a gamma density function [Pit90]. Based on N observations, the LRT leads to

(π/4) Σ_{i=1}^{N} x_i² − √(πλ) Σ_{i=1}^{N} x_i  ≷_{H_0}^{H_1}  θ − N ln 2λ = γ′. (5.73)
The maximum a posteriori (MAP) estimate of the signal m is given by [Kot92]

m̂_MAP(x) = [ ( −(N − 1) + [ (N − 1)² + πλ Σ_{i=1}^{N} x_i² ]^{1/2} ) / (2λ) ]^{1/2}. (5.74)
Figure 5.6: (a) Simulation of a homogeneous piece of tissue with a circular lesion in the middle; the lesion/background amplitude is +3 dB, and the number density of scatterers in the background and the lesion is 5000/cm³. (b) Thresholded original image. (c) Gray level histograms of the pixels belonging to the lesion and to the background areas.

It can be seen that for λ = 0, the MAP estimate of m reduces to the form of the ML estimate of the constant signal, that is, to an L_2 mean filter. Indeed,

m̂_MAP(x; λ = 0) = (√π / 2) [ (1/(N − 1)) Σ_{i=1}^{N} x_i² ]^{1/2}. (5.75)

The L_2 mean filter has been applied to both simulated ultrasound B-mode images and real ultrasonic images for speckle suppression. Simulated ultrasound B-mode images are used to evaluate the performance of various filters in speckle suppression and to select parameters (such as filter length and thresholds) in the image processing task. Figures 5.6a and 5.7a are simulations of a homogeneous
piece of tissue (4×4 cm) with a circular lesion in the middle; the lesion has a diameter of 2 cm. These ultrasound B-mode images were produced and described by Verhoeven et al. [Ver91]. The lesion differs from the background in reflection strength (+3 dB). The background has a number density of scatterers of 5000/cm³. The lesion has a number density of scatterers of either 5000/cm³ (Fig. 5.6a) or 500/cm³ (Fig. 5.7a). In the former case, there is no change in second-order statistics between lesion and background; in the latter case, the lesion is characterized by a sub-Rayleigh distribution. Both simulated images have dimensions 241×241 and resolution 6 bits/pixel. The gray level histograms of the pixels belonging to the lesion area and to the background are plotted in Figs. 5.6c and 5.7c. It can be seen that they are very similar to the Rayleigh pdf. Two types of rectangular filter windows were employed, one having dimensions proportional to the lateral and axial correlation sizes (15×3) and the other having dimensions inversely proportional to them (3×15). We briefly assess the performance of filtering the original image by the L_2 mean filter and thresholding the filtered image, using as figures of merit the area under the ROC in each case and the probability of detection P_D for a threshold chosen so that the probability of false alarm P_F ≈ 10%. Figure 5.8a depicts the output of the 15×3 L_2 mean filter applied to the image shown in Fig. 5.6a; Fig. 5.8b shows the thresholded image. The probability of lesion detection in Fig. 5.8b is 3.26% higher than that measured in Fig. 5.6b, and the area under the ROC is 6.45% larger than that measured in Fig. 5.6b as well [Kot92]. Figure 5.9a depicts the output of the 3×15 L_2 mean filter applied to the image shown in Fig. 5.7a; Fig. 5.9b shows the thresholded image. The probability of lesion detection in Fig. 5.9b is 28.532% higher than that measured in Fig. 5.7b; moreover, the area under the ROC is 17.89% larger than that measured in Fig. 5.7b [Kot92]. A representative real ultrasonic image of a liver recorded using a 3 MHz probe is shown in Fig. 5.10a, and the output of the L_2 mean filter of dimensions 5×5 is shown in Fig. 5.10b. It is seen that the proposed nonlinear filter suppresses the speckle noise effectively. However, any spatial filtering without adjusting its smoothing performance at each point of the image according to the local image content results in edge blurring. Better edge preservation is attained by the so-called signal-adaptive filters [Pit90]. In the following, we address the design of signal-adaptive L_2 mean filters. Let us first revise the image formation model. For ultrasonic images in which the displayed image data have undergone excessive manipulation (e.g., logarithmic compression, lowpass and highpass filtering, postprocessing, etc.), a realistic image formation model is the signal-dependent one [Lou88]:

x = m + m^{1/2} n,
(5.76)
where n is a zero-mean Gaussian RV. It has been proven that the ML estimate of
Figure 5.7: (a) Simulation of a homogeneous piece of tissue with a circular lesion in the middle; the lesion/background amplitude is +3 dB, and the number density of scatterers in the background is 5000/cm³ and in the lesion is 500/cm³. (b) Thresholded original image. (c) Gray level histograms of the pixels belonging to the lesion and to the background areas.

m = M is given by [Kot94a]

m_ML = −σ²/2 + [σ⁴/4 + (1/N) Σ_{i=1}^{N} x_i²]^{1/2}, (5.77)
which closely resembles the L_2 mean filter given by Eq. (5.62). The results mentioned above led us to the design of signal-adaptive maximum likelihood filters (i.e., L_2 mean filters) both for multiplicative Rayleigh speckle and for signal-dependent speckle [Kot94a]. The output of the signal-adaptive maximum likelihood filter, that is, the estimate of the original image at (i, j), is
Figure 5.8: (a) Output of the 15×3 L_2 mean filter applied to Fig. 5.6a and (b) result of thresholding.

Figure 5.9: (a) Output of the 3×15 L_2 mean filter applied to Fig. 5.7a and (b) result of thresholding.
m̂(i, j) = m̂_ML(i, j) + β(i, j) [x(i, j) − m̂_ML(i, j)], (5.78)

where

x(i, j) is the noisy observation at pixel (i, j),
m̂_ML(i, j) is the maximum likelihood estimate of m(i, j) based on the observations x(i − k, j − l) ∈ A inside the filter window A,
β(i, j) is a weighting factor approximating the local SNR over the window A,
m̂(i, j) is the signal-adaptive filter output at pixel (i, j).

When β(i, j) approaches 1, the actual observation is preserved by the suppression of the lowpass component m̂_ML(i, j); when it is close to 0, maximum noise reduction is performed, since the highfrequency component is suppressed. Analytic expressions for β(i, j) for the multiplicative and the signal-dependent models can be found elsewhere [Kot94a]. Figure 5.10c depicts the output of the signal-adaptive L_2 mean filter applied to the ultrasonic image of a liver of Fig. 5.10a. It is seen that not only edges but additional diagnostically significant image details are preserved.
Figure 5.10: (a) Real ultrasonic image of a liver recorded using a 3 MHz probe. (b) Output of the 5×5 L_2 mean filter. (c) Output of the signal-adaptive L_2 mean filter. (d) Zoom-in of the results obtained by using segmentation in conjunction with filtering.

A modification of the signal-adaptive maximum likelihood filter that utilizes segmentation information obtained prior to the filtering process has been proposed as well [Kot94a]. Moreover, the segmentation of ultrasonic images by using a variant of the learning vector quantizer (LVQ) based on the L_2 mean, the so-called L_2 LVQ, was developed, and its convergence properties in the mean and in the mean square were studied [Kot94a]. Such a learning vector quantizer based on the L_2 mean has been created using 49 neurons at the first layer, corresponding to input patterns taken from a block of 7×7 pixels. The second layer consists of 2 to 8 neurons, corresponding to the output classes. A 7×7 window scans the image in a random manner to feed the network with input training patterns. During the recall phase, the 7×7 window scans the entire image to classify each pixel into one of p (p = 2, …, 8) classes. A parametric image is created containing the class membership of each pixel. The ability of the L_2 LVQ to perform segmentation is shown in Fig. 5.11. Figure 5.11a illustrates the classification performed by the L_2 LVQ on the simulated image; two output classes were used, representing background and lesion. Figure 5.11b illustrates the segmentation of a real ultrasonic image of a liver into six classes by using the L_2 LVQ; each class is shown as a distinct region having a gray value ranging from black to white. The more white a region is, the more important it is; in general, important regions are blood
Figure 5.11: Segmentation of (a) a simulated ultrasound B-mode image and (b) a real ultrasonic image of a liver using the L_2 LVQ neural network.

vessel boundaries, strong reflectors, etc., that should be preserved for diagnostic purposes. Regions having rich texture are shown as light gray; in these regions, a tradeoff between speckle suppression and texture preservation should occur by limiting the maximal filter window size. Image regions in which speckle dominates are shown dark; here, speckle should be efficiently suppressed by allowing the filter window to reach its maximum size. The result of the overall filtering process using the modified signal-adaptive L_2 mean filter that utilizes the segmentation information provided by the L_2 LVQ is demonstrated in Fig. 5.10d. It is seen that combining segmentation with filtering better preserves the edge information as well as acknowledging image areas containing valuable information that should not be filtered.
5.7 Sorting Networks Using L_p Mean Comparators

In many areas of signal processing there is a strong need for fast sorting algorithms and structures [Pit90]. A solution exists in the form of sorting networks, a special case of sorting algorithms in which the sequence of comparisons performed has a homogeneous structure [Knu73]. The basic functional unit of a sorting network is the comparator; the network performance depends mostly on the performance of the type of comparator utilized. In this section a new type of comparator, the L_p comparator, is proposed that can be implemented using analog circuitry, that is, adders, multipliers, and nonlinear amplifiers that raise the input signal to a power. Thus, it can be used for high-speed analog or hybrid signal processing. The L_p comparator is based on the L_p mean. Since sorting networks provide a topology that is independent of the comparator being used, we will "replace" conventional comparators with L_p comparators and we will determine what additional modifications are needed so that the errors are within acceptable levels [Pap96b]. Figure 5.12c shows an L_p comparator. When p is large, L_{−p} converges to the min operator and L_p to the max operator. If the network of Fig. 5.12a utilizes L_p comparators, then estimates x̂_(i) of the ordered input samples x_(i) will be produced at the network outputs, where i = 1, …, N. However, for small or medium values of p, L_p comparators introduce errors.

Figure 5.12: (a) An odd-even transposition sorting network of N = 5 inputs. (b) A max/min comparator: y_1 = max{x_1, x_2} = x_(2), y_2 = min{x_1, x_2} = x_(1). (c) An L_p comparator: y_1 = L_p{x_1, x_2} = x̂_(2), y_2 = L_{−p}{x_1, x_2} = x̂_(1).

Let us first examine the approximation error introduced by a single L_p comparator. We assume that the inputs x_1 and x_2 are i.i.d. RVs distributed in the interval (0, 255) obeying some pdf (e.g., uniform). It can be shown that Eq. (5.79) holds for the error e_max(x_1, x_2) between x_(i) and its L_p approximation x̂_(i):

e_max(x_1, x_2) ≤ max(x_1, x_2) (1 − 2^{−1/p}),   for p ≫ 1. (5.79)
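An odd-even transposition network built from L_p comparators can be sketched as follows; equal comparator weights of 1/2 and positive inputs are assumed, and the internal scaling is only for numerical stability:

```python
def lp_comparator(x1, x2, p):
    """L_p comparator (Fig. 5.12c): returns soft-max and soft-min estimates.
    Equal weights (1/2 each) are assumed; inputs must be positive."""
    s = max(x1, x2)                          # scaling only, for numerical stability
    hi = s * (0.5 * ((x1 / s) ** p + (x2 / s) ** p)) ** (1.0 / p)
    t = min(x1, x2)
    lo = t * (0.5 * ((x1 / t) ** -p + (x2 / t) ** -p)) ** (-1.0 / p)
    return hi, lo

def oddeven_lp_sort(x, p):
    """Odd-even transposition network (Fig. 5.12a) using L_p comparators.
    For large p the outputs approach the true order statistics."""
    x = [float(v) for v in x]
    n = len(x)
    for stage in range(n):
        for i in range(stage % 2, n - 1, 2):
            hi, lo = lp_comparator(x[i], x[i + 1], p)
            x[i], x[i + 1] = lo, hi          # ascending order
    return x
```

For p around 100 the network output agrees with the true sorted sequence to within a few percent, consistent with the per-comparator error bound above.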
= Σ_{k_1} Σ_{k_2} h_2(k_1, k_2, −k_1, −k_2) d(n_1 − k_1, n_2 − k_2, n_1 + k_1, n_2 + k_2).
Then we find that
and thus
where we have used the symmetry of the kernel, that is,
Using Eqs. (6.58), (6.59), and (6.60), we arrive at the corresponding extension to Eq. (6.52) as
Assuming that H_2(e^{jω_1}, e^{jω_2}, e^{j0}, e^{j0}) has highpass characteristics in either the ω_1 or ω_2 direction or in both, we conclude that the class II two-dimensional systems can be approximated by mean-weighted highpass filters. The proof of Eq. (6.63) follows the same idea as the proof of Theorem 6.2 in the appendix of this chapter. Basically, we use the general reconstruction formula for two-dimensional quadratic systems in Eq. (6.47) and substitute X(e^{jω_1}, e^{jω_2}) = X̃(e^{jω_1}, e^{jω_2}) + 4π² μ_x δ(ω_1, ω_2), where μ_x represents the mean of x(n_1, n_2). Using H_2(e^{j0}, e^{j0}, e^{j0}, e^{j0}) = 0, we obtain

Then, we show that the term involving X̃(e^{jω_1}, e^{jω_2}) contributes much less to the overall result and thus can be neglected. Resubstituting X(e^{jω_1}, e^{jω_2}) finally yields Eq. (6.63).
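The mean-free structure of class II filters can be illustrated numerically: products of symmetrically shifted samples are combined with coefficients that sum to zero, so a constant image produces zero response. The two-term coefficient set below is illustrative only:

```python
import numpy as np

def class2_response(img, coeffs):
    """Class II quadratic filter response at all interior pixels:
    y(n1,n2) = sum_k c_k * x(n1-k1, n2-k2) * x(n1+k1, n2+k2),
    where coeffs maps (k1, k2) -> c_k.  If the c_k sum to zero, a constant
    image yields zero output (no dc response), as required for class II."""
    img = np.asarray(img, dtype=float)
    H, W = img.shape
    pad = max(max(abs(k1), abs(k2)) for k1, k2 in coeffs)
    y = np.zeros((H - 2 * pad, W - 2 * pad))
    for (k1, k2), c in coeffs.items():
        a = img[pad - k1:H - pad - k1, pad - k2:W - pad - k2]   # x(n1-k1, n2-k2)
        b = img[pad + k1:H - pad + k1, pad + k2:W - pad + k2]   # x(n1+k1, n2+k2)
        y += c * a * b
    return y
```

A flat image yields an all-zero response, while a vertical step edge produces a nonzero response, in line with the edge-extraction behavior discussed above.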
6.7 Least-Squares Design of Edge-Extracting Filters

6.7.1 Characterization of Edge-Extracting Filters
We have developed the theory behind the class of filters that we want to use for edge extraction. We know their structure and properties in both space and frequency domains, and we have also justified why, at least in principle, this family would be advantageous for image enhancement. In the next step, we need to develop a performance measure that somehow allows us to find the best filter from this class [Thu95a]. Even though essentially all mean-weighted highpass filters would extract edges of an image in some way, there are still important differences that we have to take into account. For instance, a problem that would not occur in one-dimensional signal processing but is of importance in image processing is isotropy. The filter should find edges independent of their orientation. Horizontal or vertical boundaries must lead to the same response, and therefore the degree to which a filter is isotropic is critical for the resulting image quality. In this section, we introduce a method of characterizing the dependence of the filter output on the input frequency and the orientation of the input. We use rotated sinusoids with rotation and frequency as free parameters:
Figure 6.7: Parameters a and b determine the orientation of the input sinusoid; they are interlinked by a² + b² = 1. (Reproduced with permission from [Thu96c]. © 1996 IEEE.)
where a and b determine the orientation and a² + b² = 1. In Fig. 6.7, the possible orientations are indicated with the corresponding values for these parameters. Both of them must lie in the interval [−1, 1], and because they are interlinked, only one of them is sufficient to define the orientation. Also, for our purposes half of the possible directions in Fig. 6.7 are redundant, so we will only use
With Eqs. (6.64) and (6.47), we compute the output spectrum. We find
and
As expected, the output consists of components with zero (dc) and twice the input frequency. We can easily show, however, that only the dc component will be
nonzero. We use Eqs. (6.61) and (6.58) and obtain
Thus, only the terms with δ(ω1, ω2) remain in Eq. (6.65). This is not too surprising, since the class II two-dimensional filters were derived as an extension of their one-dimensional counterparts in Sec. 6.6.1, and for those we have already shown this property. Thus, we write
with
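The rotated-sinusoid probe inputs described above are easy to generate numerically. The sketch below assumes the form x(n1, n2) = sin(ω0(a·n1 + b·n2)) with a² + b² = 1 for Eq. (6.64), matching Fig. 6.7; the function name and image size are illustrative.

```python
import numpy as np

def rotated_sinusoid(shape, w0, a):
    """Rotated sinusoidal test image; assumed form of Eq. (6.64):
    x(n1, n2) = sin(w0 * (a*n1 + b*n2)) with a^2 + b^2 = 1."""
    b = np.sqrt(1.0 - a * a)           # orientation constraint a^2 + b^2 = 1
    n1, n2 = np.indices(shape)
    return np.sin(w0 * (a * n1 + b * n2))

# An isotropic filter must respond identically to these two inputs:
x_h = rotated_sinusoid((64, 64), np.pi / 8, a=1.0)  # varies along n1 only
x_v = rotated_sinusoid((64, 64), np.pi / 8, a=0.0)  # varies along n2 only
```

Since x_v is x_h transposed, comparing a filter's response to both is a quick isotropy check.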
6.7.2 Basis Filters
By combining the second half of Eq. (6.18) with Eq. (6.58), we obtain the input-output relation for class II filters:
with Σ_{k1} Σ_{k2} h2(k1, k2, −k1, −k2) = 0. Obviously, any filter from this class is a linear combination of terms like x(n1 − k1, n2 − k2) x(n1 + k1, n2 + k2). This is based on the property of Volterra filters that they are linear in the kernels. We make use of this observation and introduce the concept of basis filters for the design of Volterra filters. We define an expression
as a basis filter for class II filters, where 0 ≤ k1 < ∞, −∞ < k2 < ∞, and |k1| + |k2| ≠ 0 (i.e., k1 and k2 cannot be equal to zero at the same time). We consider only unique pairs of k1 and k2. If we would also permit k1 < 0, then some of the filters would be identical, since y_{k1,k2}(n1, n2) = y_{−k1,−k2}(n1, n2). Using Eq. (6.70), we find the expression equivalent to Eq. (6.69) as
where the e_{k1,k2} represent the design coefficients. We prove this equivalence by rewriting Eq. (6.69) as
with the restrictions for k1 and k2 as before and where we have substituted e_{k1,k2} = −h2(k1, k2, −k1, −k2). Therefore, the set of filters in Eq. (6.70) completely describes all possible filters from class II, and it is not redundant, since none of the filters in Eq. (6.70) can be written in terms of any other from this set. Furthermore, the basis filters have the advantage that for each of them the sum of their coefficients equals zero; thus Eq. (6.53) holds by design, and the coefficients e_{k1,k2} can have arbitrary values. The relation between the kernel coefficients and the e_{k1,k2} is
h2(n1, n2, −n1, −n2) = −e_{n1,n2} for n1 ≥ 0 and h2(n1, n2, −n1, −n2) = 0 for n1 < 0.

This definition generates a nonsymmetric kernel that directly represents the system we need to implement. In the frequency domain, the linear dependence on the basis filters is preserved by the Fourier transform. Since we know the frequency characteristics of each of the basis filters, we must find the coefficients in such a way that the overall system has a certain desired frequency response. We are interested in designing class II systems that operate on a relatively small neighborhood. This keeps the computational complexity low for both design and implementation. We choose a 5×5 region of support, since it is a reasonable compromise between the computational complexity of the filter and the degree of freedom for the design. We obtain a total of 12 basis filters for −2 ≤ k2 ≤ 2 and 0 ≤ k1 ≤ 2 in Eq. (6.70). We find their frequency responses H2^{(k1,k2)}(e^{jω1}, e^{jω2}, e^{jω3}, e^{jω4}) as
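The basis filters lend themselves to direct implementation. The sketch below assumes the space-domain form y_{k1,k2}(n1, n2) = x²(n1, n2) − x(n1 − k1, n2 − k2) x(n1 + k1, n2 + k2) for Eq. (6.70), which is consistent with the zero-coefficient-sum property and the symmetry y_{k1,k2} = y_{−k1,−k2} noted above; circular boundary handling is an implementation choice.

```python
import numpy as np

def basis_filter_output(x, k1, k2):
    """One class II basis filter; assumed form of Eq. (6.70):
    y(n1,n2) = x(n1,n2)^2 - x(n1-k1,n2-k2)*x(n1+k1,n2+k2)."""
    xm = np.roll(np.roll(x, k1, axis=0), k2, axis=1)    # x(n1-k1, n2-k2)
    xp = np.roll(np.roll(x, -k1, axis=0), -k2, axis=1)  # x(n1+k1, n2+k2)
    return x * x - xm * xp

def basis_index_pairs(kmax=2):
    """The 12 unique (k1, k2) pairs for a 5x5 support: k1 = 0 with
    k2 > 0, plus k1 > 0 with any k2, since y_{k1,k2} = y_{-k1,-k2}."""
    pairs = [(0, k2) for k2 in range(1, kmax + 1)]
    pairs += [(k1, k2) for k1 in range(1, kmax + 1)
                       for k2 in range(-kmax, kmax + 1)]
    return pairs

print(len(basis_index_pairs()))  # 12 basis filters, as stated in the text
```

A constant image produces zero output for every basis filter, reflecting the zero coefficient sum.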
6.7.3 Least-Squares Design
For each of the basis filters, we obtain the response to a sinusoidal input according to Eq. (6.68). For simplicity, we denote it B_{k1,k2}(ω0, a) instead of y2^{dc,(k1,k2)}(e^{jω0}, a), where b = √(1 − a²). Then we sample these continuous functions in −1 ≤ a ≤ 1 and 0 ≤ ω0 ≤ π/2. The reason that we do not use frequencies beyond π/2 is that they are of negligible importance in real-world images, since images tend to be oversampled. We choose a sample spacing of Δa = 0.01 and Δω0 = π/50, which yields 201×26 samples. We order the samples into column vectors in the following manner. The first 201 elements are for ω0 = 0 and run from a = −1 up to a = 1. After that, we take the 201 samples for ω0 = π/50, and so on until we reach ω0 = π/2 and a = 1. We number the functions B_{k1,k2}(ω0, a) in the order given before from 1 to 12 and obtain vectors b1 through b12, which we combine into the matrix B:

B = (b1 b2 ... b12).   (6.74)
Table 6.2: Parameter vectors for the design of the two-dimensional version of the Teager filter; θ_T is the exact filter, θ_s the scaled filter, and θ_a the approximate solution.
We write the linear combination of the basis filters with the coefficients e_{k1,k2} as a matrix product Bθ, where
Thus, we can express the design problem as the linear matrix equation
The desired behavior of the ideal filter is contained in d, which we sample in the same fashion as described above for B_{k1,k2}(ω0, a), and e represents the error vector. We find the optimal solution that minimizes the error in the least-squares sense [Men87]:
θ = (BᵀB)⁻¹ Bᵀ d.   (6.77)
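In numpy, the least-squares solution is one call. The sketch below uses random stand-ins for B and d, since only the dimensions (201 × 26 = 5226 samples, 12 basis filters) come from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((5226, 12))  # stand-in for the matrix of Eq. (6.74)
d = rng.standard_normal(5226)        # stand-in for the sampled desired response

# theta minimizes ||B @ theta - d||_2:
theta, *_ = np.linalg.lstsq(B, d, rcond=None)

# Equivalent normal-equations form (B^T B)^{-1} B^T d:
theta_ne = np.linalg.solve(B.T @ B, B.T @ d)
print(np.allclose(theta, theta_ne))  # True
```

For well-conditioned B, both forms agree; `lstsq` is the numerically safer choice.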
6.7.4 Two-Dimensional Teager Filter

We want to design a two-dimensional version of the Teager filter defined in Eq. (6.27). In Eq. (6.28), we stated that the one-dimensional filter responds with an output y(n) = sin²(ω0) if the input is a sinusoid x(n) = sin(ω0 n). Now we need to find a two-dimensional system that outputs a constant signal y(n1, n2) = sin²(ω0) for a sinusoidal input and that is, of course, independent of the orientation of the excitation. We set d(ω0, a) = sin²(ω0) and convert d(ω0, a) into the column vector d. Using Eq. (6.77) yields the parameter vector θ_T given in Table 6.2. If we relax the
Figure 6.8: Characteristic functions as defined in Eq. (6.68) for (a) the two-dimensional Teager filter and (b) its approximation; (c) maximum deviation of the functions from the desired curve for all values of a. (Parts a and b reproduced with permission from [Thu96c]. © 1996 IEEE.)
restrictions somewhat and allow the filter output to be scaled, that is, d(ω0, a) = α sin²(ω0), we can normalize the parameters such that θ(1) = 3; this yields θ_s in Table 6.2. Figure 6.8a shows the characteristic function y2^{dc}(e^{jω0}, a) for this filter. The design goal of virtually perfectly isotropic behavior has been achieved up to ω0 = π/2. The advantage of normalizing the parameter vector is that the elements of θ_s can easily be approximated by simple numbers. The rightmost column in Table 6.2 shows the approximation θ_a, using only the coefficients 1 and 0.5. Thus, we obtain the input-output relationship for the approximate two-dimensional Teager filter:
y(n1, n2) = 3x²(n1, n2) − x(n1, n2 − 1) x(n1, n2 + 1) − x(n1 − 1, n2) x(n1 + 1, n2)
            − 0.5 x(n1 − 1, n2 − 1) x(n1 + 1, n2 + 1) − 0.5 x(n1 + 1, n2 − 1) x(n1 − 1, n2 + 1).   (6.78)
Figure 6.9: (a) Original image, (b) enhanced image after unsharp masking, (c) edge image obtained by applying the Volterra filter in Eq. (6.78) to the image in (a), and (d) output of the Laplacian filter.

This equation is considerably simpler to implement than the full description in θ_T. At the same time, the behavior of this system, as shown in Figs. 6.8b and 6.8c, does not deviate too much from the ideal one. It is still extremely isotropic for almost the entire range of interest from 0 to π/2.
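The approximate two-dimensional Teager filter can be written compactly. The sketch below assumes the coefficient set {3, −1, −1, −0.5, −0.5} (center square term, horizontal and vertical neighbor products, diagonal products) for Eq. (6.78); circular boundary handling is an implementation choice.

```python
import numpy as np

def teager2d(x):
    """Approximate 2-D Teager filter; assumed coefficients of Eq. (6.78).
    The coefficients sum to zero, so a constant image yields zero output."""
    s = lambda d1, d2: np.roll(np.roll(x, -d1, axis=0), -d2, axis=1)
    return (3.0 * x * x
            - s(0, -1) * s(0, 1)          # horizontal neighbor product
            - s(-1, 0) * s(1, 0)          # vertical neighbor product
            - 0.5 * s(-1, -1) * s(1, 1)   # one diagonal
            - 0.5 * s(-1, 1) * s(1, -1))  # other diagonal

print(np.allclose(teager2d(np.full((16, 16), 7.0)), 0.0))  # True
```

Adding a scaled version of this output back to the image gives an unsharp-masking enhancement of the kind shown in Fig. 6.9b.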
6.7.5 Image Enhancement Example
Figure 6.9 shows an example of edge enhancement. The original image in Fig. 6.9a is somewhat blurry. It was enhanced using the unsharp masking technique and the approximate 2-D Teager filter in Eq. (6.78); the enhanced image, Fig. 6.9b, is noticeably sharper. The outputs of the Teager and Laplacian filters are shown in Figs. 6.9c and 6.9d, respectively. The Laplacian filter shows a uniform response to edges independent of background intensity, whereas the Teager filter output is weaker in darker regions (e.g., the darker areas of the roof) and stronger in brighter areas (e.g., the bright wall), as expected.
Figure 6.10: (a) Block diagram of edge-enhanced image interpolation, and images zoomed by (b) bilinear interpolation and (c) the edge-enhanced technique. (Reproduced with permission from [Thu96a]. © 1996 SPIE.)
6.7.6 Application in Image Interpolation

The edge enhancement technique described in the previous section can be extended to image interpolation [Thu96a]. As shown in the block diagram in Fig. 6.10a, both the original image and the edge image created by the Volterra filter in Eq. (6.78) are upsampled and interpolated. The enlarged edge image is then processed by the "line thinning" block, which reduces the width of the extracted edges. After a final scaling operation, the edge information is added back to the zoomed image. The interpolation block adapts to the local image characteristics and processes different image regions in different ways. In particular, edges are interpolated only along their direction and not across, which preserves sharpness. The results of standard bilinear interpolation and this technique are compared in Figs. 6.10b and 6.10c.
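A minimal sketch of the Fig. 6.10a pipeline follows. The nearest-neighbor interpolator, the scale factor, and the omission of the line-thinning step are all simplifying assumptions; any 2x upsampler and edge extractor can be substituted.

```python
import numpy as np

def zoom2x_edge_enhanced(x, edge_extract, scale=0.2):
    """Upsample both the image and its nonlinear edge map, then add the
    scaled edge information back (line thinning omitted for brevity)."""
    def upsample2x(img):
        # nearest-neighbor upsampling as a stand-in interpolator
        return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)
    return upsample2x(x) + scale * upsample2x(edge_extract(x))

x = np.arange(16.0).reshape(4, 4)
y = zoom2x_edge_enhanced(x, lambda im: np.zeros_like(im))
print(y.shape)  # (8, 8)
```

With a real Volterra edge extractor in place of the zero stand-in, the added term sharpens the zoomed edges.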
6.7.7 Application in Image Halftoning

Another application of the quadratic Volterra filter in Eq. (6.78) can be found in [Thu96b]. The technique of error diffusion is used to halftone an image, that is, to requantize an image into fewer (usually two) levels of intensity or color while minimizing the impact on perceptual quality. In the standard method, the quantization error is filtered
Figure 6.11: (a) Block diagram of the modified error diffusion technique, (b) example of halftoning using standard error diffusion, and (c) result of modified error diffusion. (Reproduced with permission from [Thu96b]. © 1996 SPIE.)
by a diffusion filter in a feedback loop. The modification is shown in Fig. 6.11a. The "strength calculation" block essentially converts the output of the Volterra filter to its absolute value. The result is then used to adapt the diffusion filter in such a way that the final image appears sharper. Examples allowing comparison of standard and modified error diffusion are shown in Figs. 6.11b and 6.11c.
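A sketch of the modified error diffusion of Fig. 6.11a. Floyd-Steinberg weights, the particular strength-to-gain mapping, and plain raster scanning are assumptions for illustration; the exact adaptation rule of [Thu96b] is not reproduced.

```python
import numpy as np

def edge_adaptive_error_diffusion(x, edge, gain=1.0):
    """Halftone x (values in [0, 1]) to {0, 1}.  The absolute value of
    the edge map (the "strength calculation") attenuates the diffused
    error near edges, which makes the halftone appear sharper."""
    h, w = x.shape
    buf = x.astype(float).copy()
    out = np.zeros_like(buf)
    strength = 1.0 / (1.0 + gain * np.abs(edge))  # assumed adaptation rule
    for i in range(h):
        for j in range(w):
            out[i, j] = 1.0 if buf[i, j] >= 0.5 else 0.0
            err = (buf[i, j] - out[i, j]) * strength[i, j]
            if j + 1 < w:
                buf[i, j + 1] += err * 7 / 16      # Floyd-Steinberg weights
            if i + 1 < h:
                if j > 0:
                    buf[i + 1, j - 1] += err * 3 / 16
                buf[i + 1, j] += err * 5 / 16
                if j + 1 < w:
                    buf[i + 1, j + 1] += err * 1 / 16
    return out

out = edge_adaptive_error_diffusion(np.full((8, 8), 0.5), np.zeros((8, 8)))
print(sorted(np.unique(out)))  # only the two levels 0 and 1 appear
```

With a zero edge map the strength is 1 everywhere and the sketch reduces to standard Floyd-Steinberg error diffusion.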
6.8 Summary

In this chapter, we have studied homogeneous quadratic Volterra filters and, in particular, the Teager filter and its extensions in one and two dimensions. We have presented an intuitive interpretation of the frequency response of these filters, which facilitates a better understanding of their properties. Certain subclasses of quadratic Volterra filters are natural extensions of the Teager filter. Two-dimensional Teager filters have properties that are desirable for image enhancement, since they can be approximated as mean-weighted highpass filters. We have demonstrated their application in image sharpening, interpolation, and halftoning. The framework presented here also provides a foundation for different filter designs (e.g., lowpass filters) and applications that require a dependence on the local image intensity.
6.9 Appendix

6.9.1 Proof of Theorem 6.1
We start with the relationship

y(n) = (1/4π²) ∬_D Y2(e^{jω1}, e^{jω2}) e^{jn(ω1+ω2)} dω1 dω2,
where Y2(e^{jω1}, e^{jω2}) = H2(e^{jω1}, e^{jω2}) X(e^{jω1}) X(e^{jω2}) and D = (0, 2π) × (0, 2π). Figure 6.12 shows the integration region graphically. Instead of integrating parallel to the axes, we split the region into two parts, the lower and upper triangles. For each of these portions, we compute the integral separately, and at the end of the derivation we combine the results. As shown in Fig. 6.12, we integrate over the lower triangle along the line ω2 = ω0 − ω1 first, where ω0 is a free parameter between 0 and 2π, and then along the orthogonal direction indicated by the arrow:
y_lower(n) = (1/4π²) ∬_lower Y2(e^{jω1}, e^{jω2}) e^{jn(ω1+ω2)} dω1 dω2
           = (1/4π²) ∫_0^{2π} A(e^{jω0}) e^{jnω0} dω0,

where we have used the abbreviation A(e^{jω0}) = ∫_0^{ω0} Y2(e^{jω1}, e^{j(ω0−ω1)}) dω1. For the integral over the upper triangle, we substitute
The basic idea behind this is to integrate along the line ω2 = 2π + ω0 − ω1 and then in the orthogonal direction. This yields
Figure 6.12: Integration process for the derivation of the output spectrum.
Since X(e^{jω}) and H2(e^{jω1}, e^{jω2}) are Fourier transforms of real signals, we can use the symmetry relationships X(e^{j(2π−ω)}) = X*(e^{jω}) and H2(e^{j(2π−ω1)}, e^{j(2π−ω2)}) = H2*(e^{jω1}, e^{jω2}). We obtain
For this last step, we have substituted ω0 = 2π − ν. Finally, we combine the results for the upper and lower parts:
Note that this is an inverse Fourier transform, which means that the expression inside the braces must be the desired expression for the output spectrum.
6.9.2 Proof of Theorem 6.2
With the general definition of homogeneous quadratic Volterra filters, and with μx = E[x] and x̃(n) = x(n) − μx, we write
Assuming that H2(e^{j0}, e^{j0}) = 0, then Σ_{k1} Σ_{k2} h2(k1, k2) = 0. We decompose y(n) into a sum of y1(n) and y2(n), where

y1(n) = μx Σ_{k1} Σ_{k2} h2(k1, k2) [x̃(n − k1) + x̃(n − k2)],
y2(n) = Σ_{k1} Σ_{k2} h2(k1, k2) x̃(n − k1) x̃(n − k2).
To analyze the effects of both of these terms on the overall result, we assume that x̃(n) can be modeled as independent identically distributed (i.i.d.) noise with uniform distribution between −A and A. Thus, E[x̃] = 0 and σx̃² = E[x̃²] = A²/3. We now show under which conditions σ_{y1}² ≥ σ_{y2}². Because y1(n) is a linearly filtered version of a zero-mean signal, y1(n) is also zero mean. Therefore,
σ_{y1}² = E[y1²] = μx² Σ_{k1} Σ_{k2} Σ_{k3} Σ_{k4} h2(k1, k2) h2(k3, k4) {E[x̃(n − k1) x̃(n − k3)] + E[x̃(n − k1) x̃(n − k4)] + E[x̃(n − k2) x̃(n − k3)] + E[x̃(n − k2) x̃(n − k4)]}.
Using the i.i.d. assumption for x̃(n), the autocorrelation is given by

E[x̃(n − α) x̃(n − β)] = Rx̃(α − β) = A²/3 for α = β and 0 for α ≠ β,
so the variance becomes
With the symmetry condition of h2(n1, n2), that is, h2(n1, n2) = h2(n2, n1),
STEFAN THURNHOFER
Denoting the last two terms by s_h and neglecting the factor 2 of the first sum, we approximate σ_{y1}² by
For y2(n) we find the expected value as

E[y2] = σx̃² Σ_k h2(k, k) = (A²/3) Σ_k h2(k, k).
Thus, the variance is
Using E[x̃⁴] = ∫_{−A}^{A} x⁴/(2A) dx = A⁴/5, we rewrite the expression for σ_{y2}² as
Note that the expression inside the first sum of Eq. (6.80) is zero whenever one of the variables k1, ..., k4 is different from all the others. If, for example, k1 is different from k2, ..., k4, then
which is zero. Therefore, we only have to consider the cases in which k l = k2 = k3 = k4 or when pairs of these variables are equal. This leads to the first four sums in Eq. (6.81). For the second of them, we write
and therefore
We combine Eqs. (6.79) and (6.82) and compute the ratio of the variances:
The last step is based on the assumption that s_h ≥ 0. This means that whenever μx ≥ A, the contribution of y1(n) to the overall filter output is greater than that of y2(n), that is, σ_{y1}² ≥ σ_{y2}². Even if μx is only slightly larger than A, the ratio will be significant. For example, assume that μx = 2A; then σ_{y1}² ≥ 12σ_{y2}². Thus, the filter output can be approximated by the result of y1(n):
and again with x(n) = x̃(n) + μx,
The corresponding expression in the frequency domain can easily be derived by starting from the general spectral input-output relationship,
and replacing X(e^{jω1}) with X̃(e^{jω1}) + 2π μx δ(ω1), which yields
With the symmetry assumption H2(e^{jω}, e^{j0}) = H2(e^{j0}, e^{jω}) and H2(e^{j0}, e^{j0}) = 0, we resubstitute X(e^{jω}), which yields the desired result:
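The variance argument of Theorem 6.2 can also be checked numerically. The sketch below uses the one-dimensional Teager filter y(n) = x²(n) − x(n − 1)x(n + 1), whose kernel coefficients sum to zero, so that for x(n) = x̃(n) + μx the output splits exactly into the y1 and y2 terms used above.

```python
import numpy as np

# For the 1-D Teager filter, x(n) = xt(n) + mu gives
#   y1(n) = mu * (2*xt(n) - xt(n-1) - xt(n+1))   (dominant when mu >= A)
#   y2(n) = xt(n)^2 - xt(n-1)*xt(n+1)
rng = np.random.default_rng(1)
A, mu, N = 1.0, 2.0, 200_000
xt = rng.uniform(-A, A, N)                    # i.i.d. uniform on [-A, A]
y1 = mu * (2 * xt[1:-1] - xt[:-2] - xt[2:])
y2 = xt[1:-1] ** 2 - xt[:-2] * xt[2:]
print(y1.var() / y2.var())  # much greater than 1 for mu = 2A
```

The observed ratio is far above 1 for μx = 2A, consistent with the claim that y1(n) dominates the filter output.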
References

[Bil89a] S. A. Billings and K. M. Tsang. Spectral analysis for nonlinear systems, Part I: Parametric nonlinear spectral analysis. Mechan. Syst. Signal Process. 3(4), 319–339 (1989).
[Bil89b] S. A. Billings and K. M. Tsang. Spectral analysis for nonlinear systems, Part II: Interpretation of nonlinear frequency response functions. Mechan. Syst. Signal Process. 3(4), 341–359 (1989).
[Bof86] K. R. Boff, L. Kaufman, and J. P. Thomas. Handbook of Perception and Human Performance, Vol. I: Sensory Processes and Perception. Wiley, New York (1986).
[Chu79] L. O. Chua and C. Y. Ng. Frequency domain analysis of nonlinear systems: General theory. Electron. Circ. Syst. 3, 165–185 (July 1979).
[Dun93] R. B. Dunn, T. F. Quatieri, and J. F. Kaiser. Detection of transient signals using the energy operator. In Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing, pp. III-145–III-148 (Minneapolis, MN, April 1993).
[Jai89] A. K. Jain. Fundamentals of Digital Image Processing. Prentice-Hall, Englewood Cliffs, NJ (1989).
[Kai90] J. F. Kaiser. On a simple algorithm to calculate the "energy" of a signal. In Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing, pp. 381–384 (Albuquerque, NM, April 1990).
[Kai93] J. F. Kaiser. Some useful properties of Teager's energy operator. In Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing, pp. III-149–III-152 (Minneapolis, MN, April 1993).
[Lim90] J. S. Lim. Two-Dimensional Signal and Image Processing. Prentice-Hall, Englewood Cliffs, NJ (1990).
[Men87] J. M. Mendel. Lessons in Digital Estimation Theory. Prentice-Hall, Englewood Cliffs, NJ (1987).
[Mit91] S. K. Mitra, H. Li, I. S. Lin, and T.-H. Yu. A new class of nonlinear filters for image enhancement. In Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing, pp. 2525–2528 (Toronto, 1991).
[Pic82] B. Picinbono. Quadratic filters. In Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing, pp. 298–301 (1982).
[Pit90] I. Pitas and A. N. Venetsanopoulos. Nonlinear Digital Filters. Kluwer, Boston, MA (1990).
[Ram86] G. F. Ramponi. Edge extraction by a class of second-order nonlinear filters. Electron. Lett. 22, 482–484 (April 1986).
[Ram88] G. F. Ramponi, G. L. Sicuranza, and W. Ukovich. A computational method for the design of 2-D Volterra filters. IEEE Trans. Circ. Syst. 35, 1095–1102 (September 1988).
[Ram93] G. Ramponi and P. Fontanot. Enhancing document images with a quadratic filter. Signal Process. 33, 23–34 (July 1993).
[Rog83] B. E. Rogowitz. The human visual system: A guide for the display technologist. In Proc. SID 24, 235–252 (1983).
[Rub69] M. L. Rubin and G. L. Walls. Fundamentals of Visual Science. Charles C. Thomas, Springfield, IL (1969).
[Rug81] W. J. Rugh. Nonlinear System Theory: The Volterra/Wiener Approach. Johns Hopkins University Press, Baltimore, MD (1981).
[Sic92] G. L. Sicuranza. Quadratic filters for signal processing. Proc. IEEE 80, 1263–1285 (August 1992).
[Sch80] M. Schetzen. The Volterra and Wiener Theories of Nonlinear Systems. Wiley, New York (1980).
[Tea80] H. M. Teager. Some observations on oral air flow during phonation. IEEE Trans. Acoust. Speech Signal Process. ASSP-28, 599–601 (October 1980).
[Tea90] H. M. Teager and S. M. Teager. Evidence for nonlinear sound production mechanisms in the vocal tract. In Speech Production and Speech Modelling (W. J. Hardcastle and A. Marchal, eds.), pp. 241–261. Kluwer (1990).
[Thu95a] S. Thurnhofer and S. K. Mitra. Designing quadratic Volterra filters for nonlinear edge enhancement. In Proc. Intl. Conf. on Digital Signal Processing, pp. 320–325 (Limassol, Cyprus, June 1995).
[Thu95b] S. Thurnhofer and S. K. Mitra. Quadratic Volterra filters with mean-weighted highpass characteristics. In Proc. IEEE Workshop on Nonlinear Signal and Image Processing, pp. 368–371 (Halkidiki, Greece, June 1995).
[Thu96a] S. Thurnhofer and S. K. Mitra. Edge-enhanced image zooming. Opt. Eng. 35, 1862–1869 (July 1996).
[Thu96b] S. Thurnhofer and S. K. Mitra. Detail-enhanced error diffusion. Opt. Eng. 35, 2592–2598 (September 1996).
[Thu96c] S. Thurnhofer and S. K. Mitra. A general framework for quadratic Volterra filters for edge enhancement. IEEE Trans. Image Process. (special issue on nonlinear image processing) 5, 950–963 (June 1996).
[Wan93] Y. Wang and S. K. Mitra. Image representation using block pattern models and its image processing applications. IEEE Trans. Pattern Anal. Machine Intell. 15, 321–336 (April 1993).
[Vin87] T. Vinh, T. Chouychai, H. Liu, and M. Djouder. Second order transfer function: Computation and physical interpretation. In Proc. SPIE International Modal Analysis Conf., pp. 587–592 (London, 1987).
[Yu94] T.-H. Yu and S. K. Mitra. Unsharp masking with nonlinear filters. In Proc. 7th European Signal Processing Conference EUSIPCO '94, pp. 1485–1488 (Edinburgh, Scotland, September 1994).
[Zha93] H. Zhang. Frequency Domain Estimation and Analysis for Nonlinear Systems. Ph.D. thesis, University of Sheffield, Sheffield, England (1993).
Polynomial and Rational Operators for Image Processing and Analysis
Dipartimento di Elettrotecnica Elettronica Informatica, Università degli Studi di Trieste, Trieste, Italy
7.1 Introduction
A wide class of nonlinear operators can be devised for image processing applications, based on polynomial and rational functions of the pixels (luminance or color) of an image. This chapter shows that this approach can be exploited successfully for image enhancement (contrast sharpening, edge-preserving noise smoothing), image analysis (texture segmentation, edge extraction), and image format conversion (interpolation). The attractive side of polynomial and rational expressions from a mathematical viewpoint is that they come as the most obvious extension of linear operators, which in turn are the first expertise required of a person working in the digital signal processing area. The properties of polynomial-based functions are well established, and they also enjoy the advantage of sharing some of these properties with linear operators (adaptive techniques can be devised in a similar way).
On the other hand, the most important obstacle to the diffusion of such operators is their computational complexity, which increases very rapidly with the degree of the nonlinearity and with the size of the data support involved. However, in most applications extremely simple operators, of degree at most three and operating on a very small support, can provide satisfactory results. This chapter is organized as follows: The formalisms of polynomial and rational filters are dealt with first, and then Sec. 7.2 provides an overview of the theoretical properties of operators that are based on the Volterra series and of their extensions to two dimensions. Section 7.3 indicates a set of possible applications of polynomial filters for contrast enhancement, texture segmentation, and edge extraction. Rational operators occupy the remainder of the chapter, with a number of applications, such as noise smoothing, image interpolation, and contrast enhancement, discussed in Sec. 7.4. Finally, some conclusions are drawn in Sec. 7.5, which also indicates important issues remaining in the field.
7.2 Theoretical Survey of Polynomial and Rational Filters

A thorough study of the theory of polynomial operators is far beyond the scope of this chapter, so only a few fundamental concepts are given to familiarize the reader with this approach. For a deeper understanding, the most important references are the book by Schetzen [Sch80] and the very recent book by Mathews and Sicuranza [Mat00].
7.2.1 Volterra Series Expansions for Discrete Nonlinear Time-Invariant Systems
A continuous nonlinear time-invariant (NLTI) system without memory can be represented by a Taylor series. Similarly, a continuous NLTI system with memory is described by the Volterra series [Vol87]. Through orthogonalization of the complete set of Volterra operators, the Wiener theory is obtained [Sch80, Sch81, Wie58]. Exploiting such mathematical tools in the digital signal processing arena requires a discrete truncated version of the Volterra series [Mat91, Mat00, Sic92]. In the discrete case, the input-output relation of a Volterra system was given in Chapter 6 and is reported here for convenience:
The so-called Volterra kernels h_p = h_p(i1, ..., i_p) are symmetric functions of their arguments. The term h0 represents an offset component, h1(i1) is the impulse response of a digital noncausal IIR filter, and h_p(i1, ..., i_p) is a generalized pth-order impulse response.
7.2.2 Classes of Polynomial Filters
In the most general case, each term in the summation of Eq. (7.1) has the expression
In practice, however, it is necessary to resort to more restricted classes of polynomial filters. By changing to 0 the lower limits of the summations in Eq. (7.2), we get a causal system, and if the upper limits are changed to a finite value N, the system has a finite memory. In this case, h1(i1) is an FIR filter; for the higher-order kernels, the effect of the nonlinearity on the output depends only on the present and a finite set of past input values. The truncated Volterra series of order P is obtained by replacing the infinity in the summation of Eq. (7.1) with the finite integer P; the class of truncated (or finite-support, finite-memory) Volterra filters results. If, however, an infinite memory is required while still avoiding the obvious problem of infinite summations, the class of finite-order recursive Volterra filters may be employed. The latter is constituted of operators whose output is expressed as a function of input values and of previous output values as well [Mat00]:
The simplest recursive filter is the bilinear operator:
A bilinear operator has the advantage that it shares with the nonrecursive Volterra filter the property of being linear in its coefficients. On the other hand, like any recursive operator, it can show unstable behavior [Lee94].
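A direct (and deliberately naive) implementation of a causal, finite-memory quadratic Volterra filter illustrates the truncated class described above; the function name and kernel shapes are illustrative.

```python
import numpy as np

def quadratic_volterra(x, h0, h1, h2):
    """Truncated Volterra filter of order P = 2 with memory N = len(h1):
    y(n) = h0 + sum_i h1[i] x(n-i) + sum_{i,j} h2[i,j] x(n-i) x(n-j),
    using only present and past inputs (samples before n = 0 are zero)."""
    N = len(h1)
    y = np.full(len(x), float(h0))
    for n in range(len(x)):
        past = np.array([x[n - i] if n - i >= 0 else 0.0 for i in range(N)])
        y[n] += h1 @ past + past @ h2 @ past
    return y

# With h2 = 0 this reduces to an ordinary FIR filter:
print(quadratic_volterra(np.array([1.0, 2.0, 3.0]), 0.0,
                         np.array([1.0, -1.0]), np.zeros((2, 2))))
# [1. 1. 1.]
```

A nonzero symmetric h2 adds the quadratic terms; the output remains linear in the kernel coefficients, which is what makes least-squares and adaptive design possible.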
7.2.3 Properties of Polynomial Filters
Among the various properties of Volterra operators, a few are worth citing even in this simplified overview. First, we observe that the output of the system is linear with respect to the kernel coefficients; hence, an extension of the optimum filter theory is allowed, and adaptation algorithms can be devised. For fixed-coefficient filters, the identification of each kernel requires a set of p products of unit impulses conveniently placed on the filter support as input excitation. An example is the bi-impulse response of a quadratic filter, described elsewhere [Ram90]. Second, due to their nonlinearity, Volterra operators can be devised that have a peculiar property: the response to a change in the input signal is an increasing function of the local average of the signal. In terms of image processing applications, this means that we can obtain a larger response to luminance steps located in brighter zones of an image. This feature is in agreement with Weber's law and will be exploited in the following sections. Moreover, according to the amplitude
dependence, a threshold can be defined so that steps with amplitude above it are considered to be relevant details of the image and thus are amplified, while steps below it are considered to be noise and thus are reduced in amplitude. A suitable threshold can be determined by trial-and-error procedures or by knowledge of the statistics of the noise. The resulting edge-preserving effect is exploited in the rational operator for noise smoothing described in Sec. 7.4.1. Finally, it was already shown in Chapter 6, Sec. 6.2, that the output of a Volterra system can be expressed by sums of multidimensional convolutions. This property permits the derivation of a frequency domain interpretation of a polynomial operator, which was introduced for a quadratic operator in Sec. 6.3. This interpretation will be used in Chapter 14 and has important implications here. Let us apply a p-dimensional sampled complex sinusoid
as the input sequence to a causal linear filter described by the p-dimensional impulse response h_p(i1, ..., i_p). The output sequence is given by

w(n1, ..., np) = Hp(e^{jω1}, ..., e^{jωp}) e^{jω1 n1} ··· e^{jωp np},   (7.5)
where
is the continuous frequency response of the p-dimensional causal linear filter. The output of the pth-order Volterra operator is then derived by assuming v(n1, ..., np) = u(n1) ··· u(np) and n1 = ··· = np = n,

so that the frequency contributions at the output are recovered at the angular frequency ω1 + ··· + ωp. Hence, it is immediately recognized that new frequencies appear at the output that are not present at the input; this property is the basis for the operation of the rational interpolators presented in Sec. 7.4.2. These interpolators exploit such new high-frequency components to yield sharper details than those obtainable by a linear operator.
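The appearance of new output frequencies is easy to demonstrate numerically: squaring a sinusoid of frequency ω0 (the simplest memoryless quadratic operator) produces components only at 0 and 2ω0, i.e., at the sum frequencies ω1 + ω2 with ω1, ω2 = ±ω0.

```python
import numpy as np

N = 256
k0 = 16                                   # 16 cycles over N samples
x = np.cos(2 * np.pi * k0 * np.arange(N) / N)
Y = np.abs(np.fft.rfft(x * x)) / N        # spectrum of the squared signal
peaks = np.flatnonzero(Y > 0.1)
print(peaks)  # bins 0 and 32: dc and twice the input frequency
```

No energy remains at the input bin k0 = 16; the quadratic operator has moved it entirely to the sum frequencies.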
7.2.4 Rational Filters
It should be noticed that the kernel complexity of each component of Eq. (7.1) is N^p coefficients. Due to the symmetry property, the number of independent coefficients is expressed by the binomial factor
For example, if p = 2 and N = 9, N^p = 81 and N_p = 45; if p = 2 and N = 25, N^p = 625 and N_p = 325. This clearly shows one of the most important problems encountered when trying to exploit the potential of the Volterra series: the number of parameters grows at a very fast rate with the degree of the nonlinearity and with the memory of the system. There are two different approaches to circumventing this problem: the recursive operators already mentioned and a new class of polynomial-based filters, the rational function operators. Rational functions (the ratio of two polynomials) were recently proposed to represent the input-output relation in a nonlinear signal processing system [Leu94]. One of the motivations for their introduction was to overcome another limitation typical of the more conventional polynomial approach, its poor ability to extrapolate beyond its domain of validity. Like a polynomial function, a rational function is a universal approximator: with enough parameters and enough data to optimize them, it can approximate any function arbitrarily well. Moreover, it can achieve the desired level of accuracy with a lower complexity, and it possesses better extrapolation capabilities. It has also been demonstrated that a linear adaptive algorithm can be devised for determining the parameters of this structure; with this approach, the two problems of estimating the direction of arrival of plane waves on an array of sensors and of detecting radar targets in clutter were tackled [Leu94]. A rational function, used as a filter, can be expressed as
A rational function with a linear numerator and a linear denominator was used by Leung and Haykin [Leu94]. A major obstacle for these functions can again be their
complexity: if a high order is used, many parameters will be required, and this will cause slow convergence.
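The coefficient counts quoted above are easy to verify; the helper name is illustrative.

```python
from math import comb

def kernel_sizes(p, N):
    """Total (N^p) and independent (binomial) coefficient counts of a
    pth-order Volterra kernel over N input samples."""
    return N ** p, comb(N + p - 1, p)

print(kernel_sizes(2, 9))    # (81, 45)
print(kernel_sizes(2, 25))   # (625, 325)
```

The gap between the two counts is what the symmetry property saves; the remaining growth with p and N is what motivates recursive and rational operators.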
7.2.5 Extension to 2-D and Multidimensional Discrete Systems
The complexity of the extension of polynomial and rational operators to the multidimensional case is in general extremely large. However, for applications in image and image sequence processing, filters defined on very small supports often yield remarkable results; moreover, a low-order filter is often sufficient to obtain significant improvements over conventional linear filters. The arrangement of the independent variables into suitable vectors is the basis of the representation of a multidimensional Volterra filter; to derive a compact notation, we must conveniently order the input data and the filter coefficients. For example, an expression for a quadratic 2-D Volterra filter was given in Sec. 6.2.1. Analogous expressions can be derived for rational operators by separately manipulating the numerator and the denominator of their input-output equation.
7.3 Applications of Polynomial Filters
In the 1-D case, polynomial filters have been successfully applied for modeling nonlinear systems, quadratic detectors (Teager's operator, see Chapter 6), echo cancellation, cancellation of nonlinear intersymbol interference, channel equalization in communications, nonlinear prediction, etc. Applications to image processing are in enhancement (image sharpening, edge-preserving smoothing, processing of document images), analysis (edge extraction, texture discrimination), and communications (nonlinear prediction, nonlinear interpolation of image sequences). Overviews of some of these applications are described next.
7.3.1 Contrast Enhancement
In the linear unsharp masking method, a fraction of the highpass-filtered version v(m, n) of the input image x(m, n) is used as a correction signal and added to the original image, resulting in the enhanced image y(m, n):
where
This method is very sensitive to noise due to the presence of the highpass filter. Polynomial unsharp maslung techniques, in which a nonlinear filter is substituted for the highpass linear operator in the signal sharpening path, can solve t h s problem. Different polynomial functions can be used. In the Teagerbased operator, details are amplified in bright regions, where the human visual system is less sensitive to luminance changes (Weber's law), and reduced noise sensitivity is achieved in dark areas. The correction signal in this case is [Mitgl]
In the cubic unsharp masking approach, the sharpening action is performed only if opposite sides of the filtering mask are each deemed to correspond to a different object [Ram96a], thus avoiding noise amplification:
v(m, n) = [x(m − 1, n) − x(m + 1, n)]² [2x(m, n) − x(m − 1, n) − x(m + 1, n)]
        + [x(m, n − 1) − x(m, n + 1)]² [2x(m, n) − x(m, n − 1) − x(m, n + 1)].
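The cubic correction can be sketched directly from the formula above. The edge-replicated borders and the illustrative gain λ (named lam here) are choices of this sketch, not of the text:

```python
import numpy as np

def cubic_unsharp(x, lam=1e-4):
    """Cubic unsharp masking: the Laplacian-like terms are multiplied by
    the squared cross differences, so sharpening acts only where opposite
    sides of the mask differ (i.e., across genuine edges).  lam is an
    illustrative gain; borders are edge-replicated."""
    x = np.asarray(x, dtype=float)
    xp = np.pad(x, 1, mode='edge')
    up, dn = xp[:-2, 1:-1], xp[2:, 1:-1]    # x(m-1,n), x(m+1,n)
    lf, rt = xp[1:-1, :-2], xp[1:-1, 2:]    # x(m,n-1), x(m,n+1)
    v = ((up - dn) ** 2 * (2 * x - up - dn) +
         (lf - rt) ** 2 * (2 * x - lf - rt))
    return x + lam * v
```

On a flat region the correction vanishes exactly, while across a step edge it produces the expected over/undershoot pair.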
Figure 7.1 shows the results of the unsharp masking approaches to image contrast enhancement. Figure 7.1a is a portion of the original Lena test image; 7.1b was
CHAPTER 7: POLYNOMIAL AND RATIONAL OPERATORS
obtained using the linear method, 7.1c the Teager-based method, and 7.1d the cubic method. Expressions of a similar type can be used for v(m, n); for example, when the data are noisy, a more powerful edge sensor is needed, and the Sobel operator can be used [Ram96a].

Figure 7.1: (a) Original test image, and contrast-enhanced versions obtained using unsharp masking: (b) linear, (c) Teager-based, and (d) cubic methods. (Reproduced with permission from [Ram96a]. © 1996 SPIE.)
7.3.2
Texture Segmentation
The segmentation of different types of textures present in an image can be performed based on local estimates of second- and higher-order statistics [Mak94]. In particular, third-order moments are the best features to use in noisy texture discrimination. A pth-order statistical moment estimator is a special polynomial operator; for example, for p = 3,
m₃(i, j) = (1/M²) Σ_{n∈X} x(n) x(n + i) x(n + j),

where n = (n₁, n₂), i = (i₁, i₂), and j = (j₁, j₂). The moments are evaluated within an M × M image block X.
A class of third-order moments exists that are insensitive to additive white noise. Consider, for simplicity of notation, the 1-D case. If the available data result from an ideal signal corrupted by additive zero-mean white noise, x(n) = s(n) + d(n), then

E{x(n) x(n + i) x(n + j)} = E{s(n) s(n + i) s(n + j)} + σ_d² [E{s(n)} δ(i − j) + E{s(n + i)} δ(j) + E{s(n + j)} δ(i)] + m₃d δ(i) δ(j),

where σ_d² and m₃d denote the variance and third moment of the noise.
To render the estimate independent of the noise, we just need to satisfy the simple constraints i ≠ 0, j ≠ 0, and i ≠ j. The selected local moments, after averaging, training, and clustering, permit us to segment composite textures in noise. Examples of the achievable results are presented in Fig. 7.2. The original image in Fig. 7.2a was formed by three different textures from Brodatz's album. Figure 7.2b shows the same image but with added zero-mean Gaussian noise; the SNR is 0 dB. Results of segmentation using quadratic filters that estimate second-order moments are presented in Figs. 7.2c and 7.2d for the uncorrupted and the noisy image, respectively; it is evident that second-order moments are well suited for the uncorrupted image but do not work well in the presence of noise. Figures 7.2e and 7.2f show segmentation results obtained from the same images as above but now also using cubic filters. The selected third-order moments make the operator able to discern the noisy textures with good precision (Fig. 7.2f), but it is also apparent that the result on the uncorrupted data (Fig. 7.2e) is slightly worse than the one achieved with second-order moments only. This is due to the higher instability and variance of the output of high-order operators and indicates that the latter should be used only in the noisy case.
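The moment estimator and the noise-insensitivity constraints can be sketched as follows. Circular shifts approximate the exact in-block sum near the borders, and the function name and lag format are illustrative choices of this sketch:

```python
import numpy as np

def third_moment(block, i, j):
    """Third-order moment estimate over an M x M block:
    m3(i, j) = (1/M^2) * sum_n x(n) x(n+i) x(n+j).
    Circular shifts stand in for the exact in-block sum at the borders.
    Choosing lags with i != 0, j != 0, i != j makes the estimate
    insensitive to additive zero-mean white noise, as in the text."""
    x = np.asarray(block, dtype=float)
    xi = np.roll(x, (-i[0], -i[1]), axis=(0, 1))   # x(n + i)
    xj = np.roll(x, (-j[0], -j[1]), axis=(0, 1))   # x(n + j)
    return (x * xi * xj).mean()
```

For a constant block of value c the estimate equals c³, and for a constant signal plus symmetric white noise it stays close to that value when the lag constraints are respected.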
7.3.3
Edge Extraction
An important step in image analysis is to determine the position of the edges of the objects in the image. However, edge extraction is much more difficult in the presence of noise. Almost all edge extraction algorithms that have been proposed tend to perform roughly in the same manner when noise is absent but yield very different outputs in the presence of noise. Zero crossings of the second derivative (Laplacian) of the signal are often used to detect edges. The most popular algorithm in this category is the Marr–Hildreth operator [Mar80], also known as the Laplacian-of-Gaussian (LoG), which executes a lowpass (Gaussian) filtering before evaluating the Laplacian, in order to reduce the sensitivity to noise. The LoG operator (together with other analogous zero-crossing edge extraction methods) is nevertheless affected by two relevant problems: first, one has to trade noise immunity (a strong lowpass filter is required) for resolution (which is lost in lowpass-filtered data), and second, Gaussian filtering causes distortion in the location of edges. A local skewness estimator, which is a particular polynomial operator, can replace the Laplacian. The image first undergoes a mild Gaussian prefiltering; then, an approximately round mask 𝓜 formed by M × M pixels is used to scan the image.
Figure 7.2: Segmentation examples. (a) Original and (b) image with additive Gaussian noise; segmentation of (c) the original image and (d) the noisy image using quadratic filters; segmentation of (e) the original image and (f) the noisy image also using cubic filters.
Figure 7.3: Illustration of the behavior of the local skewness when the mask is moved across the edge of an object.
The skewness is estimated in the interior of 𝓜:

μ̂₃(n) = (1/M²) Σ_{i∈I} [x(n + i) − μ̂(n)]³,
where I is a suitable set of indices, and μ̂(n) is the local average luminance. When a luminance edge is met, the skewness changes its sign, and these zero crossings can be used for edge detection. The change of sign is illustrated in Fig. 7.3. The top row shows an idealized image with a dark object on a bright background, with the analysis mask 𝓜 moving toward the interior of the object. In the bottom row the histogram of the pixels inside the mask at each position is plotted; it is seen that the histogram is symmetric (μ₃ = 0) when the mask is completely outside or completely inside the object or when it is located exactly halfway, but is asymmetric elsewhere. When the mask is mostly on the background, μ₃ < 0. However, μ₃ > 0 when the mask is mostly on the object. If the curvature of the object border is negligible with respect to the size of 𝓜, the center of the mask locates the edge when the sign changes. The same behavior, with opposite sign, results if the object is brighter than the background. With a Gaussian prefiltering, the skewness-of-Gaussian (SoG) method is obtained, which is intrinsically robust to noise [Ram94]. In fact, most common noise distributions (such as Gaussian, uniform, Laplacian, and impulsive) are symmetric and as a consequence tend not to affect the skewness estimation. The SoG operation is controlled by (1) the size of the filtering mask, (2) the size of the mask 𝓜 used for the estimation of the skewness, and (3) a threshold needed to reject false zero crossings. The performance of this polynomial technique is best illustrated by means of an example. Figure 7.4a shows the original test image "Boats," while Fig. 7.4b shows its corrupted version obtained by adding Gaussian noise with variance σ² = 500. The edges as extracted by the SoG method are shown in Figs. 7.4c and 7.4d for the original and the corrupted images, respectively. As a comparison,
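A rough sketch of the SoG idea follows. Repeated box filtering stands in for the Gaussian prefilter, the final thresholding of the zero crossings is omitted, and all mask sizes and function names are illustrative assumptions of this sketch:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def box_mean(a, k):
    """k x k local mean with edge padding ('same' output size)."""
    pad = k // 2
    ap = np.pad(a, pad, mode='edge')
    return sliding_window_view(ap, (k, k)).mean(axis=(-2, -1))

def sog_sign_map(image, presmooth=3, mask=5):
    """Sign of the local skewness after a mild prefilter.  Edges lie on
    the zero crossings of this map; a threshold stage (not shown) would
    reject the false ones.  Box filtering approximates the Gaussian."""
    u = box_mean(box_mean(np.asarray(image, dtype=float), presmooth), presmooth)
    mu = box_mean(u, mask)                   # local average luminance
    skew = box_mean((u - mu) ** 3, mask)     # local third central moment
    return np.sign(skew)

# Dark object on a bright background: the skewness is negative where the
# mask lies mostly on the background and positive mostly on the object.
img = np.full((32, 32), 200.0)
img[8:24, 8:24] = 50.0
s = sog_sign_map(img)
```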
Figure 7.4: Examples of edge detection. (a) Original and (b) image with additive Gaussian noise; edges obtained from (c) the original and (d) the noisy image using the SoG operator; edges from (e) the original and (f) the noisy image using the LoG operator.
the outputs of the LoG operator are also reported (Figs. 7.4e and 7.4f). Threshold values were set subjectively to obtain the best perceptual quality in the various cases. It can be seen that the best quality is obtained by the SoG operator; as a result of its robustness to noise, the amount of Gaussian prefiltering can be small, and this reduces the distortion of small details.
7.4
Applications of Rational Filters
We mentioned in Sec. 7.2.3 that adaptation procedures can be used to determine the coefficients of rational filters. However, this approach has been followed in the literature only for simple 1-D operators. A possible alternative for the design of rational filters suited to a desired image processing task is to employ some heuristic criteria and compose simple polynomials into a more complex function. This section gives an overview of the applications of rational operators in the fields of noise smoothing for image data (with various noise distributions), image interpolation, and contrast enhancement.
7.4.1
DetailPreserving Noise Smoothing
A rational filter can be very effective in removing both short-tailed and medium-tailed noise corrupting an image [Ram95, Ram96b]. For the sake of simplicity, we refer to a one-dimensional operator, but its extension to two dimensions is straightforward. The filter can be formulated as

y(n) = x(n) + [x(n − 1) − 2x(n) + x(n + 1)] / [S(x) + A],   (7.13)

where the detail-sensing function S(·) takes different forms for short-tailed and for medium-tailed noise; for the latter case,

S(x) = k [x(n − 1) − x(n + 1)]².

In Eq. (7.13), A is a suitably chosen constant. The operator can be expressed in a form similar to that of Eq. (7.8). In the latter case, for example, it becomes

y(n) = {x(n − 1) + x(n + 1) + (A − 2) x(n) + k x(n) [x(n − 1) − x(n + 1)]²} / {k [x(n − 1) − x(n + 1)]² + A};   (7.14)
a constant and a quadratic term are recognizable in the denominator, while the numerator consists of the sum of a linear function and a cubic function of the input signal. We can examine the behavior of these filters for different positive values of the parameter k: if k is very large, y(n) ≈ x(n) and the filter has no effect, while if k ≈ 0 and a suitable value is chosen for A, the rational filter becomes a simple linear lowpass filter; for intermediate values of k, the output of the sensor S(x) modulates the response of the filter. Hence, the rational filter can act as an edge-preserving smoother, combining the noise attenuation capability of a linear lowpass filter with the sensitivity to high-frequency details of the edge sensor. Practical tests show that the value of k is not critical; moreover, it is not a function of the processed image but rather of the amount of noise present.
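One pass of the 1-D rational filter of Eq. (7.13) can be sketched as follows. The values of k and A and the edge-replicated borders are illustrative assumptions of this sketch:

```python
import numpy as np

def rational_smooth(x, k=0.05, A=4.0):
    """One pass of the 1-D detail-preserving rational filter,
    y(n) = x(n) + [x(n-1) - 2x(n) + x(n+1)] / (k*[x(n-1)-x(n+1)]^2 + A).
    Very large k leaves the signal untouched; k -> 0 gives a fixed
    linear lowpass filter.  k and A values here are illustrative."""
    x = np.asarray(x, dtype=float)
    xp = np.pad(x, 1, mode='edge')
    left, right = xp[:-2], xp[2:]
    edge = k * (left - right) ** 2            # edge sensor S(x)
    return x + (left - 2 * x + right) / (edge + A)

# On a clean step, the edge sensor suppresses smoothing across the edge.
sig = np.concatenate([np.zeros(10), np.full(10, 100.0)])
out = rational_smooth(sig)
```

The two samples flanking the step move by only a fraction of a gray level, whereas a purely linear filter with the same A would blur them by tens of levels.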
This operator can be applied more than once to the input data to obtain a stronger smoothing action. For p passes, in uniform areas where S(x) is negligible the rational filter is equivalent to a linear lowpass filter of size (2p + 1); by defining w = 1/A, the coefficients in Eq. (7.13) are w₁ = [w, 1 − 2w, w] for p = 1, w₂ = [w², 2w(1 − 2w), 2w² + (1 − 2w)², 2w(1 − 2w), w²] for p = 2, and so on. More important, in the vicinity of a detail the mask of the equivalent filter takes an asymmetric shape, which tends to cover pixels similar to the reference one; hence, the required edge preservation is obtained. For a 2-D operator, it is sufficient to apply the 1-D operator in a 3 × 3 mask in the 0°, 45°, 90°, and 135° directions; in this way an operator analogous to anisotropic diffusion [Per90] results. In fact, the latter operator was introduced as a scale-space method to perform edge detection at different resolutions, smoothing out details that are considered irrelevant at each scale. However, both anisotropic diffusion (see Chapter 8) and the rational filter perform their smoothing action preferably within a homogeneous region rather than across the borders of different regions. This smoothing action can be conceived as a diffusion of the luminance of the image, with a diffusion constant that varies locally and is an inverse function of the gradient; the higher the gradient, that is, the more an image detail is important, the smaller the diffusion. Other types of 2-D rational filters can be devised, though, that cannot be represented by the diffusion concept; for example, better results are obtained in general if all the pairs of pixels in a 3 × 3 mask are included in the processing. To formulate the latter filter, let the pixels in the mask be ordered as follows:
x₀ = x(m, n) for the center pixel, and x₁ through x₈ for its neighbors. The filter expression is then built, as in the 1-D case, from rational terms defined on the pairs of pixels of the mask.
This rational filter outperforms various other nonlinear techniques in terms of MSE and image quality for different noise distributions ranging from uniform to Gaussian and contaminated Gaussian. As an example, Fig. 7.5a shows a test image corrupted with additive contaminated Gaussian noise [Gab94]; after filtering, the image in Fig. 7.5b is obtained. Based on the rational function approach, other operators have been designed and realized for noise smoothing in image sequences [Coc97], for the filtering of speckle noise in SAR images [Ram97a], and for the attenuation of the blocking effects present in images compressed with DCT-based methods [Mar98]. Combining the advantages of the rational filter and order-statistic operators, median-rational hybrid filters have also been introduced; for the color image filtering problem, a vector rational operation can be performed over the output of three subfilters,
Figure 7.5: (a) Corrupted test image and (b) its filtered version. (Reproduced with permission from [Ram96b]. © 1996 IEEE.)
such as two vector median subfilters and one center-weighted vector median filter [Khr99a].
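For comparison with the diffusion interpretation discussed above, the classical Perona–Malik scheme [Per90] can be sketched with an explicit four-neighbor discretization. This is the diffusion operator itself, not the rational filter; the conductance function, step dt, and kappa are illustrative choices of this sketch:

```python
import numpy as np

def perona_malik(u0, steps=20, dt=0.2, kappa=10.0):
    """Minimal Perona-Malik anisotropic diffusion [Per90]:
    du/dt = div(c(|grad u|) grad u) with c(g) = 1 / (1 + (g/kappa)^2),
    so diffusion is strong in homogeneous areas and weak across edges.
    Explicit four-neighbour scheme with mirrored boundaries."""
    u = np.asarray(u0, dtype=float).copy()
    for _ in range(steps):
        up = np.pad(u, 1, mode='edge')
        flux = 0.0
        for nbr in (up[:-2, 1:-1], up[2:, 1:-1], up[1:-1, :-2], up[1:-1, 2:]):
            d = nbr - u                                  # directional difference
            flux = flux + d / (1.0 + (d / kappa) ** 2)   # conductance * gradient
        u += dt * flux
    return u
```

As with the rational filter, a strong step edge yields a small conductance, so the edge survives while weaker fluctuations are diffused away.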
7.4.2
Interpolation with Accurate Edge Reproduction
The rational filter paradigm can also be used to design effective operators for image interpolation. A possible approach relies on the observation that any real-world image may be considered to have been obtained from a higher resolution one after lowpass filtering and decimation, with the antialiasing lowpass filtering being done explicitly or being produced by the image acquisition system. When an ideal edge is present, this filtering operation symmetrically or asymmetrically modifies the values of the adjacent pixels according to the position of the edge itself in the high-resolution image; after decimation, consequently, an analysis of the values of the pixels of the low-resolution image gives subpixel information on the position of the edge, which hence may be interpolated with higher precision than that obtained with a linear interpolator. The interpolation is performed by evaluating a nonlinear mean of the pixels adjacent to the pixel y(m, n) to be interpolated. We illustrate the method by considering the 1-D case for simplicity. Figure 7.6a shows the position of the edge in the high-resolution data, Fig. 7.6b shows it in the lowpass-filtered data, and Fig. 7.6c shows it in the decimated data. Given the four consecutive pixels x₁, x₂, x₃, and x₄, an ideal interpolator should yield for the pixel y (which lies between x₂ and x₃) a value similar to that of either x₂ (left) or x₃ (right). As shown in Fig. 7.6d, linear interpolation fails to achieve this result. A rational interpolator can be introduced, which computes the interpolated sample y as y = p x₂ + (1 − p) x₃, where
p = [k (x₃ − x₄)² + 1] / [k ((x₁ − x₂)² + (x₃ − x₄)²) + 2],
Figure 7.6: Illustration of interpolation in the one-dimensional case. Two possible positions are considered for the edge to be reconstructed. (a) Original signal (high resolution); (b) after lowpass filtering; (c) after decimation; (d) sample y obtained using linear interpolation; (e) sample y obtained using rational interpolation.
with k being a user-defined parameter that controls the operator. For k = 0 a linear interpolation is obtained, while positive values of k yield the desired edge sensitivity. When the edge is midway between x₂ and x₃, x₁ − x₂ = x₃ − x₄, so p = 0.5 and y = (x₂ + x₃)/2, and the filter behaves as a linear one. However, when the edge is positioned asymmetrically, the evaluated differences are no longer equal; for example, if the edge is closer to x₃ (left column in the figure), then x₁ − x₂ < x₃ − x₄, so p > 0.5 and y ≈ x₂. In the 2-D case, this method can be applied separately to the rows and the columns of an image. Alternatively, the technique presented by Carrato et al. [Car96] can be used. Figure 7.7 illustrates the performance of the proposed interpolation method. The original 512 × 512 image was lowpass filtered with a Gaussian filter, decimated, and interpolated with the rational operator (Fig. 7.7a) and with the cubic convolution technique (Fig. 7.7b) [Key81]. It can be seen that the details (in particular, observe the edges of the hat and the eye) are reconstructed more sharply by the rational operator.

Figure 7.7: Portion of the interpolated image Lena obtained using (a) the rational operator and (b) cubic convolution.

A similar approach has been used to interpolate images by large factors, for example, to reconstruct a JPEG-coded image using only the dc components of its block DCT [Ram97b], yielding relatively sharp images. Following this approach, it is possible to have images of reasonably good quality at low bit rates using standard coding techniques, while of course the ac components may be transmitted later, possibly on demand, if the actual image is eventually needed. Another possible application is the interpolation of color images [Khr98]: any rational filter can be used to process multichannel signals by applying it separately to each component, but in general this is not desirable when correlation exists among the components, because false colors may be introduced. Better results are obtained using vector extensions of the operator, for example,

y = p x₂ + (1 − p) x₃,   p = [k ‖x₃ − x₄‖² + 1] / [k (‖x₁ − x₂‖² + ‖x₃ − x₄‖²) + 2].

Here, y, x₁, x₂, x₃, and x₄ are vector samples (e.g., RGB), and the parallel bars indicate the l₁ or l₂ norm. Finally, it is worth mentioning that rational interpolators have also been used successfully for the restoration of old movies corrupted by scratches and dirt spots [Khr99b]. After localization of the stationary and random defects, a spatial interpolation scheme is used to reconstruct the missing data. The algorithm first checks for the existence of edges so that it can take them into consideration; the edge orientation is estimated, and the most convenient data to be used in the reconstruction of the missing pixels are selected. In this way, the edges obtained are free from blockiness and jaggedness.
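The scalar rational interpolator can be sketched as follows. The function names, the default k, and the edge-replicated padding used to handle borders are illustrative assumptions of this sketch:

```python
import numpy as np

def rational_interp(x1, x2, x3, x4, k=0.01):
    """Edge-sensitive rational interpolation of the sample between x2 and x3:
    y = p*x2 + (1-p)*x3, with
    p = (k*(x3-x4)^2 + 1) / (k*((x1-x2)^2 + (x3-x4)^2) + 2).
    k = 0 reduces to linear interpolation (p = 1/2); k here is illustrative."""
    p = (k * (x3 - x4) ** 2 + 1.0) / (k * ((x1 - x2) ** 2 + (x3 - x4) ** 2) + 2.0)
    return p * x2 + (1.0 - p) * x3

def upsample2(x, k=0.01):
    """Double the density of a 1-D signal, inserting rational-interpolated
    samples between the original ones (edge-replicated borders)."""
    x = np.asarray(x, dtype=float)
    xp = np.pad(x, 2, mode='edge')
    y = np.empty(2 * len(x) - 1)
    y[0::2] = x
    for n in range(len(x) - 1):
        y[2 * n + 1] = rational_interp(xp[n + 1], xp[n + 2], xp[n + 3], xp[n + 4], k)
    return y
```

When the two flanking differences are equal the result is the plain average; when the edge sits asymmetrically, the interpolated sample snaps toward the side away from the edge, as described in the text.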
7.4.3
Contrast Enhancement
In Sec. 7.3.1 we introduced a polynomial technique for contrast enhancement in images that was able to avoid noise amplification. In this subsection a rational filtering approach is suggested that partially retains the noise rejection ability of the polynomial method yet has the advantage of avoiding another problem that typically affects images processed with linear unsharp masking methods: excessive overshoot on sharp details [Ram98]. Wide and abrupt luminance transitions in the input image can produce overshoot effects when linear or cubic unsharp masking methods are used; these are further stressed by the human visual system through the Mach band effect. The basic unsharp masking scheme is still used here, but a rational function is introduced in the correction path. With such a function, selected details having low and medium sharpness are enhanced but noise amplification is limited, and as a result steep edges, which do not need further emphasis, remain almost unaffected. From a computational viewpoint, it must be stressed that this solution maintains almost the same simplicity as the original linear unsharp masking method. As was done in Sec. 7.3.1, the output value of each pixel is expressed as the sum of the original value and a correction term v(m, n); here, however, v(m, n) is split into two orthogonal components and expressed as a function of a control signal c:

y(m, n) = x(m, n) + λ [v_m(m, n) c_m(m, n) + v_n(m, n) c_n(m, n)],   (7.16)
where

v_m(m, n) = 2x(m, n) − x(m, n − 1) − x(m, n + 1),
v_n(m, n) = 2x(m, n) − x(m − 1, n) − x(m + 1, n).   (7.17)
The two control signals are defined as

c_m(m, n) = g_m(m, n) / [k g_m²(m, n) + h],   c_n(m, n) = g_n(m, n) / [k g_n²(m, n) + h],

where k and h are proper positive factors and g is a measure of the local activity of the signal, here the squared luminance difference across the pixel:

g_m(m, n) = [x(m, n − 1) − x(m, n + 1)]²,   g_n(m, n) = [x(m − 1, n) − x(m + 1, n)]².
Figure 7.8 shows the control term c(n) as a function of g(n) in the 1-D case for a specific choice of k and h (k = 0.001, h = 250). The presence of a resonance peak at the abscissa g₀ = 500 can be noticed; this characteristic enables the operator to emphasize details that are represented by low- and medium-amplitude luminance transitions. At the same time, steep and strong edges will yield high values of the activity g(n) (larger than g₀) and hence will undergo a more delicate amplification;
Figure 7.8: Plot of the control function c for h = 250, k = 0.001. (Reproduced with permission from [Ram98]. © 1998 SPIE.)
in this way, undesired overshoots in the output image will be avoided. At the other end of the diagram, we expect that the noise that is always present in an image will yield small values of g(n) (smaller than g₀), so c(n) will be small, too; hence, the noise itself will not affect the correction signal v(n) in homogeneous areas. By selecting the position g₀ of the resonance, the user can achieve the best balance among these effects. Different values of g₀ could yield different amplitudes for the peak c₀; this would make the overall strength of the unsharp masking action a function of the selected g₀, hampering the tuning of the operator. To avoid this effect, c should be normalized using the constraint √(kh) = 1/2, so that the peak height is equal to one for any value of g₀. The values of h and k that place the resonance peak at g₀ are then

h = g₀/2,   k = 1/(2g₀).
In this way, the action of the c(n) term is controlled by a single parameter, namely g₀. It is interesting to observe that, once g₀ is set, we can freely adjust the intensity of the enhancement using the parameter λ; this choice does not affect the width of the resonance peak. A simulation example is shown in Fig. 7.9. Comparing it to the results in Fig. 7.1, it is clearly seen that the noise sensitivity of the rational operator is still better than that of linear unsharp masking (Fig. 7.1b), even if not as good as that of cubic unsharp masking (Fig. 7.1d); on the other hand, fine details are better amplified, and the excessive overshoots recognizable along the edges of the image processed with the cubic unsharp masking method have disappeared.
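A sketch of the rational unsharp masking scheme of Eqs. (7.16) and (7.17) follows, under the assumption (of this sketch, not necessarily of [Ram98]) that the activity g is the squared luminance difference across each pixel; g0 and the strength lam are the user parameters:

```python
import numpy as np

def rational_unsharp(x, g0=500.0, lam=0.3):
    """Rational unsharp masking in the style of Eqs. (7.16)-(7.17):
    y = x + lam * (vm*cm + vn*cn), with control c = g/(k*g^2 + h),
    h = g0/2 and k = 1/(2*g0), so that the peak of c sits at g0 with
    unit height.  The activity g is assumed here to be the squared
    luminance difference; lam is the enhancement strength."""
    x = np.asarray(x, dtype=float)
    h, k = g0 / 2.0, 1.0 / (2.0 * g0)
    xp = np.pad(x, 1, mode='edge')
    left, right = xp[1:-1, :-2], xp[1:-1, 2:]   # x(m,n-1), x(m,n+1)
    up, down = xp[:-2, 1:-1], xp[2:, 1:-1]      # x(m-1,n), x(m+1,n)
    vm = 2 * x - left - right                   # horizontal correction component
    vn = 2 * x - up - down                      # vertical correction component
    gm = (left - right) ** 2                    # local activity (assumed form)
    gn = (up - down) ** 2
    cm = gm / (k * gm ** 2 + h)                 # resonant control signals
    cn = gn / (k * gn ** 2 + h)
    return x + lam * (vm * cm + vn * cn)
```

A flat image is left untouched (the corrections vanish), while a medium-amplitude step, whose activity lies near g0, receives the familiar over/undershoot emphasis.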
Figure 7.9: (a) Original test image and (b) result of the rational unsharp masking algorithm. (Reproduced with permission from [Ram98]. © 1998 SPIE.)
7.5
Conclusions and Remaining Issues
Polynomial and rational operators have been introduced in this chapter for a wide spectrum of applications in image processing and analysis. Various simulation results have been included. These results demonstrate that even simple operators can yield effective results. However, this description clearly indicates that, to exploit the possibilities of this family of tools completely, a stronger link is needed between theory and practice. It is necessary to formulate design techniques that can overcome the limitations of the heuristic approaches often used, as discussed above. For this purpose, mixed competencies are required that should cross-fertilize research in the mathematics and signal processing worlds; this is presently the most important challenge in this field.
References

[Car96] S. Carrato, G. Ramponi, and S. Marsi. A simple edge-sensitive image interpolation filter. In Proc. Third IEEE Intl. Conf. on Image Processing, Vol. III, pp. 711–714 (Lausanne, September 1996).
[Coc97] F. Cocchia, S. Carrato, and G. Ramponi. Design and real-time implementation of a 3-D rational filter for edge preserving smoothing. IEEE Trans. Consum. Electron. 43(4), 1291–1300 (November 1997).
[Gab94] M. Gabbouj and L. Tabus. TUT noisy image database. Tech. Report No. 13, Tampere University of Technology, Finland (December 1994).
[Key81] R. G. Keys. Cubic convolution interpolation for digital image processing. IEEE Trans. Acoust. Speech Signal Process. ASSP-29(6), 1153–1160 (December 1981).
[Khr98] L. Khriji, F. A. Cheikh, M. Gabbouj, and G. Ramponi. Color image interpolation using vector rational filters. In Proc. SPIE Conf. on Nonlinear Image Processing IX, Vol. 3304, pp. 26–29 (1998).
[Khr99a] L. Khriji and M. Gabbouj. Vector median-rational hybrid filters for multichannel image processing. IEEE Signal Process. Lett. 6(7), 186–190 (July 1999).
[Khr99b] L. Khriji, M. Gabbouj, G. Ramponi, and E. D. Ferrandiere. Old movie restoration using rational spatial interpolators. In Proc. 6th IEEE Intl. Conf. on Electronics, Circuits and Systems (Cyprus, September 1999).
[Lee94] J. Lee and V. J. Mathews. A stability theorem for bilinear systems. IEEE Trans. Signal Process. 41(7), 1871–1873 (July 1994).
[Leu94] H. Leung and S. Haykin. Detection and estimation using an adaptive rational function filter. IEEE Trans. Signal Process. 42(12), 3366–3376 (December 1994).
[Mak94] A. Makovec and G. Ramponi. Supervised discrimination of noisy textures using third-order moments. In Proc. COST 229 WG.1+2 Workshop, pp. 245–250 (Ljubljana, Slovenia, April 1994).
[Mar80] D. Marr and E. Hildreth. Theory of edge detection. Proc. R. Soc. London B 207, 187–217 (1980).
[Mar98] S. Marsi, R. Castagno, and G. Ramponi. A simple algorithm for the reduction of blocking artifacts in images and its implementation. IEEE Trans. Consum. Electron. 44(3), 1062–1070 (August 1998).
[Mat91] V. J. Mathews. Adaptive polynomial filters. IEEE Sig. Process. Mag., pp. 10–26 (July 1991).
[Mat00] V. J. Mathews and G. L. Sicuranza. Polynomial Signal Processing. Wiley, New York (2000).
[Mit91] S. K. Mitra, H. Li, I.-S. Lin, and T.-H. Yu. A new class of nonlinear filters for image enhancement. In Proc. Intl. Conf. on Acoustics, Speech, and Signal Processing, pp. 2525–2528 (Toronto, April 1991).
[Per90] P. Perona and J. Malik. Scale space and edge detection using anisotropic diffusion. IEEE Trans. Patt. Anal. Machine Intell. 12(7), 629–639 (July 1990).
[Ram90] G. Ramponi. Bi-impulse response design of isotropic quadratic filters. Proc. IEEE 78(4), 665–677 (April 1990).
[Ram94] G. Ramponi and S. Carrato. Performance of the Skewness-of-Gaussian (SoG) edge extractor. In Proc. Seventh European Signal Processing Conf., Vol. I, pp. 454–457 (Edinburgh, Scotland, September 1994).
[Ram95] G. Ramponi. Detail-preserving filter for noisy images. Electron. Lett. 31(11), 865–866 (25 May 1995).
[Ram96a] G. Ramponi, N. Strobel, S. K. Mitra, and T.-H. Yu. Nonlinear unsharp masking methods for image contrast enhancement. J. Electron. Imaging 5(3), 353–366 (July 1996).
[Ram96b] G. Ramponi. The rational filter for image smoothing. IEEE Signal Process. Lett. 3(3), 63–65 (March 1996).
[Ram97a] G. Ramponi and C. Moloney. Smoothing speckled images using an adaptive rational operator. IEEE Signal Process. Lett. 4(3), 68–71 (March 1997).
[Ram97b] G. Ramponi and S. Carrato. Interpolation of the DC component of coded images using a rational filter. In Proc. Fourth IEEE Intl. Conf. on Image Processing, Vol. I, pp. 389–392 (Santa Barbara, CA, October 1997).
[Ram98] G. Ramponi and A. Polesel. A rational unsharp masking technique. J. Electron. Imaging 7(2), 333–338 (April 1998).
[Sch80] M. Schetzen. The Volterra and Wiener Theories of Nonlinear Systems. Wiley, New York (1980).
[Sch81] M. Schetzen. Nonlinear system modeling based on the Wiener theory. Proc. IEEE 69(12), 1557–1573 (December 1981).
[Sic92] G. L. Sicuranza. Quadratic filters for signal processing. Proc. IEEE 80(8), 1263–1285 (August 1992).
[Vol87] V. Volterra. Sopra le funzioni che dipendono da altre funzioni. Rendiconti Regia Accademia dei Lincei (1887).
[Wie58] N. Wiener. Nonlinear Problems in Random Theory. Technology Press, M.I.T., and Wiley, New York (1958).
Nonlinear Partial Differential Equations in Image Processing
Department of Electrical and Computer Engineering University of Minnesota Minneapolis, Minnesota
8.1
Introduction
The use of partial differential equations and curvature-driven flows in image analysis has become a research topic of increasing interest in the past few years. Let u₀ : R² → R represent a gray-level image, where u₀(x, y) is the gray-level value and R is the set of all real numbers. Introducing an artificial time t, the image deforms via a partial differential evolution equation (PDE) according to

∂u/∂t = F[u(x, y, t)],   u(x, y, 0) = u₀(x, y),   (8.1)

where u(x, y, t) : R² × [0, T) → R is the evolving image, F is an operator that characterizes the given algorithm, and the image u₀ is the initial condition. The solution u(x, y, t) of the differential equation gives the processed image at "scale" t. In the case of vector-valued images, a system of coupled PDEs of the form of Eq. (8.1) is obtained. F typically depends on the image and its first and second spatial derivatives.
The same formalism can be applied to planar curves (boundaries of planar shapes), where u is a function from R to R², or to surfaces, functions from R² to R³. In this case, the operator F must be restricted to the curve, and all isotropic motions can be described as a deformation of the curve or surface in its normal direction, with velocity related to its principal curvature(s). In more formal terms, a flow of the form

∂u/∂t = β(κ₁, κ₂, …) N⃗   (8.2)

is obtained, where the κᵢ are the principal curvatures and N⃗ is the normal to the curve or surface u. A tangential velocity can be added as well, which may help the analysis but does not affect the geometry of the flow. Partial differential equations can also be obtained from variational problems. Assume a variational approach to an image processing problem formulated as

min_u U(u),
where U is a given energy. Let U′(u) denote the corresponding Euler–Lagrange derivative (first variation). Since, under general assumptions, a necessary condition for u to be a minimizer of U is that U′(u) = 0, the (local) minima may be computed via the steady-state solution of the gradient-descent equation

∂u/∂t = −U′(u),

where t is again an "artificial" time-marching parameter.
where t is again an "artificial" time marching parameter. PDEs obtained in this way have been used for quite some time in computer vision and image processing, and the literature is large. The classical example is the Dirichlet integral,
(∇u stands for the gradient of u), which is associated with the linear heat equation (Δ stands for the Laplacian):

∂u/∂t (t, x) = Δu(x).

This equation gives birth to the whole scale-space theory, which addresses the simultaneous representation of images at multiple scales and levels of detail (see also Chapter 10 in this book). More recently, extensive research has been done on the direct derivation of nonlinear evolution equations, which, in addition, are not necessarily obtained from energy approaches. This is in fact the case for a number of curvature equations of the form of Eq. (8.2). The use of partial differential equations and curve/surface flows in image analysis leads to modeling images in a continuous domain. This simplifies the formalism, which becomes grid independent and isotropic. The understanding of discrete local nonlinear filters is facilitated when one lets the grid mesh tend to zero and, thanks to an asymptotic expansion, rewrites the discrete filter as a partial differential operator.
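The heat-equation evolution can be sketched with an explicit finite-difference scheme; the step dt, the number of iterations, and the mirrored boundary handling are illustrative choices of this sketch:

```python
import numpy as np

def heat_evolve(u0, steps=50, dt=0.2):
    """Explicit finite-difference evolution of du/dt = Laplacian(u),
    the linear heat equation that generates the Gaussian scale-space.
    dt <= 0.25 is required for stability of this 2-D explicit scheme;
    mirrored (Neumann) boundaries conserve the total luminance."""
    u = np.asarray(u0, dtype=float).copy()
    for _ in range(steps):
        up = np.pad(u, 1, mode='edge')
        lap = (up[:-2, 1:-1] + up[2:, 1:-1] +
               up[1:-1, :-2] + up[1:-1, 2:] - 4 * u)   # 5-point Laplacian
        u += dt * lap
    return u
```

Each iteration is a local filter with an infinitesimal (here, 3 × 3) neighborhood, which is exactly the "PDEs as iterated local filters" reading given in the text: the mean is preserved while detail (variance) decreases with the scale t = steps · dt.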
Conversely, when the image is represented as a continuous signal, PDEs can be seen as the iteration of local filters with an infinitesimal neighborhood. This interpretation of PDEs allows one to unify and classify a number of the known iterated filters as well as to derive new ones. Actually, Alvarez et al. [Alv93] classified all of the PDEs that satisfy several stability requirements for image processing, such as locality and causality. (As pioneered elsewhere [Sap97b], future research might give up the locality requirement.) An advantage of the PDE approach is the possibility of achieving high accuracy and stability, with the help of the extensive research available on numerical analysis (this is also addressed in Chapter 10 for the specific case of the implementation of morphological operations). Of course, when considering PDEs for image processing and numerical implementations, we are dealing with derivatives of nonsmooth signals, and the right framework must be defined. The theory of viscosity solutions [Cra92] provides a framework for rigorously employing a partial differential formalism, in spite of the fact that the image may not be smooth enough to give a classical sense to the first and second derivatives involved in the PDE. Last but not least, this area has quite a unique level of formal analysis, creating the possibility of providing not only successful algorithms but also useful theoretical results, like existence and uniqueness of solutions. The use of nonlinear PDEs in image processing has become one of the fundamental geometric approaches in this area. Its connection with one of the classical geometric approaches in image processing, mathematical morphology, is fully discussed in Chapter 10. Additional examples of the use of morphological operators, in their discrete form, are given in Chapter 9.
Two examples of PDE-based image processing algorithms are discussed in this chapter; both are related to the computation of geodesics, although in different spaces and for different applications. The first part of the chapter deals with the use of PDEs for image segmentation. This represents one of the most successful and well-established results in the area. The second part discusses the use of PDEs to process multivalued data defined on nonflat manifolds, for example, directional data. This can be considered as the second generation of this PDE-based image processing approach, since most of the work reported in the literature so far is for "flat" data. Both topics are representative of the work in this area: they are based on fundamental mathematical concepts while giving practical results that were not possible or were extremely difficult to obtain with other techniques. That is, both topics show a significant interaction between theory and practice, something that has characterized the PDE framework since its origins. For additional material on PDE methods, bibliographies, and some history of the subject, see, for example, Caselles et al. [Cas98], Guichard and Morel [Gui95], and Romeny [Rom94], as well as Chapter 10 in this book.
8.2 Segmentation of Scalar and Multivalued Images

One of the basic problems in image analysis is segmentation. A special case of this is object detection, where certain objects in the image are to be singled out. In this case, the image is basically divided into two sets: objects and background. In general image segmentation, on the other hand, the image is partitioned into an unknown number of "uniform" areas. Both problems have been studied since the early days of computer vision and image processing, and different approaches have been proposed (see, for example, [Har85, Mor94, Mum89, Zuc76] and references therein). We first concentrate on object detection and then show how the proposed scheme can be generalized to obtain complete image segmentation. "Snakes," or active contours, were proposed by Kass et al. [Kas88] to solve the object detection problem. The classical snakes approach is based on deforming an initial contour or surface toward the boundary of the object to be detected. The deformation is obtained by minimizing a global energy designed such that its (local) minima are obtained at the boundary of the object. The energy is basically composed of a term that controls the smoothness of the deforming curve and another that attracts it to the boundary. Geometric models of deformable contours and surfaces proposed in Caselles et al. [Cas93] and Malladi et al. [Mal95] are based on the theory of curve evolution and geometric flows, which, as pointed out, has gained a large amount of attention from the image analysis community in the past years. These models allow automatic changes in topology when implemented using level-sets [Osh88]. Caselles et al. showed the mathematical relation between these two approaches for two-dimensional object detection, proposing the geodesic active contours model [Cas97a], and extended the work to three dimensions based on the theory of minimal surfaces [Cas97b] (see also [Kis95, Sha95, Whi95] for related approaches, especially [Kis95]).
For a particular case, the classical energy snakes approach is equivalent to finding a geodesic curve in a Riemannian space with a metric derived from the image. Assuming a level-sets representation of the deforming contour, one can compute this geodesic curve via a geometric flow that is very similar to the one obtained in the curve evolution approaches mentioned above. This flow, however, includes a new term that improves upon those models (for properties and advantages of the model, see [Cas97a, Sap97a]; for additional applications and significant extensions of the geodesic framework, see [Fau98, Lor99, Par97]). Here we first describe the basic components of this geodesic model for object detection and segmentation in scalar and vector-valued images. Vector-valued images are obtained through imaging modalities in which the data are recorded in a vector fashion, as in color, medical, and LANDSAT applications. In addition, vector-valued data can be obtained from the scale and orientation decompositions often used in texture analysis (see [Sap97a] for the corresponding references). An additional distance-based PDE approach for image segmentation, based on morphological watersheds, is formulated in Chapter 10.
8.2.1 Geodesic Active Contours
We now briefly review the active contours [Cas97a] to show their mathematical relation to previous works [Cas93, Kas88, Mal95, Ter88]. Let C(p) : [0, 1] → R² be a parametrized planar curve and u : [0, a] × [0, b] → R⁺ be a given image in which we want to detect the objects' boundaries. Let g_gray(x) be a decreasing function such that g_gray(x) → 0 when x → ∞. This is the edge detection function. Terzopoulos et al. have shown, in one of the most classical and fundamental works in computer vision and image processing [Ter88], that objects can be detected (or segmented out) in an image by letting C(p) deform in such a way that it is smoothly attracted to regions with low values of g_gray(·), that is, to the objects' edges. The deformation of C is governed by the minimization of a given energy that combines regularization terms with edge attraction terms. Based on classical dynamical systems principles, it can be shown that minimizing (a version of) the classical energy is basically equivalent to minimizing




L_R ≜ ∫₀^{length} g_gray(‖∇u(C)‖) dv,
where dv is the Euclidean arc length. Therefore, solving the active contours problem is equivalent to finding a path of minimal distance, where distance is given by the modified arc length g_gray dv (see also Sec. 10.6 for other applications of weighted distances). In other words, the object's boundary is given by a closed geodesic curve. To find this geodesic curve, we first use the steepest descent method, which gives a local minimum of L_R. The flow minimizing L_R is then given by [Cas97a]
∂C/∂t = g_gray(u) κ N − (∇g_gray · N) N,

where κ is the Euclidean curvature and N the unit normal.² The first component of the right-hand side of this equation regularizes the curve (the classical geometric heat flow), while the second one attracts the curve to the object's boundaries (see also Sec. 10.4 for other curve evolution formulations). Note that both curvature-based and edge-attracting velocity terms are present in this geodesic formulation, not just edge-stopping terms such as those in watershed-type algorithms; see Chapter 10. These terms are fundamental to segment noisy images and to detect objects with high variations in their gradients. To complete the model, we now introduce the level-sets formulation [Osh88] (see also Sec. 10.4). Let us assume that the curve C is parametrized as a level-set of a function w : [0, a] × [0, b] → R. That is, C is such that it coincides with the set of points of w such that w = constant (usually zero). Then, the level-sets

²Note that, in contrast with the formulations in Sec. 10.6, this PDE is not an Eikonal equation. On the other hand, starting from a point on the boundary, an open path of minimal weight L_R (open geodesic) from that point can be computed with a type of Eikonal equation, as formulated by Cohen and Kimmel [Coh97].
formulation of the steepest descent method says that solving the above geodesic problem starting from C amounts to searching for the steady state of the evolution equation

∂w/∂t = g_gray(u) κ ‖∇w‖ + ∇w · ∇g_gray(u),
with initial datum w(0, x) = w₀(x) (κ is the Euclidean curvature of the level-sets of w). The minima of L_R are then obtained with the (zero) level-set curve of w(∞). To increase the speed of convergence, we can add a balloon-type force [Coh91], as has been done elsewhere [Cas93, Mal95], and just consider the term ν g_gray(u)‖∇w‖, ν ∈ R⁺, as an extra speed (which minimizes the enclosed area [Coh91]) in the geodesic problem, obtaining

∂w/∂t = g_gray(u)(ν + κ)‖∇w‖ + ∇w · ∇g_gray(u).    (8.3)
Equation (8.3), which is the solution of the geodesic problem (L_R) with an extra area-based speed, constitutes the geodesic active contours, and it is an improvement over previous equations for them [Cas93, Mal95] (see [Cas97a, Sap97a]). It can be shown that this PDE has a unique solution (in the viscosity framework) and that under simple conditions, the curve converges to the object's boundaries. That is, in addition to providing state-of-the-art practical results (given below), the formal analysis of this equation is possible. This is quite a unique characteristic of the PDE approach to image processing problems. This equation, as well as its 3D extension [Cas97b], was independently proposed by Kichenassamy et al. [Kis95] and also by Shah [Sha95] based on a slightly different initial approach (see also [Whi95]). Extensions of the Caselles [Cas93] and Malladi [Mal95] model also were studied by Tek and Kimia [Tek95], motivated by the work of Kimia et al. [Kim95]. The differences between those models have been explained elsewhere [Cas97a, Sap97a]. Figure 8.1 presents an example of this flow. On the left we initialize the curve surrounding all of the objects in the image. Note how the curve splits, and all of the objects are detected at once, on the right side of Fig. 8.1. This is automatically handled by the algorithm, without adding any extra topology tracking procedures.
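To make the level-set flow of Eq. (8.3) concrete, here is a minimal explicit-update sketch using central differences via `np.gradient`. The function name, step sizes, and the small `eps` regularization are our own choices; a practical implementation would use upwind schemes and narrow-band acceleration, as in the level-set literature:

```python
import numpy as np

def gac_step(w, g, nu=0.2, dt=0.1, eps=1e-8):
    """One explicit time step of the geodesic active contour flow, Eq. (8.3):
        w_t = g(u) (nu + kappa) ||grad w|| + grad w . grad g
    w : level-set function embedding the contour (its zero level-set)
    g : edge-stopping image g_gray(||grad u||), small near object edges
    """
    wy, wx = np.gradient(w)                  # axis 0 is y, axis 1 is x
    norm = np.sqrt(wx**2 + wy**2) + eps
    # curvature kappa = div(grad w / ||grad w||)
    nyy, _ = np.gradient(wy / norm)          # d(n_y)/dy
    _, nxx = np.gradient(wx / norm)          # d(n_x)/dx
    kappa = nxx + nyy
    gy, gx = np.gradient(g)
    return w + dt * (g * (nu + kappa) * norm + gx * wx + gy * wy)
```

With g ≡ 1 the flow reduces to curvature motion plus the balloon speed ν, so a circle embedded as a signed distance function simply shrinks, which is a useful sanity check.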
8.2.2 Geodesic Color Active Contours
We want to extend the geodesic framework for image segmentation presented above to vectorvalued data. In general, two different approaches can be adopted to work on vectorvalued images. The first approach is to process each plane separately and then to integrate the results of this operation somehow to obtain one unique segmentation for the whole image. The second approach is to integrate the vector information from the very beginning and deform a unique curve based on this information, directly obtaining a unique object segmentation. We adopt the second approach; that is, we integrate the original image information to find
Figure 8.1: Example of geodesic snakes. The original curve surrounds all of the objects
(left) and evolves, splitting in the process, to detect all of the objects at once (right).
a unique segmentation directly. The main idea is to define a new Riemannian (metric) space based on information obtained from all of the components in the image at once. More explicitly, edges are computed based on classical results of Riemannian geometry [Kre59]. When the image components are correlated, as in color images, this approach is less sensitive to noise than the combination of scalar gradients obtained from each component [Lee91]. These vector edges are used to define a new metric space in which the geodesic curve is to be computed. The object boundaries are then given by a minimal "color weighted" path.

Vector Edges
We present now the definition of edges in vector-valued images, based on classical Riemannian geometry results [Kre59]. The Riemannian geometry framework for edge detection in multivalued images described below was first suggested by Di Zenzo [DiZ86] and was extended later [Cum91, Lee91, Sap96b]. This approach has a solid theoretical background and constitutes a consistent extension of single-valued gradient computations. Let u(x₁, x₂) : R² → Rᵐ be a multivalued image with components uᵢ(x₁, x₂) : R² → R, i = 1, 2, ..., m. The value of the image at a given point (x₁⁰, x₂⁰) is a vector in Rᵐ, and the difference of image values at two points P = (x₁⁰, x₂⁰) and Q = (x₁¹, x₂¹) is given by Δu = u(P) − u(Q). When the (Euclidean) distance d(P, Q) between P and Q tends to zero, the difference becomes the arc element du = Σᵢ₌₁² (∂u/∂xᵢ) dxᵢ. The quadratic form du² is called the first fundamental form [Kre59]. Although we present now only the Euclidean case, the theory we develop holds for any nonsingular Riemannian metric. For different metrics, either a space transform can be applied to obtain a Euclidean space, if possible, or the metric induced by the given space can be used directly (if it is nonsingular). Using the standard notation of Riemannian geometry [Kre59], we have g_ij ≜ (∂u/∂xᵢ) · (∂u/∂xⱼ), and du² = [dx₁, dx₂][g_ij][dx₁, dx₂]ᵀ. For a unit vector v̂ = (cos θ, sin θ), du²(v̂) indicates the rate of change of the image in the v̂ direction. It is well known that the extrema of the quadratic form are obtained in the directions of the eigenvectors θ₊ and θ₋ of the metric tensor [g_ij], and

the values attained there are the corresponding eigenvalues λ₊ and λ₋. We call θ₊ the direction of maximal change and λ₊ the maximal rate of change. Similarly, θ₋ and λ₋ are the direction of minimal change and the minimal rate of change, respectively. In contrast with scalar images (m = 1), the minimal rate of change λ₋ may be different from zero. In the single-valued case, the gradient is perpendicular to the level-sets, and λ₋ = 0. The "strength" of an edge in the multivalued case is not given simply by the rate of maximal change λ₊ but by how λ₊ compares to λ₋. Therefore, a first approximation of edges for vector-valued images, analogous to selecting a function of ‖∇u‖ in the m = 1 case, should be a function f = f(λ₊, λ₋). Selecting f = f(λ₊ − λ₋) is one choice, since for m = 1 it reduces to the gradient-based edge detector. This definition of vector edges was used elsewhere [Sap96b] for color image diffusion and is used here to define vector snakes. It can also be used to define vector total variation [Sap97a] (see also [Blo98]).

Vector Snakes
Let f_color(u) ≜ f(λ₊, λ₋) be the edge detector as defined above. The edge-stopping function g_color(u) is then defined such that g_color(u) → 0 when f_color → max (or ∞), as in the scalar case. For example, we can choose f_color(u) ≜ (λ₊ − λ₋)^{1/p} or f_color(u) ≜ λ₊^{1/p}, p > 0, and g_color(u) ≜ 1/(1 + f) or g_color(u) ≜ exp{−f}. The function (metric) g_color(u) defines the Riemannian space on which we compute the geodesic curve. Defining L_color ≜ ∫₀^{length} g_color(u) dv, the object detection problem in vector-valued images is then associated with minimizing L_color. We therefore have formulated the problem of object segmentation in vector-valued images as the problem of finding a geodesic curve in a Riemannian space defined by a metric induced from the whole vector image. To minimize L_color, that is, the color length, we compute as before the gradient flow. The equations developed for the geodesic active contours are independent of the specific selection of the metric g. Therefore, the same equations hold here. Replacing g_gray(u) by g_color(u) and embedding the evolving curve C in the function w : R² → R, we obtain the general flow, with additional unit speed, for the color snakes:

∂w/∂t = g_color(u)(ν + κ)‖∇w‖ + ∇w · ∇g_color(u).    (8.4)
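As an illustration, the construction above can be sketched numerically, choosing f_color = (λ₊ − λ₋)^{1/p} and g_color = 1/(1 + f) from among the options listed. This is a minimal sketch; the function name and defaults are ours:

```python
import numpy as np

def color_metric(u, p=2.0):
    """Edge-stopping metric g_color = 1/(1 + f), f = (lam+ - lam-)^(1/p).

    u : array of shape (m, H, W), one plane per image component.
    The 2x2 first fundamental form has entries
    g_ij = sum_k (du_k/dx_i)(du_k/dx_j); lam+/- are its eigenvalues,
    the maximal/minimal rates of change of the vector image.
    """
    g11 = g12 = g22 = 0.0
    for comp in u:
        cy, cx = np.gradient(comp.astype(float))
        g11 = g11 + cx * cx
        g12 = g12 + cx * cy
        g22 = g22 + cy * cy
    disc = np.sqrt((g11 - g22) ** 2 + 4 * g12 ** 2)
    lam_plus = (g11 + g22 + disc) / 2
    lam_minus = (g11 + g22 - disc) / 2
    f = np.maximum(lam_plus - lam_minus, 0.0) ** (1.0 / p)
    return 1.0 / (1.0 + f)
```

For m = 1 we have λ₊ = ‖∇u‖² and λ₋ = 0, so with p = 2 this reduces to the familiar scalar edge-stopping function 1/(1 + ‖∇u‖).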


Recapping, Eq. (8.4) is the modified level-sets flow corresponding to the gradient descent of L_color. Its solution (steady state) is a geodesic curve in the Riemannian space defined by the metric g_color(λ₊, λ₋) of the vector-valued image. This solution gives the boundary of objects in the scene. Note that λ₊ and λ₋ can be computed on a smooth image obtained from the vector-valued anisotropic diffusion [Sap96b]. Following Caselles et al. [Cas97a], theoretical results regarding the color active contours can be obtained [Sap97a]. Figure 8.2 shows an example of our vector snakes model. The numerical implementation is based on the algorithm for surface evolution via level-sets developed
Figure 8.2: Example of the vector snakes for a texture image.
by Osher and Sethian [Osh88]. The original image is filtered with Gabor filters tuned to different frequencies and orientations, as proposed by Lee et al. [Lee92] for texture segmentation (see [Sap97a] for additional related references). From this set of frequency-orientation decomposed images, g_color is computed accordingly and the vector-valued snakes flow is applied. Four frequencies and four orientations are used, resulting in 16 images. We should mention that a number of results on color or vector-valued segmentation have been reported in the literature (e.g., [Zhu95]; see [Sap97a] for relevant references and further comparisons). Here we address the geodesic active contours approach with vector-image metrics, a simple and general approach. Other algorithms can be extended to work on vector-valued images as well, following the framework described in this chapter.
8.2.3 Self-Snakes
We now briefly extend the formulation of the geodesic snakes for object detection presented above to a new flow for the segmentation-simplification of images. This flow is obtained by deforming each of the image level-sets according to the geodesic snakes. The resulting flow, denoted as self-snakes, is closely related to a number of previously reported image processing algorithms based on PDEs, such as anisotropic diffusion [Alv92, Per90] and shock filters [Osh90], as well as the Mumford-Shah variational approach for image segmentation [Mum89]. The explicit relations are presented in detail elsewhere [Sap96a, Sap97a]. Let us observe again the level-sets flow corresponding to the single-valued geodesic snakes as presented above: ∂w/∂t = ‖∇w‖ div(g_gray(u) ∇w/‖∇w‖). Two functions (maps from R² to R) are involved in this flow, the image u and the auxiliary level-sets one, w. Assume now that w ≡ u, that is, that the auxiliary
Figure 8.3: Smoothing a portrait of Gauss with the geometric self-snakes (original on the left and enhanced on the right).
level-sets function is the image itself. This equation then becomes

∂u/∂t = ‖∇u‖ div(g_gray(u) ∇u/‖∇u‖).    (8.5)
A number of interpretations can be given to this equation. First of all, based on the analysis of the geodesic active contours, the flow in Eq. (8.5) indicates that each level-set of the image u moves according to the geodesic active contours flow, being smoothly attracted to boundaries by the term ∇g. This gives the name self-snakes to the flow. This interpretation also explains why image segmentation is obtained. Furthermore, Eq. (8.5) can be rewritten as the composition of an anisotropic diffusion term with a shock filter one. This relates the self-snakes, and the topic of active contours in general, to previously developed PDE-based algorithms [Alv92, Cat92, Nit92, Osh90, Per90, Rud92, You96] (see [Sap96a, Sap97a]). An example is presented in Fig. 8.3. The same approach can be followed for vector-valued images.
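A minimal explicit-step sketch of Eq. (8.5), again using `np.gradient` central differences and a small `eps` to avoid division by zero (names and step size are our own choices, not a production scheme):

```python
import numpy as np

def self_snakes_step(u, g, dt=0.1, eps=1e-8):
    """One explicit step of the self-snakes flow, Eq. (8.5):
        u_t = ||grad u|| div( g grad u / ||grad u|| )
    Here g is the edge-stopping image evaluated on u itself; each
    level-set of u evolves as a geodesic active contour, so the image
    is simplified while edges are preserved and sharpened.
    """
    uy, ux = np.gradient(u.astype(float))
    norm = np.sqrt(ux**2 + uy**2) + eps
    vx, vy = g * ux / norm, g * uy / norm
    div = np.gradient(vx, axis=1) + np.gradient(vy, axis=0)
    return u + dt * norm * div
```

A constant image is a fixed point of the scheme, and a single step leaves any image finite, which makes a quick correctness check easy.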
8.2.4 Discussion
We have presented a framework for the geometric segmentation of scalar and vector-valued images and have shown that the solution to the deformable contours approach for object detection is given by a geodesic curve in a Riemannian space defined by a metric derived from the data. In the case of vector-valued images, the metric itself is given by a definition of edges based on classical Riemannian geometry (this definition also leads to a novel concept of level-sets in vector-valued images [Chu00]). The geodesic framework for object detection was then extended to obtain combined anisotropic diffusion and shock filtering in scalar and vector images. This flow simplifies the image and is obtained by deforming each of the image level-sets according to the geodesic flow.
The geodesic active contours provide state-of-the-art object detection and image segmentation results. In addition, they have been extended by Paragios and Deriche for object tracking in video [Par97] and by Faugeras and Keriven for 3D shape reconstruction from stereo [Fau98]. All of the work described so far deals with images defined on the plane, taking values in the general Euclidean space Rⁿ. This is also true for the vast majority of the published work on PDE-based image processing. But what happens if we define the images on more general manifolds, or if the images take values not in Rⁿ but on general manifolds? We deal with these questions in the rest of this chapter.
8.3 Nonlinear PDEs in General Manifolds: Harmonic Maps and Direction Diffusion
In a number of disciplines, directions provide a fundamental source of information. Examples in the area of computer vision are (2D, 3D, and 4D) gradient directions, optical flow directions, surface normals, principal directions, and color. In the color example, the direction is given by the normalized vector in the color space. Frequently, the available data are corrupted with noise, and thus there is a need for noise removal. In addition, it is often desired to obtain a multiscale-type representation of the direction, similar to those obtained for gray-level images [Koe84, Per90, Per98, Wit83]. Addressing these issues in particular, and diffusion on arbitrary manifolds in general, is the goal of this section. We will see that this is related to geodesic computations as well. An Rⁿ direction defined on an image in R² is given by a vector u(x, y, 0) : R² → Rⁿ such that the Euclidean norm of u(x, y, 0) is equal to one; that is, [Σᵢ₌₁ⁿ uᵢ²(x, y, 0)]^{1/2} = 1, where uᵢ(x, y, 0) : R² → R are the components of the vector. The notation can be simplified by considering u(x, y, 0) : R² → Sⁿ⁻¹, where Sⁿ⁻¹ is the unit sphere in Rⁿ. This implicitly includes the unit norm constraint. (Any nonzero vector can be transformed into a direction by normalizing it.) When smoothing the data, or computing a multiscale representation u(x, y, t) of a direction u(x, y, 0) (t stands for the scale), it is crucial to maintain the unit norm constraint, which is an intrinsic characteristic of directional data.³ That is, the smoothed direction u(x, y, t) : R² → Rⁿ must also satisfy [Σᵢ₌₁ⁿ uᵢ²(x, y, t)]^{1/2} = 1, or u(x, y, t) : R² → Sⁿ⁻¹. The same constraint holds for the whole multiscale representation u(x, y, t) of the original direction u(x, y, 0). This is what makes the smoothing of directions different from the smoothing of ordinary vectorial data as performed elsewhere [Sap96b, Whi94]: the smoothing is performed in Sⁿ⁻¹ instead of Rⁿ.





³In this work we do not explicitly address the problem in which the direction smoothing depends on other image attributes (see, for example, [Lin94]); the analysis is done intrinsically to the directional, unit norm data. When other attributes are present, we process them separately or via coupled PDEs (e.g., [Tan00b]). If needed, the unit norm constraint can be relaxed using a framework similar to the one proposed here.
Directions can also be represented by the angle(s) the vector makes with a given coordinate system, denoted in this chapter as orientation(s). In the 2D case, for example, the direction (u₁, u₂) of a vector can be given by the angle θ that this vector makes with the x axis (we consider θ ∈ [0, 2π)): θ = arctan(u₂/u₁). There is, of course, a one-to-one map between a direction vector u(x, y) : R² → S¹ and the angle θ. Using this relation, Perona [Per98] transformed the problem of 2D direction diffusion into a 1D problem of angle or orientation diffusion (see also the comment in Sec. 8.3.2). Perona then proposed PDE-based techniques for the isotropic smoothing of 2D orientations (see also [Gra95, Wei96] and the general discussion of these methods in [Per98]).⁴ Smoothing orientations instead of directions solves the unit norm constraint but adds a periodicity constraint. Perona showed that a simple heat flow (Laplacian or Gaussian filtering) applied to the θ(x, y) image, together with special numerical attention, can address this periodicity issue. This approach applies only to small changes in θ, that is, smooth data, thereby disqualifying edges. The straightforward extension of this to Sⁿ⁻¹ would be to consider n − 1 angles and smooth each one of these as a scalar image. The natural coupling is then missing, resulting in a set of decoupled PDEs. In this work we follow the suggestion in Caselles et al. [Cas00] and directly perform diffusion in the direction space, extending to images representing directions the classical results on diffusion of gray-valued images [Alv92, Koe84, Per90, Rud92, Sap96b, Wei96, Whi95]. That is, from the original unit norm vectorial image u(x, y, 0) : R² → Sⁿ⁻¹ we construct a family of unit norm vectorial images u(x, y, t) : R² × [0, T) → Sⁿ⁻¹ that provides a multiscale representation of directions.
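The periodicity issue is easy to see numerically: near the 0/2π branch cut, naively averaging angles fails, while averaging (and renormalizing) the direction vectors themselves does not. A small sketch with hypothetical helper names of our own:

```python
import numpy as np

def to_orientation(u):
    """Angle theta in [0, 2*pi) of a 2D direction (u1, u2)."""
    return np.arctan2(u[1], u[0]) % (2 * np.pi)

def to_direction(theta):
    """Unit vector (cos theta, sin theta) on S^1."""
    return np.array([np.cos(theta), np.sin(theta)])

# Two nearly identical directions on opposite sides of the branch cut:
a = to_orientation((1.0, 1e-3))    # just above the x axis
b = to_orientation((1.0, -1e-3))   # just below, wraps to ~2*pi
naive = (a + b) / 2                # angle average: ~pi, points the wrong way
d = to_direction(a) + to_direction(b)
d /= np.linalg.norm(d)             # direction average stays near (1, 0)
```

This is exactly why the direction-space formulation below needs no special periodicity-preserving numerics, at the price of enforcing the unit norm constraint instead.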
The method intrinsically takes care of the normalization constraint, eliminating the need to consider orientations and to develop special periodicity-preserving numerical approximations. Discontinuities in the directions are also allowed by the algorithm. The approach follows results from the literature on harmonic maps in liquid crystals, and u(x, y, t) is obtained from a system of coupled partial differential equations that reduces a given (harmonic) energy. Energies giving both isotropic and anisotropic flows are described. Before presenting the details of the framework for direction diffusion proposed here, let us repeat its main unique characteristics: (1) It includes both isotropic and anisotropic diffusion. (2) It works for directions in any dimension and for general data on nonflat manifolds. (3) It supports nonsmooth data. (4) It is based on a substantial amount of existing theoretical results that help to answer a number of relevant computer vision questions.



⁴As Perona pointed out in his work, this is just one example of the diffusion of images representing data beyond flat manifolds. His work has been extended using intrinsic metrics on the manifold [Cha99, Soc98]. Chan and Shen [Cha99] explicitly deal with orientations and present the L1 norm as well as many additional new features, contributions on discrete formulations, and connections with our approach. The work by Sochen et al. [Soc98] does not deal with orientations or directions. Rudin and Osher [Rud94] also mention the minimization of the L1 norm of the divergence of the normalized image gradient (curvature of the level-sets); this is done in the framework of image denoising, without addressing the regularization and analysis of directional data or presenting examples. These works do not use the classical and "natural" harmonic maps framework.
CHAPTER 8: NONLINEAR PARTIAL DIFFERENTIAL EQUATIONS
8.3.1 The General Problem

Let u(x, y, 0) : R² → Sⁿ⁻¹ be the original image of directions. That is, this is a collection of vectors from R² to Rⁿ such that their Euclidean norm is equal to one; that is, ‖u(x, y, 0)‖ = 1, where ‖·‖ indicates Euclidean length. Here uᵢ(x, y, 0) : R² → R stands for each of the n components of u(x, y, 0). We search for a family of images, a multiscale representation, of the form u(x, y, t) : R² × [0, T) → Sⁿ⁻¹, and once again we use uᵢ(x, y, t) : R² → R to represent each of the components of this family. Let us define the component gradient ∇uᵢ as ∇uᵢ ≜ (∂uᵢ/∂x) x̂ + (∂uᵢ/∂y) ŷ, where x̂ and ŷ are the unit vectors in the x and y directions, respectively. From this, ‖∇uᵢ‖ = [(∂uᵢ/∂x)² + (∂uᵢ/∂y)²]^{1/2} gives the absolute value of the component gradient. The component Laplacian is given by Δuᵢ = (∂²uᵢ/∂x²) + (∂²uᵢ/∂y²). We are also interested in the absolute value of the image gradient, given by ‖∇u‖ ≜ [Σᵢ₌₁ⁿ ((∂uᵢ/∂x)² + (∂uᵢ/∂y)²)]^{1/2}. Having this notation, we are now ready to formulate our framework. The problem of harmonic maps in liquid crystals is formulated as the search for the solution of

min_{u : R² → Sⁿ⁻¹} ∫_Ω ‖∇u‖ᵖ dx dy,    (8.6)


where Ω stands for the image domain and p ≥ 1. This variational formulation can be rewritten as min_{u : R² → Rⁿ} ∫_Ω ‖∇u‖ᵖ dx dy, with ‖u‖ = 1. This is a particular case of the search for maps u between Riemannian manifolds (M, g) and (N, h) that are critical points of the harmonic energy

∫_M ‖∇_M u‖ᵖ,    (8.7)
where ‖∇_M u‖ is the length of the differential in M. In our particular case, M is a domain in R² and N = Sⁿ⁻¹, and ‖∇_M u‖ reduces to the absolute value of the image gradient. The critical points of Eq. (8.7) are called p-harmonic maps (or simply harmonic maps for p = 2). This is in analogy to the critical points of the Dirichlet energy ∫_M ‖∇u‖² for real-valued functions u, which are called harmonic functions. Note also that the critical points are (generalized) geodesics [Eel78, Eel88]. The general form of the harmonic energy with p = 2 has been used successfully, for example, in computer graphics to find smooth maps between two given (triangulated) surfaces (normally a surface and the complex plane) [Eck95, Hak99, Zha99]. In this case, the search is indeed for the critical point, that is, for the harmonic map between the surfaces. This can be done, for example, via finite elements [Hak99]. In our case, the problem is different. We already have a candidate map, the original image of directions u(x, y, 0), and we want to compute a multiscale or regularized version of it. That is, we are interested not only in the harmonic map between the domain in R² and Sⁿ⁻¹ (the critical point of the energy) but also in the process of computing this map via partial differential equations. More specifically, we are interested in the gradient-descent-type flow of the harmonic energy of Eq. (8.7). This is partially motivated by the fact that diffusion equations
for gray-valued images can be obtained as gradient descent flows acting on real-valued data (see, for example, [Bla98, Per90, Rud92, You96]). Isotropic diffusion is just the gradient descent of the L2 norm of the image gradient, while anisotropic diffusion can be interpreted as the gradient descent flow of more robust norms acting on the image gradient. For the most popular case of p = 2, the Euler-Lagrange equation corresponding to Eq. (8.7) is a simple formula based on Δ_M, the Laplace-Beltrami operator of M, and A_N(u), the second fundamental form of N (assumed to be embedded in Rᵏ) evaluated at u (e.g., [Eel78, Str85]): Δ_M u + A_N(u)(∇_M u, ∇_M u) = 0. This leads to a gradient-descent type of flow, that is,

∂u/∂t = Δ_M u + A_N(u)(∇_M u, ∇_M u).    (8.8)
In the following sections, we present the gradient descent flows for our particular energy of Eq. (8.6), that is, for M being a domain in R² and N equal to Sⁿ⁻¹.⁵ Below we concentrate on the cases of p = 2 (isotropic) and p = 1 (anisotropic). The use of p = 2 corresponds to the classical heat flow from linear scale-space theory [Koe84, Wit83], while the case p = 1 corresponds to the total variation flow studied by Rudin et al. [Rud92].
8.3.2 Isotropic Diffusion
It is easy to show that for p = 2, the gradient descent flow corresponding to Eq. (8.6) is given by the set of coupled PDEs⁶

∂uᵢ/∂t = Δuᵢ + uᵢ ‖∇u‖²,  i = 1, ..., n.    (8.9)
This system of coupled PDEs defines the isotropic multiscale representation of u(x, y, 0), which is used as the initial datum to solve Eq. (8.9). (Boundary conditions are also added in the case of finite domains.) The first part of Eq. (8.9) comes from the variational form, while the second comes from the constraint. As expected, the first part is decoupled between components uᵢ and is linear, while the coupling and nonlinearity come from the constraint. For p = 2, we have the following important results from the literature on harmonic maps:
Existence. Existence results for harmonic mappings have already been reported [Eel64] for a particular selection of the target manifold N. Struwe [Str85] showed,
⁵For data such as surface normals and principal directions, M is a surface in 3D and the general flow given by Eq. (8.8) is used. This flow can be implemented using numerical techniques to compute ∇_M and Δ_M on triangulated or implicit surfaces (using [Hak99] and results developed by S. Osher and colleagues at UCLA).
⁶If n = 2, that is, if we have 2D directions, then it is easy to show that for u(x, y) = (cos θ(x, y), sin θ(x, y)), the energy in Eq. (8.6) becomes ∫∫_Ω (θₓ² + θ_y²)^{p/2} dx dy. For p = 2 we then obtain the linear heat flow on θ (θ_t = Δθ) as the corresponding gradient descent flow, as expected from Perona's results in [Per98]. This, of course, is directly derived from the theory of harmonic maps. When the data are not regular, the direction and orientation formulations are not necessarily equivalent.
in one of the classical papers in the area, that for initial data with finite energy [as measured by Eq. (8.7)], M a two-dimensional manifold with ∂M = ∅ (manifold without boundary), and N = Sⁿ⁻¹, there is a unique solution to the general gradient descent flow. Moreover, this solution is regular, with the exception of a finite number of isolated points, and the harmonic energy is decreasing in time. If the initial energy is small, the solution is completely regular and converges to a constant value. (The result actually holds for any compact N.) This uniqueness result was later extended to manifolds with smooth ∂M ≠ ∅ and to weak solutions [Fre95]. Recapping, there is a unique weak solution to Eq. (8.9) [weak solutions defined in natural spaces of maps from M to N], and the set of possible singularities is finite. These solutions decrease the harmonic energy. The result is not completely true for M of dimension greater than 2, and this was investigated, for example, by Chen [Che89]. Global weak solutions exist, for example, for N = Sⁿ⁻¹, although there is no uniqueness for the general initial value problem [Cor90]. Results on the regularity of the solution, for a restricted suitable class of weak solutions, to the harmonic flow for high dimensional manifolds M into Sⁿ⁻¹ have been recently reported [Che95, Fel94]. In this case, it is assumed that the weak solutions hold a number of given energy constraints. Singularities in 2D. If N = S¹ and the initial and boundary conditions are well behaved (smooth, finite energy), then the solution of the harmonic flow is regular. This is the case, for example, for smooth 2D image gradients and 2D optical flow. Singularities in 3D. Unfortunately, for n = 3 in Eq. (8.9) (that is, N = S², 3D vectors), smooth initial data can lead to singularities in finite time [Cha92]. Chang et al. showed examples in which the flow of Eq.
(8.9), with initial data $u(x, y, 0) = u_0(x, y) \in C^1(D^2, S^2)$ (with $D^2$ being the unit disk on the plane) and boundary conditions $u(x, y, t)|_{\partial D^2} = u_0|_{\partial D^2}$, develops singularities in finite time. The idea is to use as original data $u_0$ a function that covers $S^2$ more than once. From the point of view of the harmonic energy, the solution is "giving up on" regularity to reduce energy. Singularities topology. Since singularities can occur, it is then interesting to study them [Bre86, Har97, Qin95]. For example, Brezis et al. studied the value of the harmonic energy when the singularities of the critical point are prescribed (the map is from $R^3$ to $S^2$ in this case).¹ Qing characterized the energy at the singularities. A recent review on the singularities of harmonic maps was prepared by Hardt [Har97]. (Singularities for more general energies were studied, for example, by Pismen and Rubinstein [Pis91].) The results reported there can be used to characterize the behavior of the multiscale representation of high dimensional directions, although these results mainly address the shape of the harmonic map, that is, the critical point of the harmonic energy, and not the flow. Of course, for

¹Perona suggested a look at this line of work to analyze the singularities of the orientation diffusion flow.
the case of M being of dimension two, which corresponds to Eq. (8.9), we have Struwe's results mentioned above.
8.3.3 Anisotropic Diffusion

The picture becomes even more interesting for the case $1 \le p < 2$. Now the gradient descent flow corresponding to Eq. (8.6), in the range $1 < p < 2$ (and formally for $p = 1$), is given by the set of coupled PDEs
This system of coupled PDEs defines the anisotropic multiscale representation of $u(x, y, 0)$, which is used as the initial datum to solve Eq. (8.10). In contrast with the isotropic case, now both terms in Eq. (8.10) are nonlinear and include coupled components. The case of $p \neq 2$ in Eq. (8.7) has been studied less in the literature. When M is a domain in $R^m$ and $N = S^{n-1}$, the function $v(x) \triangleq x/\|x\|$, $x \in R^m$, is a critical point of the energy for $p \in \{2, 3, \ldots, m-1\}$, for $p \in [m-1, m)$ [this interval includes the energy case that leads to Eq. (8.10)], and for $p \in [2, m-2]$ [Har97]. For $n = 2$ and $p = 1$, the variational problem has also been investigated by Giaquinta et al. [Gia93], who addressed, among other things, the correct spaces in which to perform the minimization [in the scalar case, $BV(\Omega, R)$ is used] and the existence of minimizers. Of course, we are more interested in the results for the flow given by Eq. (8.10), not just its corresponding energy. Some results exist for $1 < p < \infty$, $p \neq 2$, showing in a number of cases the existence of local solutions that are not smooth. To the best of our knowledge, the case of $1 \le p < 2$, and in particular $p = 1$, has not been fully studied for the evolution equation.
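Eq. (8.10) itself is not reproduced here, but a toy version of the p = 1 flow acting directly on the angle image, a total-variation-like flow $\theta_t = \mathrm{div}(\nabla\theta / |\nabla\theta|)$, can be sketched as follows (the ε-regularization of the gradient magnitude and the step sizes are assumptions, and the angle wraparound is again ignored):

```python
import numpy as np

def anisotropic_orientation_diffusion(theta, iters=100, dt=0.1, eps=1e-3):
    """Explicit sketch of a p = 1 (TV-like) flow theta_t = div(grad theta / |grad theta|)
    on the angle image, with an epsilon-regularized gradient magnitude to
    avoid division by zero. A toy stand-in for the coupled system of the text."""
    th = theta.astype(float).copy()
    for _ in range(iters):
        p = np.pad(th, 1, mode="edge")
        gx = p[1:-1, 2:] - th            # forward difference in x
        gy = p[2:, 1:-1] - th            # forward difference in y
        mag = np.sqrt(gx ** 2 + gy ** 2 + eps ** 2)
        nx, ny = gx / mag, gy / mag      # normalized gradient field
        # backward-difference divergence of the normalized field
        div = (nx - np.pad(nx, 1, mode="edge")[1:-1, :-2]) \
            + (ny - np.pad(ny, 1, mode="edge")[:-2, 1:-1])
        th += dt * div
    return th
```

Because the diffusivity 1/|∇θ| is large only where the gradient is small, this flow removes small oscillations while leaving sharp transitions in the orientation field far less blurred than the linear heat flow does.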
8.3.4 Examples
Although advanced specialized numerical techniques to solve Eq. (8.6) and its corresponding gradient descent flow have been developed (e.g., [Alo91]), as a first approximation we can basically use the algorithms developed for scalar isotropic and anisotropic diffusion without the unit norm constraint (e.g., [Rud92]) to implement Eqs. (8.9) and (8.10) [Coh87]. Although these equations preserve the unit norm, numerical errors might violate the constraint. Therefore, between every two steps of the numerical implementation of these equations we add a renormalization step [Coh87]. A number of techniques exist for visualizing vectors: (1) Arrows indicating the vector direction are very illustrative but can be used only for sparse images; they are not very informative for dense data such as gradients or optical flow. (2) The HSV color mapping (applied to orientation) is useful for visualizing whole images of directions while also being able to illustrate details such as small noise. (3) Line integral convolution (LIC) [Cab93] is based on locally integrating at each pixel, in the directions given by the directional data, the values of a random image. The
Figure 8.4: Examples illustrating the ideas of direction diffusion (see text for details). (From [Tan00a]. © 2000 Kluwer Academic Publishers.)
LIC technique gives the general form of the flow, while the color map is useful to detect small noise in the direction (orientation) image. Figure 8.4 (also see color insert) shows a number of simple examples to illustrate the general ideas introduced in this chapter, as well as examples for color image denoising. The top row shows, using LIC, first an image with two regions having different (2D) orientations (original), followed by the results of isotropic diffusion for 200, 2000, and 8000 iterations (scale-space). Note how the edge in the directional data is being smoothed out. The horizontal and vertical directions are being smoothed out to converge to the diagonal average.
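The renormalization step described before the figure, diffuse each component without the constraint, then project back to unit norm, can be sketched as follows (an isotropic toy version with assumed step sizes; the anisotropic case would replace the Laplacian by the p < 2 divergence term):

```python
import numpy as np

def diffuse_unit_vectors(u, v, iters=50, dt=0.2, eps=1e-12):
    """Diffuse a 2D unit-vector field componentwise without the unit-norm
    constraint, then renormalize between steps so that numerical errors
    cannot push the field off the unit circle."""
    u = u.astype(float).copy()
    v = v.astype(float).copy()
    for _ in range(iters):
        for w in (u, v):  # unconstrained diffusion of each component
            p = np.pad(w, 1, mode="edge")
            w += dt * (p[:-2, 1:-1] + p[2:, 1:-1]
                       + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * w)
        norm = np.sqrt(u * u + v * v) + eps  # renormalization step
        u /= norm
        v /= norm
    return u, v
```

The projection is cheap and keeps the computed field on $S^1$ at every scale, which is all the constraint requires in practice.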
The second row shows the result of removing noise in the directional data. The original noisy image is shown first, followed by the results with isotropic and anisotropic smoothing. Note how the anisotropic flow gets rid of the noise (outliers) while preserving the rest of the data, whereas the isotropic flow also affects the data itself while removing the noise. Note that since the discrete theory developed by Perona [Per98] applies only to small changes in orientation, theoretically it cannot be applied to the images we have seen so far, all of which contain sharp discontinuities in the directional data (and the theory is only isotropic). We can use the direction diffusion framework to process color images. That is, we separate the color direction from the color magnitude and use the harmonic flows to smooth the direction (chromaticity), relying on standard edge-preserving scalar filters to smooth the magnitude (brightness). This gives very good color image denoising results, as shown in the bottom two rows of Fig. 8.4, which show, from left to right, the original, noisy, and reconstructed images, reproduced here in gray levels only. See Tang et al. [Tan00b] for additional examples, comparisons with the literature, and details. Other directional filters for color denoising, based on vectorial median filtering, can be found elsewhere (e.g., [Cas00, Tra93, Tra96]).
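The chromaticity/brightness splitting just described can be sketched as follows (the function names are illustrative, not taken from [Tan00b]; the two parts would then be denoised separately, harmonic direction diffusion for the chromaticity and a scalar edge-preserving filter for the brightness):

```python
import numpy as np

def split_chromaticity_brightness(rgb, eps=1e-12):
    """Separate an RGB image (H x W x 3, floats) into its brightness
    (magnitude of the color vector) and its chromaticity (unit direction
    on the sphere S^2)."""
    rgb = rgb.astype(float)
    brightness = np.sqrt((rgb ** 2).sum(axis=-1))
    chroma = rgb / (brightness[..., None] + eps)
    return chroma, brightness

def recombine(chroma, brightness):
    """Rebuild the color image from the two (separately processed) parts."""
    return chroma * brightness[..., None]
```

The split is exactly invertible (up to the ε guard against division by zero), so all denoising happens in the two independent channels.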
8.3.5 Discussion
We observe that the theory of harmonic maps provides a fundamental framework for directional diffusion in particular and for diffusion on general manifolds in general. This framework opens a whole new research area in PDE-based image processing, in both the theoretical and practical arenas. For example, from the theoretical point of view, we need to perform a complete analysis of the harmonic energy and gradient descent flow for p = 1, the anisotropic case. On a more practical level, it would be interesting to include the harmonic energy as a regularization term in more general variational problems, such as the general optical flow framework, and to use the theory of harmonic maps for other image processing problems, such as denoising data defined on 3D surfaces.
Acknowledgments

The original work on geodesic snakes was developed with Profs. V. Caselles and R. Kimmel. The direction diffusion work is the result of collaboration with Prof. Caselles and B. Tang. The work on vector edges was jointly developed with Prof. D. Ringach. D. H. Chung helped with many of the images in this chapter. We thank Profs. R. Kohn, K. Rubinstein, S. Osher, T. Chan, J. Shen, L. Vese, D. Heeger, V. Interrante, and P. Perona, and B. Cabral and C. Leedom for their comments. This work was partially supported by Office of Naval Research grant ONR-N00014-97-1-0509, the Office of Naval Research Young Investigator Award, the Presidential Early Career Awards for Scientists and Engineers (PECASE), a National Science Foundation CAREER Award, the National Science Foundation Learning and Intelligent Systems (LIS) Program, and NSF-IRI-9306155 (Geometry Driven Diffusion).
References

[Alo91] F. Alouges. An energy decreasing algorithm for harmonic maps. In Nematics (J. M. Coron et al., eds.), pp. 1-13. NATO ASI Series, Kluwer, Dordrecht, The Netherlands (1991).
[Alv92] L. Alvarez, P. L. Lions, and J. M. Morel. Image selective smoothing and edge detection by nonlinear diffusion. SIAM J. Numer. Anal. 29, 845-866 (1992).
[Alv93] L. Alvarez, F. Guichard, P. L. Lions, and J. M. Morel. Axioms and fundamental equations of image processing. Arch. Rational Mechan. 123, 199-257 (1993).
[Bla98] M. Black, G. Sapiro, D. Marimont, and D. Heeger. Robust anisotropic diffusion. IEEE Trans. Image Process. 7(3), 421-432 (March 1998).
[Blo98] P. Blomgren and T. F. Chan. Color TV: Total variation methods for restoration of vector valued images. IEEE Trans. Image Process. 7(3), 304-309 (March 1998).
[Bre86] H. Brezis, J. M. Coron, and E. H. Lieb. Harmonic maps with defects. Commun. Mathemat. Phys. 107, 649-705 (1986).
[Cab93] B. Cabral and C. Leedom. Imaging vector fields using line integral convolution. In Computer Graphics (Proc. SIGGRAPH) (1993).
[Cas93] V. Caselles, F. Catte, T. Coll, and F. Dibos. A geometric model for active contours. Numerische Mathematik 66, 1-31 (1993).
[Cas97a] V. Caselles, R. Kimmel, and G. Sapiro. Geodesic active contours. Intl. J. Comput. Vision 22(1), 61-79 (1997).
[Cas97b] V. Caselles, R. Kimmel, G. Sapiro, and C. Sbert. Minimal surfaces based object segmentation. IEEE Trans. Patt. Anal. Machine Intell. 19(4), 394-398 (1997).
[Cas98] V. Caselles, J. M. Morel, G. Sapiro, and A. Tannenbaum, eds. Special Issue on Partial Differential Equations and Geometry-Driven Diffusion in Image Processing and Analysis. IEEE Trans. Image Process. 7(3), 421-432 (March 1998).
[Cas00] V. Caselles, G. Sapiro, and D. H. Chung. Vector median filters, morphology, and PDEs: Theoretical connections. J. Mathematical Imaging and Vision 12, 108-120 (2000).
[Cat92] F. Catte, P. L. Lions, J. M. Morel, and T. Coll.
Image selective smoothing and edge detection by nonlinear diffusion. SIAM J. Numer. Anal. 29, 182-193 (1992).
[Cha92] K. C. Chang, W. Y. Ding, and R. Ye. Finite-time blow-up of the heat flow of harmonic maps from surfaces. J. Differential Geom. 36, 507-515 (1992).
[Cha99] T. Chan and J. Shen. Variational restoration of non-flat image features: Models and algorithms. University of California at Los Angeles, CAM-TR (May 1999).
[Che89] Y. Chen. The weak solutions of the evolution problems of harmonic maps. Math. Z. 201, 69-74 (1989).
[Che95] Y. Chen, J. Li, and F. H. Lin. Partial regularity for weak heat flows into spheres. Commun. Pure Appl. Mathemat. 48, 429-448 (1995).
[Chu00] D. H. Chung and G. Sapiro. On the level lines and geometry of vector-valued images. IEEE Signal Process. Lett., to appear.
[Coh87] R. Cohen, R. M. Hardt, D. Kinderlehrer, S. Y. Lin, and M. Luskin. Minimum energy configurations for liquid crystals: Computational results. In Theory and Applications of Liquid Crystals (J. L. Ericksen and D. Kinderlehrer, eds.), pp. 99-121, Springer-Verlag, New York (1987).
[Coh91] L. D. Cohen. On active contour models and balloons. Comput. Vision, Graph. Image Process.: Image Understanding 53, 211-218 (1991).
[Coh97] L. D. Cohen and R. Kimmel. Global minimum for active contours models. Intl. J. Comput. Vision 24, 57-78 (1997).
[Cor90] J. M. Coron. Nonuniqueness for the heat flow of harmonic maps. Ann. Inst. H. Poincare, Analyse Non Lineaire 7(4), 335-344 (1990).
[Cra92] M. G. Crandall, H. Ishii, and P. L. Lions. User's guide to viscosity solutions of second order partial differential equations. Bull. Amer. Mathemat. Soc. 27, 1-67 (1992).
[Cum91] A. Cumani. Edge detection in multispectral images. Comput. Vision Graph. Image Process. 53, 40-51 (1991).
[DiZ86] S. Di Zenzo. A note on the gradient of a multi-image. Comput. Vision Graph. Image Process. 33, 116-125 (1986).
[Eck95] M. Eck, T. DeRose, T. Duchamp, H. Hoppe, M. Lounsbery, and W. Stuetzle. Multiresolution analysis of arbitrary meshes. In Computer Graphics (Proc. SIGGRAPH), pp. 173-182 (1995).
[Eel78] J. Eells and L. Lemaire. A report on harmonic maps. Bull. London Mathemat. Soc. 10(1), 1-68 (1978).
[Eel88] J. Eells and L. Lemaire. Another report on harmonic maps. Bull.
London Mathemat. Soc. 20(5), 385-524 (1988).
[Eel64] J. Eells and J. H. Sampson. Harmonic mappings of Riemannian manifolds. Amer. J. Mathemat. 86, 109-160 (1964).
[Fau98] O. D. Faugeras and R. Keriven. Variational principles, surface evolution, PDEs, level-set methods, and the stereo problem. IEEE Trans. Image Process. 7(3), 336-344 (March 1998).
[Fel94] M. Feldman. Partial regularity for harmonic maps of evolutions into spheres. Commun. Partial Differential Eqs. 19, 761-790 (1994).
[Fre95] A. Freire. Uniqueness for the harmonic map flow in two dimensions. Cal. Var. 3, 95-105 (1995).
[Gia93] M. Giaquinta, G. Modica, and J. Soucek. Variational problems for maps of bounded variation with values in S1. Cal. Var. 1, 87-121 (1993).
[Gra95] G. H. Granlund and H. Knutsson. Signal Processing for Computer Vision. Kluwer, Boston, MA (1995).
[Gui95] F. Guichard and J. M. Morel. Introduction to Partial Differential Equations in Image Processing. Tutorial Notes, IEEE Intl. Conf. Image Processing (Washington, DC, October 1995).
[Hak99] S. Haker, S. Angenent, A. Tannenbaum, R. Kikinis, G. Sapiro, and M. Halle. Conformal surface parametrization for texture mapping. University of Minnesota IMA Preprint Series No. 1611 (April 1999).
[Har85] R. M. Haralick and L. G. Shapiro. Image segmentation techniques. Comput. Vision Graph. Image Process. 29, 100-132 (1985).
[Har87] R. M. Hardt and F. H. Lin. Mappings minimizing the Lp norm of the gradient. Commun. Pure Appl. Mathemat. 40, 555-588 (1987).
[Har97] R. M. Hardt. Singularities of harmonic maps. Bull. Amer. Mathemat. Soc. 34(1), 15-34 (1997).
[Kas88] M. Kass, A. Witkin, and D. Terzopoulos. Snakes: Active contour models. Intl. J. Comput. Vision 1, 321-331 (1988).
[Kim95] B. B. Kimia, A. Tannenbaum, and S. W. Zucker. Shapes, shocks, and deformations, I. Intl. J. Comput. Vision 15, 189-224 (1995).
[Kis95] S. Kichenassamy, A. Kumar, P. Olver, A. Tannenbaum, and A. Yezzi. Gradient flows and geometric active contour models. In Proc. IEEE Intl. Conf. Computer Vision, pp. 810-815 (Cambridge, June 1995).
[Koe84] J. J. Koenderink. The structure of images. Biolog. Cybernet. 50, 363-370 (1984).
[Kre59] E. Kreyszig.
Differential Geometry. University of Toronto Press (1959).
[Lee91] H.-C. Lee and D. R. Cok. Detecting boundaries in a vector field. IEEE Trans. Signal Process. 39, 1181-1194 (1991).
[Lee92] T. S. Lee, D. Mumford, and A. L. Yuille. Texture segmentation by minimizing vector-valued energy functionals: The coupled-membrane model. In Proc. European Conference on Computer Vision '92, Lecture Notes in Computer Science No. 588, pp. 165-173, Springer-Verlag, New York (1992).
[Lin94] T. Lindeberg. Scale-Space Theory in Computer Vision. Kluwer, Dordrecht, The Netherlands (1994).
[Lor99] L. M. Lorigo, O. Faugeras, W. E. L. Grimson, R. Keriven, and R. Kikinis. Segmentation of bone in clinical knee MRI using texture-based geodesic active contours. In Medical Image Computing and Computer-Assisted Intervention, MICCAI'98, pp. 1195-1204, Springer (1998).
[Mal95] R. Malladi, J. A. Sethian, and B. C. Vemuri. Shape modeling with front propagation: A level set approach. IEEE Trans. Patt. Anal. Machine Intell. 17, 158-175 (1995).
[Mor94] J.-M. Morel and S. Solimini. Variational Methods in Image Segmentation. Birkhauser, Boston (1994).
[Mum89] D. Mumford and J. Shah. Optimal approximations by piecewise smooth functions and variational problems. Commun. Pure Appl. Mathemat. 42, 577-685 (1989).
[Nit92] M. Nitzberg and T. Shiota. Nonlinear image filtering with edge and corner enhancement. IEEE Trans. Patt. Anal. Machine Intell. 14, 826-833 (1992).
[Osh88] S. J. Osher and J. A. Sethian. Fronts propagating with curvature dependent speed: Algorithms based on Hamilton-Jacobi formulations. J. Computat. Phys. 79, 12-49 (1988).
[Osh90] S. Osher and L. I. Rudin. Feature-oriented image enhancement using shock filters. SIAM J. Numer. Anal. 27, 919-940 (1990).
[Par97] N. Paragios and R. Deriche. A PDE-based level-set approach for detection and tracking of moving objects. INRIA Tech. Report No. 3173, Sophia-Antipolis (May 1997).
[Per90] P. Perona and J. Malik. Scale-space and edge detection using anisotropic diffusion. IEEE Trans. Patt. Anal. Machine Intell. 12, 629-639 (1990).
[Per98] P. Perona. Orientation diffusion. IEEE Trans. Image Process. 7(3), 457-467 (March 1998).
[Pis91] L. M. Pismen and J.
Rubinstein. Dynamics of defects. In Nematics (J. M. Coron et al., eds.), pp. 303-326, NATO ASI Series, Kluwer, Dordrecht, The Netherlands (1991).
[Qin95] J. Qing. On singularities of the heat flow for harmonic maps from surfaces into spheres. Commun. Anal. Geom. 3(2), 297-315 (1995).
CHAPTER 8: NONLINEAR PARTIAL DIFFERENTIAL EQUATIONS
[Rom94] B. Romeny (ed.). Geometry Driven Diffusion in Computer Vision. Kluwer, Dordrecht, The Netherlands (1994).
[Rud92] L. I. Rudin, S. Osher, and E. Fatemi. Nonlinear total variation based noise removal algorithms. Physica D 60, 259-268 (1992).
[Rud94] L. I. Rudin and S. Osher. Total variation based image restoration with free local constraints. In Proc. IEEE Intl. Conf. Image Processing, Vol. I, pp. 31-35 (Austin, Texas, 1994).
[Sap96a] G. Sapiro. From active contours to anisotropic diffusion: Connections between the basic PDEs in image processing. In Proc. IEEE Intl. Conf. on Image Processing, pp. 477-480 (Lausanne, Switzerland, 1996).
[Sap96b] G. Sapiro and D. Ringach. Anisotropic diffusion of multivalued images with applications to color filtering. IEEE Trans. Image Process. 5, 1582-1586 (1996).
[Sap97a] G. Sapiro. Color snakes. Comput. Vision Image Understanding 68(2), 247-253 (1997).
[Sap97b] G. Sapiro and V. Caselles. Histogram modification via differential equations. J. Differential Eqs. 135(2), 238-268 (1997).
[Sha95] J. Shah. Recovery of shapes by evolution of zero-crossings. Tech. Report, Northeastern Univ. Math. Dept., Boston, MA (1995).
[Soc98] N. Sochen, R. Kimmel, and R. Malladi. A general framework for low-level vision. IEEE Trans. Image Process. 7(3), 310-318 (1998).
[Str85] M. Struwe. On the evolution of harmonic mappings of Riemannian surfaces. Comment. Math. Helvetici 60, 558-581 (1985).
[Tan00a] B. Tang, G. Sapiro, and V. Caselles. Diffusion of general data on non-flat manifolds via harmonic maps theory: The direction diffusion case. Intl. J. Comput. Vision 36, 149-161 (2000).
[Tan00b] B. Tang, G. Sapiro, and V. Caselles. Color image enhancement via chromaticity diffusion. IEEE Trans. Image Process., to be published.
[Tek95] H. Tek and B. B. Kimia. Image segmentation by reaction-diffusion bubbles. In Proc. IEEE Intl. Conf. Computer Vision, pp. 156-162 (Cambridge, June 1995).
[Ter88] D. Terzopoulos, A. Witkin, and M. Kass.
Constraints on deformable models: Recovering 3D shape and nonrigid motions. Artific. Intell. 36, 91-123 (1988).
[Tra93] P. E. Trahanias and A. N. Venetsanopoulos. Vector directional filters: A new class of multichannel image processing filters. IEEE Trans. Image Process. 2, 528-534 (1993).
[Tra96] P. E. Trahanias, D. Karakos, and A. N. Venetsanopoulos. Directional processing of color images: Theory and experimental results. IEEE Trans. Image Process. 5(6), 868-880 (June 1996).
[Wei96] J. Weickert. Foundations and applications of nonlinear anisotropic diffusion filtering. Zeitschr. Angewandte Math. Mechan. 76, 283-286 (1996).
[Whi94] R. T. Whitaker and G. Gerig. Vector-valued diffusion. In Geometry Driven Diffusion in Computer Vision (B. ter Haar Romeny, ed.), pp. 93-134. Kluwer, Boston (1994).
[Whi95] R. T. Whitaker. Algorithms for implicit deformable models. In Proc. IEEE Intl. Conf. on Computer Vision (Cambridge, June 1995).
[Wit83] A. P. Witkin. Scale-space filtering. Intl. Joint Conf. Artificial Intelligence, pp. 1019-1021 (Karlsruhe, Germany, 1983).
[You96] Y. L. You, W. Xu, A. Tannenbaum, and M. Kaveh. Behavioral analysis of anisotropic diffusion in image processing. IEEE Trans. Image Process. 5(11), 1539-1553 (November 1996).
[Zha99] D. Zhang and M. Hebert. Harmonic maps and their applications in surface matching. In Proc. IEEE Computer Vision Pattern Recognition (Colorado, June 1999).
[Zhu95] S. C. Zhu, T. S. Lee, and A. L. Yuille. Region competition: Unifying snakes, region growing, energy/Bayes/MDL for multiband image segmentation. In Proc. IEEE Intl. Conf. on Computer Vision, pp. 416-423 (Cambridge, MA, June 1995).
[Zuc76] S. W. Zucker. Region growing: Childhood and adolescence (Survey). Comput. Vision Graph. Image Process. 5, 382-399 (1976).
Region-Based Filtering of Images and Video Sequences: A Morphological Viewpoint
Department of Signal Theory and Communications Universitat Politecnica de Catalunya Barcelona, Spain
9.1 Introduction
Data and signal modeling for images and video sequences is experiencing important developments. Part of this evolution is due to the need to support a large number of new multimedia services. Traditionally, digital images were represented as rectangular arrays of pixels, and digital video was seen as a continuous flow of digital images. New multimedia applications and services imply a representation that is closer to the real world or, at least, that takes into account part of the process that has created the digital information. Content-based compression and indexing are two typical examples of applications for which new modeling strategies and processing tools are necessary. For content-based image or video compression, the representation based on an array of pixels is not appropriate if one wants to be able to act on objects in the image, to encode differently the areas of interest, or to assign different behaviors to the entities represented in the image. In these applications, the notion of object
is essential. As a consequence, the data modeling has to be modified and, for example, has to include regions of arbitrary shapes to represent objects. Content-based indexing applications are also facing the same kind of challenges. For instance, the video representation based on a flow of frames is inadequate for a large number of video indexing applications. Among the large set of functionalities involved in a retrieval application, consider, for example, browsing. The browsing functionality should go far beyond the "fast forward" and "fast reverse" allowed by VCRs. One would like to have access to a table of contents of the video and be able to jump from one item to another. This kind of functionality implies at least a structuring of the video in terms of individual shots and scenes. Of course, indexing and retrieval also involve a structuring of the data in terms of objects, regions, semantic notions, etc. In both of these examples, the data modeling has to take into account part of the creation process: an image is created by projection of a visual scene composed of 3D objects onto a 2D plane. Modeling the image in terms of regions is an attempt to know the projection of the 3D object boundaries in the 2D plane. Video shot detection also aims at finding what has been done during the video editing process and where boundaries between elementary components have been introduced. In both cases, the notion of region turns out to be central in the modeling process. Note that regions may be spatial connected components but also temporal or spatiotemporal connected components in the case of video. Besides the modeling issue, it has to be recognized that most image processing tools are not suited to region-based representations. For example, the vast majority of low level processing tools such as filters are very closely related to the classical pixel-based representation of signals.
Typical examples include linear convolution with an impulse response, the median filter, morphological operators based on erosion and dilation with a structuring element, etc. In all cases, the processing strategy consists of modifying the values of individual pixels by a function of the pixel values in a local window. Early examples of region-based processing can be found in the literature in the field of segmentation. For example, the classical split-and-merge algorithm [Hor74] first defines a set of elementary regions (the split process) and then interacts directly on these regions, allowing them to merge under certain conditions. Recently, a set of morphological filtering tools called connected operators has received much attention. Connected operators are region-based filtering tools because they do not modify individual pixel values but directly act on the connected components of the space where the image is constant, the so-called flat zones. Intuitively, connected operators can remove boundaries between flat zones but cannot add new boundaries nor shift existing boundaries. The related literature is growing rapidly and involves theoretical studies (Chapter 10 herein and [Cre95, Hei97a, Mat97, Mey98a, Mey98b, Ron98, Ser93, Ser98]), algorithm developments [Bre96, Gom99, Sal98, Sal00, Vin93a, Vin93b], and applications [Cre97, Sal92, Sal95, Vil98]. The goals of this chapter are (1) to provide an introduction to connected operators for gray-level images and video sequences and (2) to discuss the techniques and algorithms that up to now have been the most successful within the framework of practical applications. The organization of the chapter is as follows: Section 9.2 introduces the notation and highlights the main drawbacks of classical filtering strategies. Section 9.3 presents the basic notions related to connected operators and discusses some early examples of connected operators. In practice, the two most successful strategies to define connected operators are based either on reconstruction processes or on tree representations, discussed in Secs. 9.4 and 9.5, respectively. Finally, conclusions are given in Sec. 9.6.
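The flat zones mentioned above, the connected components on which a gray-level image is constant, can be computed with a simple connected-component labeling. A minimal sketch (4-connectivity assumed):

```python
from collections import deque

import numpy as np

def flat_zones(img):
    """Label the flat zones of a gray-level image: the 4-connected
    components on which the image value is constant. Connected operators
    act on these zones as wholes instead of on individual pixels."""
    H, W = img.shape
    labels = np.full((H, W), -1, dtype=int)
    current = 0
    for sy in range(H):
        for sx in range(W):
            if labels[sy, sx] != -1:
                continue
            val = img[sy, sx]
            q = deque([(sy, sx)])     # breadth-first flood fill from the seed
            labels[sy, sx] = current
            while q:
                y, x = q.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < H and 0 <= nx < W \
                            and labels[ny, nx] == -1 and img[ny, nx] == val:
                        labels[ny, nx] = current
                        q.append((ny, nx))
            current += 1
    return labels, current
```

A connected operator may merge zones of this partition (remove a boundary) but never splits a zone or moves a boundary, which is the property the chapter develops.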
9.2 Classical Filtering Approaches

In this section, we define the notation to be used herein and review some of the basic properties of interest in this chapter [Ser82, Ser88]. We deal exclusively with discrete images $f(n)$ or video sequences $f_t(n)$, where $n$ denotes the pixel or space coordinate (a vector in the case of 2D images) and $t$ the time instant in the case of a video sequence. In the lattice of gray-level functions, an image $f$ is said to be smaller than an image $g$ if and only if $f(n) \le g(n)$ for all $n$.
An operator $\psi$ acting on an input $f$ is said to be:

increasing: $\forall f, g$, $f \le g \Rightarrow \psi(f) \le \psi(g)$ (the order relationship between images is preserved by the filtering);
idempotent: $\forall f$, $\psi(\psi(f)) = \psi(f)$ (iteration of the filtering process is not needed);
extensive: $\forall f$, $f \le \psi(f)$;
antiextensive: $\forall f$, $\psi(f) \le f$;
a morphological filter: if it is increasing and idempotent;
an opening: if it is increasing, antiextensive, and idempotent;
a closing: if it is increasing, extensive, and idempotent;
self-dual: if $\psi(f) = -\psi(-f)$, that is, if bright and dark image structures are processed identically.

$k_t(x, y) \triangleq t\, k(x/t, y/t)$, for $t > 0$,   (10.29)
which satisfies the semigroup property $k_s \oplus k_t = k_{s+t}$. Using $k_t$ in place of $g$ as the kernel in the basic morphological operations leads to defining the multiscale dilation and erosion of $f : R^2 \to R$ by $k_t$ as the scale-space functions

$\delta(x, y, t) \triangleq (f \oplus k_t)(x, y)$,   $\varepsilon(x, y, t) \triangleq (f \ominus k_t)(x, y)$,   (10.30)

where $\delta(x, y, 0) = \varepsilon(x, y, 0) = f(x, y)$. In practice, a useful class of functions $k$ consists of flat structuring functions,
$k(x, y) = \begin{cases} 0 & \text{for } (x, y) \in B, \\ -\infty & \text{for } (x, y) \notin B, \end{cases}$   (10.31)
that are the $0$/$-\infty$ indicator functions of compact convex planar sets $B$. The general PDE generating the multiscale flat dilations of $f$ by a general compact
PETROS MARAGOS
convex symmetric $B$ is [Alv93, Bro94, Hei97]

$\delta_t = \mathrm{spt}(B)(\delta_x, \delta_y)$,   (10.32)

where $\mathrm{spt}(B)$ is the support function of $B$,

$\mathrm{spt}(B)(x, y) \triangleq \bigvee_{(a,b) \in B} (ax + by)$.   (10.33)
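The support function in Eq. (10.33) can be checked numerically: for the unit balls of the $\ell^1$, $\ell^2$, and $\ell^\infty$ norms it reduces to the max norm, the Euclidean norm, and the $\ell^1$ norm of $(x, y)$, respectively, which is what turns Eq. (10.32) into the familiar dilation PDEs. A small sketch, sampling each ball by its extreme points (the sampling density is an assumption):

```python
import numpy as np

def support(points, x, y):
    """spt(B)(x, y) = sup over (a, b) in B of (a*x + b*y), evaluated on a
    finite sample of the extreme points of B."""
    a, b = points
    return float(np.max(a * x + b * y))

# Extreme points of three unit balls B_p
phi = np.linspace(0.0, 2.0 * np.pi, 4096)
B2 = (np.cos(phi), np.sin(phi))                      # Euclidean unit circle
B1 = (np.array([1.0, -1.0, 0.0, 0.0]),
      np.array([0.0, 0.0, 1.0, -1.0]))               # l1-ball vertices
Binf = (np.array([1.0, 1.0, -1.0, -1.0]),
        np.array([1.0, -1.0, 1.0, -1.0]))            # l-infinity-ball vertices
```

For example, support(B1, 2, 3) gives max(|2|, |3|) = 3, support(Binf, 2, 3) gives |2| + |3| = 5, and support(B2, 2, 3) approximates sqrt(13).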
Useful cases of structuring sets $B$ are obtained by the unit balls

$B_p = \{(x, y) \in R^2 : \|(x, y)\|_p \le 1\}$.

$> f(x, y)$, it acts as an erosion PDE and reverses the
direction of propagation. The final result $u_\infty(x, y) = \lim_{t \to \infty} u(x, y, t)$ is equal to the output from a general class of morphological filters, called levelings, which were introduced by Meyer [Mey98], have many useful properties, and contain as special cases the reconstruction openings and closings.
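One of those special cases, the reconstruction opening, can be sketched as an iterated geodesic dilation (a discrete toy version; the 3×3 flat structuring element is an assumption):

```python
import numpy as np

def reconstruction_by_dilation(marker, f, max_iter=100000):
    """Reconstruction opening of f from a marker <= f: iterate
    u <- min(dilate(u), f) until stability. The stable limit is one of
    the special cases of levelings mentioned in the text."""
    u = np.minimum(marker.astype(float), f.astype(float))
    H, W = u.shape
    for _ in range(max_iter):
        p = np.pad(u, 1, mode="edge")
        # flat dilation by the 3x3 square
        d = np.max([p[i:i + H, j:j + W] for i in range(3) for j in range(3)],
                   axis=0)
        nxt = np.minimum(d, f)        # geodesic step: never exceed the reference f
        if np.array_equal(nxt, u):
            break                     # reached the stable limit
        u = nxt
    return u
```

Components of f touched by the marker are fully restored, while components the marker misses are removed, exactly the "reconstruction" behavior the continuous flow reproduces.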
10.4 Curve Evolution
Consider at time $t = 0$ an initial simple, smooth, closed planar curve $\gamma(0)$ that is propagated along its normal vector field at speed $V$ for $t > 0$. Let this evolving curve (front) $\gamma(t)$ be represented by its position vector $C(p, t) = (x(p, t), y(p, t))$ and be parameterized by $p \in [0, J]$ so that it has its interior on the left in the direction of increasing $p$ and $C(0, t) = C(J, t)$. The curvature along the curve is

$\kappa = \kappa(p, t) \triangleq \dfrac{x_p y_{pp} - y_p x_{pp}}{(x_p^2 + y_p^2)^{3/2}}$.   (10.53)
A general front propagation law (flow) is

$\dfrac{\partial C}{\partial t}(p, t) = V\, \vec{N}(p, t)$,   (10.54)

with initial condition $\gamma(0) = \{C(p, 0) : p \in [0, J]\}$, where $\vec{N}(p, t)$ is the instantaneous unit outward normal vector at points on the evolving curve and $V = C_t \cdot \vec{N}$ is the normal speed, with $C_t = \partial C/\partial t$. This speed may depend on local geometrical information such as the curvature, global image properties, or other factors independent of the curve. If $V = 1$ or $V = -1$, then $\gamma(t)$ is the boundary of the dilation or erosion of the initial curve $\gamma(0)$ by a disk of radius $t$. In general, if $B$ is an arbitrary compact, convex, symmetric planar set of unit scale and if we dilate the initial curve $\gamma(0)$ with $tB$ and set the new curve $\gamma(t)$ equal to the outward boundary of $\gamma(0) \oplus tB$, then this curve evolution can also be generated by the PDE of Eq. (10.54) using a speed [Are93, Sap93]

$V = \mathrm{spt}(B)(\vec{N})$,   (10.55)
where $\mathrm{spt}(B)$ is the support function of $B$. Another important speed model has been studied extensively by Osher and Sethian [Osh88, Set96] for general evolution of interfaces and by Kimia et al. [Kim90] for shape analysis in computer vision:

$V = 1 - \epsilon\kappa$,   $\epsilon > 0$.   (10.56)
As analyzed by Sethian [Set96], when $V = 1$ the front's curvature will develop singularities, and the front will develop corners (i.e., the curve derivatives will develop shocks, that is, discontinuities) at finite time if the initial curvature is anywhere negative. Two ways to continue the front beyond the corners are as follows: (1) If the front is viewed as a geometric curve, then each point is advanced along the
CHAPTER 10: DIFFERENTIAL MORPHOLOGY
Figure 10.8: Evolution of the curve (signal graph) $(p, \cos(6\pi p)/10)$, $p \in [0, 1]$. Evolved curves are plotted from $t = 0$ to $t = 0.14$ at increments of 0.02. The numerical simulation for (b), (c), and (d) is based on the Osher and Sethian algorithm with $\Delta x = 0.005$ and $\Delta t$ chosen small enough for stability. (a) $V = 1$, "swallowtail" weak solution. (b) $V = 1$, entropy weak solution with $\Delta t = 0.002$. (c) $V = 1 - 0.05\kappa$ with $\Delta t = 0.0002$. (d) $V = 1 - 0.1\kappa$ with $\Delta t = 0.0001$.

normal by a distance $t$, and hence a "swallowtail" is formed beyond the corners by allowing the front to pass through itself. (2) If the front is viewed as the boundary separating two regions, an entropy condition is imposed to disallow the front to pass through itself. In other words, if the front is a propagating flame, then "once a particle is burnt it stays burnt" [Set96]. The same idea has also been used to model grassfire propagation leading to the medial axis of a shape [Blu73]. It is equivalent to using Huygens' principle to construct the front as the set of points at distance $t$ from the initial front. This can also be obtained from multiscale dilations of the initial front by disks of radii $t > 0$. Both the swallowtail and the entropy solutions are weak solutions. The examples in Fig. 10.8 show that, when $\epsilon > 0$, motion with curvature-dependent speed has a smoothing effect. Further, the limit of the solution for the $V = 1 - \epsilon\kappa$ case as $\epsilon \to 0$ is the entropy solution for the $V = 1$ case [Set96]. To overcome the topological problem of splitting and merging and numerical problems with the Lagrangian formulation of Eq. (10.54), an Eulerian formulation
PETROS MARAGOS
was proposed by Osher and Sethian [Osh88], in which the original curve γ(0) is first embedded in the surface of an arbitrary 2D Lipschitz continuous function φ0(x,y) as its level set (contour line) at zero level. For example, we can select φ0(x,y) to be equal to the signed distance function from the boundary of γ(0), positive (negative) in the interior (exterior) of γ(0). Then, the evolving planar curve is embedded as the zero-level set of an evolving space-time function φ(x,y,t):

γ(t) = {(x,y) : φ(x,y,t) = 0},   (10.57)

γ(0) = {(x,y) : φ(x,y,0) = φ0(x,y) = 0}.   (10.58)
Geometrical properties of the evolving curve can be obtained from spatial derivatives of the level function. Thus, at any point on the front, the curvature and outward normal of the level sets can be found from φ:

κ = div( ∇φ / ‖∇φ‖ ),   N = − ∇φ / ‖∇φ‖.   (10.59)
The curve evolution PDE of Eq. (10.54) induces a PDE generating its level function:

φ_t = V ‖∇φ‖,   φ(x,y,0) = φ0(x,y).   (10.60)
If V = 1, the above function evolution PDE is identical to the flat circular dilation PDE of Eq. (10.35) by equating scale with time. Thus, we can view this specific dilation PDE as a special case of the general function evolution PDE of Eq. (10.60) in which all level sets expand in a homogeneous medium with V = 1. Propagation in a heterogeneous medium with V = V(x,y) > 0 leads later to the eikonal PDE.
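For intuition, the V = 1 case of Eq. (10.60) can be simulated with a first-order upwind (entropy-satisfying) scheme in the spirit of Osher and Sethian. The sketch below (NumPy; all function names are illustrative, not from the chapter) advances a level function by φ ← φ + Δt·V·‖∇φ‖, with one-sided differences chosen in the upwind direction for V ≥ 0:

```python
import numpy as np

def evolve_level_set(phi, V, dx, dt, steps):
    """Upwind scheme for phi_t = V * ||grad phi|| with V >= 0 (expanding fronts).
    Boundary handling is periodic via np.roll, which is adequate away from the border."""
    phi = phi.astype(float).copy()
    for _ in range(steps):
        Dmx = (phi - np.roll(phi, 1, axis=1)) / dx   # backward difference D-x
        Dpx = (np.roll(phi, -1, axis=1) - phi) / dx  # forward difference D+x
        Dmy = (phi - np.roll(phi, 1, axis=0)) / dx
        Dpy = (np.roll(phi, -1, axis=0) - phi) / dx
        # Entropy-satisfying gradient magnitude for outward motion (V >= 0).
        grad = np.sqrt(np.maximum(Dmx, 0.0)**2 + np.minimum(Dpx, 0.0)**2 +
                       np.maximum(Dmy, 0.0)**2 + np.minimum(Dpy, 0.0)**2)
        phi = phi + dt * V * grad
    return phi
```

Starting from the signed distance function of a small disk (positive inside), the zero-level set expands at unit speed, which is exactly the flat circular dilation interpretation above.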
10.5 Distance Transforms

10.5.1 Distance Transforms and Wave Propagation
For binary images, the distance transform is a compact way to represent their multiscale dilations and erosions by convex polygonal structuring elements whose shape depends upon the norm used to measure distances. Specifically, a binary image can be divided into the foreground set S ⊆ R² and the background set S^c = {(x,y) : (x,y) ∉ S}. For shape analysis of an image object S, it is often more useful to consider its inner distance transform, using S as the domain over which distances from its background are measured. However, for the applications discussed herein, we need to view S as a region marker or a source emanating a wave that will propagate away from it into the domain of S^c. Thus, we define the outer distance transform of a set S, with respect to the metric induced by some norm ‖·‖_p, p = 1, 2, …, ∞, as the distance function

D_p(S)(x,y) = ⋀_{(v,w)∈S} ‖(x − v, y − w)‖_p.   (10.61)
If B_p is the unit ball induced by the norm ‖·‖_p, thresholding the distance transform at level r > 0 and obtaining the corresponding level set yields the morphological dilation of S by the ball B_p at scale r:

S ⊕ rB_p = {(x,y) : D_p(S)(x,y) ≤ r}.   (10.62)
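Equations (10.61) and (10.62) can be checked directly on a small grid by brute-force minimization over the foreground points; a sketch (function names are illustrative, not from the chapter):

```python
import numpy as np

def distance_transform(points, p, shape):
    """Outer distance transform D_p(S)(x, y) = min over (v, w) in S of ||(x-v, y-w)||_p."""
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    d = np.full(shape, np.inf)
    for (v, w) in points:                      # S given as a list of pixel coordinates
        if p == np.inf:
            step = np.maximum(np.abs(xs - v), np.abs(ys - w))
        else:
            step = (np.abs(xs - v)**p + np.abs(ys - w)**p)**(1.0 / p)
        d = np.minimum(d, step)
    return d

def dilation_by_ball(points, r, p, shape):
    """S dilated by r*B_p, obtained by thresholding the distance transform, Eq. (10.62)."""
    return distance_transform(points, p, shape) <= r
```

Thresholding at increasing r sweeps out the multiscale dilations whose boundaries are the propagating wavefronts discussed next.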
The boundaries of these dilations are the wavefronts of the distance propagation. Multiscale erosions of S can be obtained from the outer distance transform of S^c. In addition to being a compact representation for multiscale erosions and dilations, the distance transform has found many applications in image analysis and computer vision. Examples include smoothing, skeletonization, size distributions, shape description, object detection and recognition, segmentation, and path finding [Blu73, Bor86, Nac96, Pre93, Ros66, Ros68, Ver91, Vin91b]. Thus, many algorithms have been developed for its computation. Using Huygens' construction, the boundaries of multiscale dilations and erosions by disks can also be viewed as the wavefronts of a wave initiating from the original image boundary and propagating with constant normal speed, that is, in a homogeneous medium. Thus, the distance function has a minimum time-of-arrival interpretation [Blu73], and its isolevel contours coincide with those of the wave phase function. Points at which these wavefronts intersect and extinguish themselves (according to Blum's grassfire propagation principle) are the points of the Euclidean skeleton axis of S [Blu73]. Overall, the Euclidean distance function D₂(S) is the weak solution of the following nonlinear PDE:
‖∇E(x,y)‖₂ = 1 for (x,y) ∈ S^c,   E(x,y) = 0 for (x,y) ∈ ∂S.   (10.63)
This is a special case of the eikonal PDE ‖∇E(x,y)‖₂ = η(x,y), which corresponds to wave propagation in heterogeneous media and whose solution E is a weighted distance function, with weights η(x,y) inversely proportional to the varying propagation speed [Lev70, Rou92, Ver90].
10.5.2 Distance Transforms as Infimal Convolutions and Slope Filters
If we consider the 0/+∞ indicator function of S,

I(S)(x,y) ≜ { 0 for (x,y) ∈ S; +∞ for (x,y) ∉ S,   (10.64)
and the L^p norm structuring function

g_p(x,y) = ‖(x,y)‖_p,   (10.65)
it follows that the distance transform can be obtained from the infimal convolution (denoted □′) of the indicator function of the set with the norm function:

D_p(S) = I(S) □′ g_p.   (10.66)
Further, since the relative ordering of distance values does not change if we raise them to a positive power m > 0, we can obtain powers of the distance function by infimally convolving with the respective powers of the norm function:

[D_p(S)]^m = I(S) □′ (g_p)^m.   (10.67)
The infimal convolution in Eq. (10.66) is equivalent to passing the input signal, that is, the set's indicator function, through an ETI system with slope response [Mar96]

G(s) = { 0 for ‖s‖_q ≤ 1; −∞ for ‖s‖_q > 1,   (10.68)

where q is the conjugate exponent of p (1/p + 1/q = 1). That is, the distance transform is the output of an ideal-cutoff slope-selective filter that rejects all input planes whose slope vector falls outside the unit ball with respect to the ‖·‖_q norm but passes all the rest unchanged.
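A one-dimensional sketch of Eq. (10.66): infimally convolving the 0/+∞ indicator of S = {3, 7} with g(x) = |x| yields the L¹ distance transform (names illustrative, not from the chapter):

```python
import numpy as np

def inf_convolve(f, g):
    """Infimal convolution (f [] g)(x) = min_y [ f(y) + g(x - y) ] over a finite 1D grid;
    g is a callable defined on integer offsets."""
    n = len(f)
    return np.array([min(f[y] + g(x - y) for y in range(n)) for x in range(n)])

INF = float("inf")
indicator = np.full(10, INF)
indicator[3] = indicator[7] = 0.0     # I(S) for S = {3, 7}
dist = inf_convolve(indicator, abs)   # distance to the nearest source
# dist == [3, 2, 1, 0, 1, 2, 1, 0, 1, 2]
```

The +∞ background values are harmless here: they simply never win the minimization, which is exactly how the 0/+∞ indicator encodes the set.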
10.5.3 Euclidean Distance Transforms of Binary Images and Approximations
To obtain isotropic distance propagation, we want to employ the Euclidean distance transform, that is, the norm ‖·‖₂ in Eq. (10.61), since it gives multiscale morphology with the disk as the structuring element. However, computing the Euclidean distance transform of discrete images has a significant computational complexity. Thus, various techniques have been used to obtain an approximate or the exact Euclidean distance transform at a lower complexity. Four types of approaches that deal with this problem are as follows: (1) Discrete metrics on grids that yield approximations to the Euclidean distance. Their early theory was developed by Rosenfeld and Pfaltz [Ros66, Ros68], based on either sequential or parallel operations. This was followed later by a generalization developed by Borgefors [Bor86], based on chamfer metrics, that yielded improved approximations to the Euclidean distance. (2) Fast algorithmic techniques that can obtain the exact Euclidean distances by operating on complex data structures (e.g., [Dan80, Vin91a]). (3) Infimal convolutions of binary images with a parabolic structuring function, which yield the exact squared Euclidean distance transform [Boo92, Hua94, Ste80]. This follows from Eq. (10.67) by using m = 2 with the Euclidean norm (p = 2):

[D₂(S)]² = I(S) □′ (g₂)².   (10.69)
The kernel in the above infimal convolution is a convex parabola, [g₂(x,y)]² = ‖(x,y)‖₂² = x² + y². Note that the above result holds for images and kernels defined both on the continuous and on the discrete plane. Of course, convolution of the image with an infinite-extent kernel is not possible, and hence a truncation of the parabola is used, which incurs an approximation error. The complexity of this convolution approach can be reduced significantly by using dimensional decomposition of the 2D parabolic structuring function, expressing it either as the dilation of two 1D quadratic structuring functions [Boo92] or as the dilation of several 3×3 kernels that yields a truncation of the 2D parabola [Hua94, Shi91]. (4) Efficient numerical algorithms for solving the nonlinear PDE (10.63) that yield arbitrarily close approximations to the Euclidean distance function. Approach (4) yields the best approximations and will be discussed later. Of the other three approaches, (1) and (3) are more general than (2), have significant theoretical structure, and can be used with even the simplest data structures, such as rectangularly or hexagonally sampled image signals. Next we elaborate on approach (1), which has been studied the most.
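Before elaborating on approach (1), note that the dimensional decomposition behind approach (3) is compact enough to sketch in full. Because x² + y² separates into two orthogonal 1D parabolas, the 2D infimal convolution factors into a row pass followed by a column pass; the brute-force O(n²)-per-line version below illustrates the exact squared EDT of Eq. (10.69), not an efficient implementation (names illustrative):

```python
import numpy as np

def inf_conv_parabola(f):
    """1D infimal convolution (f [] t^2)(x) = min_y [ f(y) + (x - y)^2 ]."""
    n = len(f)
    return np.array([min(f[y] + (x - y)**2 for y in range(n)) for x in range(n)])

def squared_edt(indicator):
    """[D2(S)]^2 = I(S) [] (x^2 + y^2), computed separably: since the 2D parabola is the
    sum of two orthogonal 1D parabolas, the infimal convolution factors into row and
    column passes."""
    rows_done = np.apply_along_axis(inf_conv_parabola, 1, indicator.astype(float))
    return np.apply_along_axis(inf_conv_parabola, 0, rows_done)
```

The input is the 0/+∞ indicator of Eq. (10.64); the output is the exact squared Euclidean distance at every pixel.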
10.5.4 Chamfer Distance Transforms
The general chamfer distance transform is obtained by propagating local distance steps within a small neighborhood. For each such neighborhood the distance steps form a mask of weights that is infimally convolved with the image. For a 3×3-pixel neighborhood, if a and b are the horizontal and diagonal distance steps, respectively, the outer (a,b) chamfer distance transform of a planar set S can also be obtained directly from the general definition in Eq. (10.61) by replacing the general L^p norm ‖·‖_p with the (a,b) chamfer norm

‖(x,y)‖_{a,b} ≜ max(|x|, |y|) a + min(|x|, |y|) (b − a).   (10.70)
The unit ball corresponding to this chamfer norm is a symmetric octagon, and the resulting distance transform is

D_{a,b}(S)(x,y) = ⋀_{(v,w)∈S} ‖(x − v, y − w)‖_{a,b}.   (10.71)
Note that the above two equations apply to sets S and points (x,y) both in the continuous plane R² and in the discrete plane Z². For a 3×3-pixel neighborhood, the outer (a,b) chamfer distance transform of a discrete set S ⊆ Z² can be obtained via the following sequential computation [Bor86, Ros66]:

u_n(i,j) = min[ u_n(i−1,j) + a, u_n(i,j−1) + a, u_n(i−1,j−1) + b, u_n(i+1,j−1) + b, u_{n−1}(i,j) ].   (10.72)

Starting from u₀ = I(S), the 0/+∞ indicator function of S, two passes (n = 1, 2) of the 2D recursive erosion of Eq. (10.72) suffice to compute the chamfer distance transform of S if S^c is bounded and simply connected. During the first pass the image is scanned from top-left to bottom-right using the four-point nonsymmetric half-plane submask of the 3×3 neighborhood. During the second pass the image is scanned in the reverse direction using the reflected submask of distance steps. The final result u₂(i,j) is the outer (a,b) chamfer distance transform of S evaluated at points of Z². An example of the three images, u₀, u₁, and u₂, is shown in Fig. 10.9.
Figure 10.9: Sequential computation of the chamfer distance transform with optimal distance steps in a 3×3 mask. (a) Original binary image, 450×450 pixels; (b) result after the forward scan; and (c) final result after the backward scan. [In (b) and (c) the distances are displayed as intensity values modulo a constant.]

Thus, the sequential implementation of the local distance propagation is done via simple recursive min-sum difference equations. We shall show that these equations correspond to ETI systems with infinite impulse responses and binary slope responses. The distance propagation can also be implemented in parallel via nonrecursive min-sum equations, which correspond to ETI systems with finite impulse responses, as explained next.
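The two-pass recursion of Eq. (10.72) is straightforward to implement. The sketch below (names illustrative) uses row-major scanning, so the causal half-plane submask is re-indexed as {(i−1,j): a, (i,j−1): a, (i−1,j∓1): b}, which is equivalent to the mask in the text up to the scanning convention:

```python
import numpy as np

def chamfer_dt(indicator, a=3, b=4):
    """Two-pass (a, b) chamfer distance transform, Eq. (10.72).
    indicator: 0 on the set S, +inf on the background S^c."""
    u = indicator.astype(float).copy()
    H, W = u.shape
    # Forward scan (top-left to bottom-right) with the causal half-plane submask.
    for i in range(H):
        for j in range(W):
            if i > 0:
                u[i, j] = min(u[i, j], u[i-1, j] + a)
                if j > 0:
                    u[i, j] = min(u[i, j], u[i-1, j-1] + b)
                if j < W - 1:
                    u[i, j] = min(u[i, j], u[i-1, j+1] + b)
            if j > 0:
                u[i, j] = min(u[i, j], u[i, j-1] + a)
    # Backward scan with the reflected submask.
    for i in range(H - 1, -1, -1):
        for j in range(W - 1, -1, -1):
            if i < H - 1:
                u[i, j] = min(u[i, j], u[i+1, j] + a)
                if j < W - 1:
                    u[i, j] = min(u[i, j], u[i+1, j+1] + b)
                if j > 0:
                    u[i, j] = min(u[i, j], u[i+1, j-1] + b)
            if j < W - 1:
                u[i, j] = min(u[i, j], u[i, j+1] + a)
    return u
```

With (a,b) = (1,1) this yields the chessboard distance; with (a,b) = (1,2), where the diagonal costs the same as two axial steps, it yields the city-block distance.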
Sequential Computation and IIR Slope Filters

Consider a 2D min-sum autoregressive difference equation with output mask and coefficients as in Fig. 10.10c:

u(i,j) = min[ u(i−1,j) + a, u(i,j−1) + a, u(i−1,j−1) + b, u(i+1,j−1) + b, f(i,j) ],   (10.73)
where f = I(S) is the 0/+∞ indicator function of a set S representing a discrete binary image. Then consider the following distance transformation of f, obtained in two passes. During the forward pass, Eq. (10.73) is recursively run over f(i,j) in a bottom-to-top, left-to-right scanning order. The forward pass mapping f ↦ u is an ETI system with an infinite impulse response (found via induction):
g_f(i,j) = { ‖(i,j)‖_{a,b} for i + j ≥ 0 and j ≥ 0; +∞ otherwise.   (10.74)
A truncated version of g_f is shown in Fig. 10.10d. The slope response G_f(s) of this ETI system is equal to the 0/−∞ indicator function of the region shown (for a = 3, b = 4) in Fig. 10.5b. During the backward pass, a recursion similar to Eq. (10.73), but in the opposite scanning order and using as output mask the reflected version of Fig. 10.10c, is run over the previous result u(i,j) to yield a signal d(i,j) that is the final distance transform of f(i,j). The backward pass mapping u ↦ d is an ETI system with an infinite impulse response g_b(i,j) = g_f(−i,−j) and a slope response G_b(s) = G_f(−s).
Figure 10.10: Coefficient masks and impulse responses of ETI systems associated with computing the (a,b) chamfer distance transform. (a) Local distances within the 3×3-pixel unit "disk." (b) Distances from the origin obtained by propagating three times the local distances in (a); also equal to a 7×7-pixel central portion of the infinite impulse response of the overall system associated with the distance transform. (c) Coefficient mask for the min-sum difference equation computing the forward pass for the chamfer distance. (d) A 7×7-pixel portion of the infinite impulse response of the system corresponding to the min-sum difference equation computing the forward pass.

Since infimal convolution is an associative operation, the distance transform mapping f ↦ d is an ETI system with an infinite impulse response g = g_f □′ g_b:

d = (f □′ g_f) □′ g_b = f □′ (g_f □′ g_b) = f □′ g.   (10.75)
The overall slope response,

G(s) = G_f(s) + G_b(s) = G_f(s) + G_f(−s),   (10.76)

of this distance transform ETI system is the 0/−∞ indicator function of a bounded convex region, shown in Fig. 10.5c for a = 3, b = 4. Further, by using induction on (i,j) and symmetry, we find that g = g_f □′ g_b is equal to the (a,b) chamfer distance function:

g(i,j) = ‖(i,j)‖_{a,b}.   (10.77)

A truncated version of g is shown in Fig. 10.10b. Thus, our analysis has proved, using ETI systems theory, that the two-pass computation via recursive min-sum difference equations whose coefficients are the local chamfer distances yields the (a,b) chamfer distance transform:

[I(S) □′ g_f] □′ g_b = D_{a,b}(S).   (10.78)
Two special cases are the well-known city-block and chessboard discrete distances [Ros66]. The city-block distance transform is obtained using a = 1 and b = +∞, that is, using the five-pixel rhombus as the unit "disk." It is an ETI system with impulse response g(i,j) = |i| + |j| and a slope response equal to the indicator function of the unit square {s : ‖s‖_∞ ≤ 1}. Similarly, the chessboard distance transform is obtained using a = b = 1. It is an ETI system with impulse response g(i,j) = max(|i|, |j|) and a slope response equal to the indicator function of the unit rhombus {s : ‖s‖₁ ≤ 1}.
Parallel Computation and FIR Slope Filters

The (a,b) chamfer distance transform can be implemented alternatively using parallel operations. Namely, let

g₀(i,j) ≜ { g(i,j) for |i|, |j| ≤ 1; +∞ otherwise,   (10.79)
be the 3×3-pixel central portion of g defined in Eq. (10.77). It can be shown via induction that the n-fold infimal convolution of g₀ with itself yields g in the limit:

g = lim_{n→∞} (g₀ □′ g₀) □′ ⋯ □′ g₀   (n times).   (10.80)

Figure 10.10b shows the intermediate result for n = 3 iterations. Similar finite decompositions of discrete conical functions into infimal convolutions of smaller kernels have been studied elsewhere [Shi91]. Consider now the nonautoregressive min-sum difference equation

d_n(i,j) = ⋀_{k=−1}^{1} ⋀_{ℓ=−1}^{1} [ g₀(k,ℓ) + d_{n−1}(i − k, j − ℓ) ],   (10.81)

run iteratively for n = 1, 2, …, starting from d₀ = f. Each iteration is equivalent to the infimal convolution of the previous result with a finite impulse response equal to g₀. By iterating these local distances to the limit, the final distance transform is obtained: d = lim_{n→∞} d_n. In practice, when the input image f has finite support, the number of required iterations is finite and bounded by the image diameter.
Optimal Chamfer Distance Transforms

Selecting the steps a, b under certain constraints leads to an infinite variety of chamfer metrics based on a 3×3 mask. The two well-known and easily computable special cases of the city-block metric, with (a,b) = (1,∞), and the chessboard metric, with (a,b) = (1,1), give the poorest discrete approximations to the Euclidean distance (and to multiscale morphology with a disk structuring element), with errors reaching 41.4% and 29.3%, respectively. Using Euclidean steps (a,b) = (1, √2) yields a 7.61% maximum error. Thus, a major design goal is to reduce
Figure 10.11: Top row: distance transforms of a binary image obtained via (a) a (1,1) chamfer metric, (b) the optimal 3×3 chamfer metric, and (c) curve evolution. These distances are displayed as intensities modulo a constant h = 20. Bottom row: the multiscale dilations (at scales t = nh, n = 1, 2, 3, …) of the original set (filled black regions) were obtained by thresholding the three distance transforms at isolevel contours whose levels are multiples of h, using the following structuring elements: (d) and (e) the unit-scale polygons corresponding to the metrics used in (a) and (b), and (f) the disk. All images have a resolution of 450×600 pixels.
the approximation error between the chamfer distances and the corresponding Euclidean distances [Bor86]. A suitable error criterion is the maximum absolute error (MAE) between a unit chamfer ball and the corresponding unit disk [But98, Ver91]. The optimal steps obtained by Butt and Maragos [But98] for minimizing this MAE are a = 0.9619 and b = 1.3604, which give a 3.957% maximum error. In practice, for faster implementation, integer-valued distance steps A and B are used, and the computed distance transform is divided by a normalizing constant k, which can be real-valued. We refer to such a metric as (a,b) = (A,B)/k. Using two decimal digits for truncating optimal values and optimizing the triple (A,B,k) as done by Butt and Maragos [But98] for the smallest possible error yields A = 70, B = 99, and k = 72.77. The corresponding steps are (a,b) = (70,99)/72.77, yielding a 3.959% MAE. See Fig. 10.11 for an example. By working as above, optimal steps that yield an even lower error can also be found for chamfer distances with a 5×5 mask or larger neighborhood [But98].
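The classical error figures quoted above are easy to reproduce by comparing the chamfer norm of Eq. (10.70) with the Euclidean norm over all unit directions. Note that Butt and Maragos's MAE criterion measures the discrepancy between the unit chamfer ball and the unit disk slightly differently, so this simple relative-error sketch matches the city-block and chessboard figures but not the optimized values exactly (names illustrative):

```python
import numpy as np

def chamfer_norm_error(a, b, n=100001):
    """Max of | ||u||_{a,b} - 1 | over unit vectors u; by symmetry one octant suffices."""
    th = np.linspace(0.0, np.pi / 4.0, n)
    x, y = np.cos(th), np.sin(th)       # on this octant |x| >= |y|
    chamfer = x * a + y * (b - a)       # max(|x|,|y|)*a + min(|x|,|y|)*(b - a)
    return float(np.max(np.abs(chamfer - 1.0)))
```

For the city-block case we use b = 2a, since the unused diagonal step (b = ∞ in the mask) is equivalent to two axial steps in the norm of Eq. (10.70).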
10.6 Eikonal PDE and Distance Propagation
The main postulate of geometrical optics [Bor59] is Fermat's principle of least time. Let us assume a 2D (i.e., planar) medium with a (possibly space-varying) refractive index η(x,y) = c₀/c(x,y), defined as the ratio of the speed c₀ of light in free space to its speed c(x,y) in the medium. Given two points A and B in such a medium, the optical path length along a ray trajectory Γ_AB (parameterized by ℓ) between points A and B is

optical path length = ∫_{Γ_AB} η(Γ_AB(ℓ)) dℓ = c₀ T(Γ_AB),   (10.82)
where dℓ is the differential length element along this trajectory and T(Γ_AB) is the time required for the light to travel this path. Fermat's principle states that light will choose a path between A and B that minimizes the optical path length. An alternative viewpoint of geometrical optics is to consider the scalar function E(x,y), called the eikonal, whose isolevel contours are normal to the rays; thus, the eikonal's gradient ∇E is parallel to the rays. It can be shown [Bor59] using calculus of variations that Fermat's principle is equivalent to the following PDE:

‖∇E(x,y)‖ = √( (∂E/∂x)² + (∂E/∂y)² ) = η(x,y),   (10.83)
called the eikonal equation. Thus, the minimal optical path length between two points located at A and B is

E(B) − E(A) = inf_{Γ_AB} ∫_{Γ_AB} η(Γ_AB(ℓ)) dℓ.   (10.84)
Assume an optical wave propagating in a 2D medium of index η(x,y) at wavelengths much smaller than the image objects, so that ray optics can approximate wave optics. Then the eikonal E of ray optics is proportional to the phase of the wavefunction; hence, the isolevel contour lines of E are the wavefronts. Assuming that at time t = 0 there is an initial wavefront at a set of source points S_i, we can trace the wavefront propagation using Huygens' envelope construction: namely, if we dilate the points P = (x,y) of the wavefront curve at a certain time t with circles of infinitesimal radius c(x,y) dt, the envelope of these circles yields the wavefront at time t + dt. If T(P) is the time required for the wavefront to arrive at P from the sources, then

T(P) = inf_{Γ_{S_i P}} ∫_{Γ_{S_i P}} dℓ / c(x,y).   (10.85)
Thus, we can equate the eikonal E(x,y) to the weighted distance function between a point (x,y) and the sources along a path of minimal optical length, and also view E as proportional to the wavefront arrival time T(x,y) (see also [Bor59, Kim96, Lev70, Rou92, Ver90]). An example is shown in Fig. 10.12.
Figure 10.12: (a) Image of an optical medium consisting of two areas of different refractive indexes (whose ratio is 5/3) and the correct path of the light ray (from Snell's law) between two points. (b) Path found using the weighted distance function (numerically estimated via curve evolution); the thin light contours show the wavefronts propagating from the two source points.

Many tasks for extracting information from visible images have been related to optics and wave propagation via the eikonal PDE. Its solution E(x,y) can provide shape from shading, analog contour-based halftoning, and topographic segmentation of an image by choosing the refractive index field η(x,y) to be an appropriate function of the image brightness [Hor86, Kim96, Naj94, Pnu94, Sch83, Ver90]. Further, in the context of curve evolution, the eikonal PDE can be seen as a stationary formulation of the embedding level function evolution PDE of Eq. (10.60) with positive speed V = β(x,y) = β₀/η(x,y) > 0. Namely, if
T(x,y) = inf{t : φ(x,y,t) = 0}   (10.86)
is the minimum time at which the zero-level curve of φ(x,y,t) crosses (x,y), then it can be shown [Bar90, Fal94, Osh88] that

‖∇T(x,y)‖ = 1/β(x,y).   (10.87)
Setting E = β₀T leads to the eikonal. In short, we can view the solution E(x,y) of the eikonal as a weighted distance transform (WDT) whose values at each pixel give the minimum distance from the light sources, weighted by the gray values of the refractive index field. On a computational grid this solution is numerically approximated using discrete WDTs, which can be implemented either via 2D recursive min-sum difference equations or via numerical algorithms of curve evolution. The former implementation employs adaptive 2D recursive erosions and is a stationary approach to solving the eikonal, whereas the latter involves a time-dependent formulation and evolves curves based on a dilation-type PDE at a speed varying according to the gray values. Next we outline these two ways of solving the eikonal PDE and discuss some of its applications.
WDT Based on Chamfer Metrics
Let f(i,j) ≥ 1 be a sampled nonnegative gray-level image, which we view as a discrete refractive index field. Also let S be a set of reference points, the "sources" of some wave or the location of the wavefront at time t = 0. The discrete WDT finds at each pixel P = (i,j) the smallest sum of values of f over all possible discrete paths connecting P to the sources S. It can also be viewed as a procedure for finding paths of minimal "cost" among nodes of a weighted graph, or as discrete dynamic programming. It has been used extensively in image analysis problems such as minimal path finding, weighted distance propagation, and gray-level image skeletonization (for example, [Lev70, Mey92, Rut68, Ver91]). The above discrete WDT can be computed by running a 2D min-sum difference equation like Eq. (10.72), which implements the chamfer distance transform of binary images, but with spatially varying coefficients proportional to the gray image values [Rut68, Ver90]:

u_n(i,j) = min[ u_n(i−1,j) + a f(i,j), u_n(i,j−1) + a f(i,j), u_n(i−1,j−1) + b f(i,j), u_n(i+1,j−1) + b f(i,j), u_{n−1}(i,j) ],   (10.88)
where u₀ = I(S) is the 0/+∞ indicator function of the source set S. Starting from u₀, a sequence of functions u_n is iteratively computed by running Eq. (10.88) over the image domain in a forward scan for even n, whereas for odd n an equation similar to Eq. (10.88), but with a reflected coefficient mask, is run in a backward scan. In the limit n → ∞ the final WDT u_∞ is obtained. In practice, this limit is reached after a finite number of passes. The number of iterations required for convergence depends on both the sources and the gray values. There are also other, faster implementations using queues (see [Mey92, Ver90]). The final transform is a function of the source set S, the index field, and the norm used for horizontal distances. The above WDT based on discrete chamfer metrics is a discrete approximate solution of the eikonal PDE. The rationale for such a solution is that, away from the sources, this difference equation mapping f ↦ u corresponds to

⋁_{(k,ℓ)∈B} [ u(i,j) − u(i − k, j − ℓ) ] / a_{kℓ} = f(i,j),   (10.89)
where B is equal to the union of the output mask and its reflection, and the a_{kℓ} are the chamfer steps inside B. The left side of Eq. (10.89) is a weighted discrete approximation to the morphological derivative M(−u), with horizontal distances weighted by the a_{kℓ}. Thus, since in the continuous case M(−u) = ‖∇u‖, Eq. (10.89) is an approximation of the eikonal. In fact, as established elsewhere [Mey92], it is possible to recover a digital image u from its half morphological gradient u − u ⊖ B using the discrete weighted distance transform if one uses 1-pixel sources in each regional minimum of u.
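A sketch of the iterated forward/backward recursion of Eq. (10.88), with the causal neighbor set re-indexed for row-major scanning as before (names illustrative):

```python
import numpy as np

def gray_weighted_dt(f, sources, a=3, b=4, max_passes=50):
    """Discrete WDT of Eq. (10.88): chamfer propagation with local steps a*f(i,j), b*f(i,j).
    f: positive refractive-index field; sources: list of (i, j) seed pixels.
    Alternates forward and backward raster scans until the transform stops changing."""
    H, W = f.shape
    u = np.full((H, W), np.inf)
    for (i, j) in sources:
        u[i, j] = 0.0
    for n in range(max_passes):
        prev = u.copy()
        s = 1 if n % 2 == 0 else -1          # +1: forward scan, -1: backward scan
        rows = range(H) if s == 1 else range(H - 1, -1, -1)
        for i in rows:
            cols = range(W) if s == 1 else range(W - 1, -1, -1)
            for j in cols:
                # Causal neighbors for this scan direction, with weights scaled by f(i, j).
                for (ki, kj, w) in ((s, 0, a), (0, s, a), (s, s, b), (s, -s, b)):
                    ii, jj = i - ki, j - kj
                    if 0 <= ii < H and 0 <= jj < W:
                        u[i, j] = min(u[i, j], u[ii, jj] + w * f[i, j])
        if np.array_equal(prev, u):          # converged: no change in a full pass
            break
    return u
```

With f ≡ 1 this reduces to the binary (a,b) chamfer distance transform, as the text notes for constant propagation speed.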
The constants a and b in Eq. (10.88) are the distance steps by which the planar chamfer distances are propagated within a 3×3 neighborhood. The propagation of the local distances (a,b) starts at the points of the sources S and moves with speed V = β(i,j) = β₀/f(i,j). If f is a binary image, then the propagation speed is constant, and the solution of the above difference equation (after convergence) yields the discrete chamfer distance transform of S. To improve the WDT approximation to the eikonal's solution, one can optimize (a,b) so that the error between the planar chamfer distances and the Euclidean distances is minimized. Using a neighborhood larger than 5×5 could further reduce the approximation error but at the cost of an even slower implementation. However, larger neighborhood masks cannot be used with WDTs because they give erroneous results: the large masks can bridge over a thin line that separates two segmentation regions. Overall, this chamfer metric approach to the WDT is fast and easy to implement but, due to the required small neighborhoods, is not isotropic and cannot achieve high accuracy.

WDT Based on Curve Evolution
In this approach, at time t = 0 the boundary of each source is modeled as a curve γ(0), which is then propagated with normal speed V = β(x,y) = β₀/η(x,y). The propagating curve γ(t) is embedded as the zero-level curve of a function φ(x,y,t), where φ(x,y,0) = φ₀(x,y) is the signed (positive in the curve interior) distance from γ(0). The function φ evolves according to the PDE

φ_t = β(x,y) ‖∇φ‖,   (10.90)
which corresponds to curve evolution in a heterogeneous medium with position-dependent speed β > 0, or equivalently to a successive front dilation by disks with position-varying radii β(x,y) dt. This is a time-dependent formulation of the eikonal PDE [Fal94, Osh88]. It can be solved via Osher and Sethian's numerical algorithm, given by Eq. (10.49). The value of the resulting WDT at any pixel (x,y) of the image is the time it takes for the evolving curve to reach this pixel, that is, the smallest t such that φ(x,y,t) ≥ 0. This continuous approach to the WDT can achieve subpixel accuracy, as investigated by Kimmel et al. [Kim96]. In the applications of the eikonal PDE examined herein, the global speed function β(x,y) is everywhere nonnegative. In such cases the computational complexity of Osher and Sethian's level set algorithm (which can handle sign changes in the speed function) can be significantly reduced by using Sethian's fast marching algorithm [Set96], which is designed to solve the corresponding stationary formulation of the eikonal PDE, that is, ‖∇T‖ = 1/β. There are also other types of numerical algorithms for solving stationary eikonal PDEs; for example, Rouy and Tourin [Rou92] have proposed an efficient iterative algorithm for solving ‖∇E‖ = η. All of the numerical image experiments with curve evolution shown herein were produced using an implementation of fast marching based on a simple data structure of two queues. This data structure is explained in [Mar00] and has been used to implement WDTs based on either chamfer metrics or fast marching
for applications both with single sources and with multiple sources, where triple points develop at the collision of several wavefronts.

Figure 10.13: Eikonal halftoning of the Cameraman image I from the weighted distance transform of the "negative" image max(I) − I. Top row: the light source was at the top left corner, and the WDTs (displayed as intensities modulo a height such that 25 waves exist per image) were obtained via (a) a (1,1) chamfer metric, (b) the optimal 3×3 chamfer metric, and (c) curve evolution. Bottom row: 100 contour lines of the WDTs in the top row give gridless halftoning of the original image.
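Fast marching itself and the two-queue implementation of [Mar00] are beyond a short sketch, but the minimum-arrival-time idea they share can be illustrated by Dijkstra's algorithm on the 4-connected pixel graph, where entering a pixel costs its index value. This is a graph-metric approximation (it suffers the same anisotropy as the chamfer WDTs), not the upwind fast-marching update; all names are illustrative:

```python
import heapq
import numpy as np

def dijkstra_wdt(eta, sources):
    """Minimum arrival time T on a 4-connected grid: stepping into pixel (i, j)
    costs eta[i, j]. Classic Dijkstra with a binary heap and lazy deletion."""
    H, W = eta.shape
    T = np.full((H, W), np.inf)
    heap = []
    for (i, j) in sources:
        T[i, j] = 0.0
        heapq.heappush(heap, (0.0, i, j))
    while heap:
        t, i, j = heapq.heappop(heap)
        if t > T[i, j]:
            continue                      # stale heap entry
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ii, jj = i + di, j + dj
            if 0 <= ii < H and 0 <= jj < W and t + eta[ii, jj] < T[ii, jj]:
                T[ii, jj] = t + eta[ii, jj]
                heapq.heappush(heap, (T[ii, jj], ii, jj))
    return T
```

Like fast marching, each pixel is finalized exactly once, in increasing order of arrival time; the difference is that fast marching replaces the graph-edge relaxation with the first-order upwind solve of ‖∇T‖ = 1/β at each pixel.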
Gridless Halftoning via the Eikonal PDE

Inspired by Schröder's use [Sch83] of the eikonal function's contour lines for visually perceiving an intensity image I(x,y), Verbeek and Verwer [Ver90] and especially Pnueli and Bruckstein [Pnu94] attempted to solve the PDE

‖∇E(x,y)‖ = constant − I(x,y)   (10.91)

and create a binary gridless halftone version of I(x,y) as the union of the level curves of the eikonal function E(x,y). The larger the intensity value I(x,y), the smaller the local density of these contour lines in the vicinity of (x,y). This eikonal PDE approach to gridless halftoning, which we call eikonal halftoning, is indeed very promising and can simulate various artistic effects, as shown in Fig. 10.13, which also shows that the curve evolution WDT gives a smoother halftoning of the image than the WDTs based on chamfer metrics.
Watershed Segmentation via the Eikonal

A powerful morphological approach to image segmentation is the watershed algorithm [Mey90, Vin91b], which transforms an image f(x, y) into the crest lines separating adjacent catchment basins that surround regional minima or other "marker" sets of feature points. Najman and Schmitt [Naj94] established that (in the continuous domain, and assuming that the image is smooth and has isolated critical points) the continuous watershed is equivalent to finding a skeleton by influence zones with respect to a weighted distance function that uses the points in the regional minima of the image as sources and ‖∇f‖ as the field of indices. A similar result was obtained by Meyer [Mey94] for digital images. In Maragos and Butt [Mar00], the eikonal PDE modeling the watershed segmentation of an image-related function f was solved by finding a WDT via the curve evolution PDE of Eq. (10.90), in which the speed is proportional to 1/‖∇f‖. Further, the results of this new segmentation approach [Mar00] have been compared to the digital watershed algorithm via flooding [Vin91b] and to the eikonal approach solved via a discrete WDT based on chamfer metrics [Mey94, Ver90]. In all three approaches, robust features are first extracted as markers of the regions, and the original image I is transformed to another function f by smoothing via alternating opening-closing by reconstruction, taking the gradient magnitude of the filtered image, and changing (via morphological reconstruction) the homotopy of the gradient image so that its only minima occur at the markers. The segmentation is done on the final outcome f of the above processing. In the standard digital watershed algorithm via flooding [Mey90, Vin91b], the flooding at each level is achieved by a planar distance propagation that uses the chessboard metric. This kind of distance propagation is not isotropic and can give wrong results, particularly for images with large plateaus.

Eikonal segmentation using WDTs based on chamfer metrics improves this situation somewhat but not entirely. In contrast, for images with large plateaus or regions, segmentation via the eikonal PDE and curve evolution WDT gives results close to ideal. Using a test image that was difficult (because the expanding wavefronts meet the watershed lines at many angles, ranging from perpendicular to almost parallel), Fig. 10.14 shows that the continuous segmentation approach based on the eikonal PDE and curve evolution outperforms the discrete segmentation results (using either the digital watershed flooding algorithm or chamfer metric WDTs). In real images, which may contain a large variety of region sizes and shapes, the digital watershed flooding algorithm may give results comparable to the eikonal PDE approach. Details on comparing the two segmentation approaches can be found in [Mar00].
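The eikonal-based scheme just described can be sketched in a few lines: propagate a discrete weighted distance transform from the markers, with a slowness (cost) field playing the role of ‖∇f‖, and label each pixel with its nearest marker; the influence-zone boundaries are the watershed lines. The sketch below uses a simple Dijkstra-style propagation rather than the curve evolution or chamfer WDTs of [Mar00], and the function names are illustrative, not from the chapter.

```python
import heapq
import numpy as np

def gray_weighted_dt(cost, sources):
    """Discrete weighted distance transform: Dijkstra propagation where
    stepping onto a pixel adds `cost` there (a crude stand-in for solving
    the eikonal PDE with slowness `cost` on the grid)."""
    h, w = cost.shape
    dist = np.full((h, w), np.inf)
    heap = []
    for (i, j) in sources:
        dist[i, j] = 0.0
        heapq.heappush(heap, (0.0, i, j))
    while heap:
        d, i, j = heapq.heappop(heap)
        if d > dist[i, j]:
            continue  # stale heap entry
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < h and 0 <= nj < w:
                nd = d + cost[ni, nj]
                if nd < dist[ni, nj]:
                    dist[ni, nj] = nd
                    heapq.heappush(heap, (nd, ni, nj))
    return dist

def watershed_labels(cost, markers):
    """Label each pixel by its nearest marker in the weighted metric
    (skeleton by influence zones); ties fall to the first marker."""
    dts = [gray_weighted_dt(cost, [m]) for m in markers]
    return np.argmin(np.stack(dts), axis=0)
```

In practice the cost image would be the gradient magnitude of the filtered image (large cost on edges, so wavefronts from different markers meet there), mirroring the 1/‖∇f‖ speed of the curve evolution formulation.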
10.7 Conclusions
We have provided a unified view and some analytic tools for a recently growing part of morphological image processing that is based on ideas from differential calculus and dynamic systems, including the use of partial differential equations
PETROS MARAGOS
or difference equations to model nonlinear multiscale analysis or distance propagation in images. We have discussed general 2D nonlinear difference equations of the max-sum or min-sum type that model the space dynamics of 2D morphological systems (including the distance computations) and some nonlinear signal transforms, called slope transforms, that can analyze these systems in a transform domain in ways conceptually similar to the application of Fourier transforms to linear systems. We have used these nonlinear difference equations to model discrete distance transforms and relate them to numerical solutions of the eikonal PDE of optics. In this context, distance transforms are shown to be bandpass slope filters. We have also reviewed some nonlinear PDEs that model the evolution of multiscale morphological operators and use morphological derivatives. Related to these morphological PDEs is the area of curve evolution, which employs methods of differential geometry to study the differential equations governing the propagation of time-evolving curves. The PDEs governing multiscale morphology and most cases of curve evolution are of the Hamilton-Jacobi type and are related to the eikonal PDE of optics.

Figure 10.14: Performance of various segmentation algorithms on a 250x400 pixel test image that is the minimum of two potential functions. Its contour plot (thin bright curves) is superimposed on all segmentation results. The markers are the two source points of the potential functions. The segmentation results are from (a) the digital watershed flooding algorithm and from WDTs based on (b) the optimal 3x3 chamfer metric, (c) the optimal 5x5 chamfer metric, and (d) curve evolution. (The thick bright curve shows the correct segmentation.)
We view the analysis of the multiscale morphological PDEs and of the eikonal PDE solved via weighted distance transforms as a unified area in nonlinear image processing that we call differential morphology, and we have briefly discussed some of its potential applications to image processing.
References

[Alv92] L. Alvarez, F. Guichard, P. L. Lions, and J. M. Morel. Axiomatisation et nouveaux opérateurs de la morphologie mathématique. C. R. Acad. Sci. Paris 315, Série I, 265-268 (1992).
[Alv93] L. Alvarez, F. Guichard, P. L. Lions, and J. M. Morel. Axioms and fundamental equations of image processing. Archiv. Rat. Mech. 123(3), 199-257 (1993).
[Are93] A. Arehart, L. Vincent, and B. Kimia. Mathematical morphology: the Hamilton-Jacobi connection. In Proc. Intl. Conf. on Computer Vision, pp. 215-219 (1993).
[Bar90] M. Bardi and M. Falcone. An approximation scheme for the minimum time function. SIAM J. Control Optimization 28, 950-965 (1990).
[Blu73] H. Blum. Biological shape and visual science (part I). J. Theoret. Biol. 38, 205-287 (1973).
[Boo92] R. van den Boomgaard. Mathematical morphology: extensions towards computer vision. Ph.D. thesis, Univ. of Amsterdam, The Netherlands (1992).
[Boo94] R. van den Boomgaard and A. Smeulders. The morphological structure of images: the differential equations of morphological scale-space. IEEE Trans. Pattern Anal. Mach. Intell. 16, 1101-1113 (November 1994).
[Bor59] M. Born and E. Wolf. Principles of Optics. Pergamon Press, Oxford, England (1959; 1987 edition).
[Bor86] G. Borgefors. Distance transformations in digital images. Comput. Vision Graph. Image Process. 34, 344-371 (1986).
[Bro92] R. W. Brockett and P. Maragos. Evolution equations for continuous-scale morphology. In Proc. IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing (San Francisco, CA, March 1992).
[Bro94] R. Brockett and P. Maragos. Evolution equations for continuous-scale morphological filtering. IEEE Trans. Signal Process. 42, 3377-3386 (December 1994).
[But98] M. A. Butt and P. Maragos. Optimal design of chamfer distance transforms. IEEE Trans. Image Process. 7, 1477-1484 (October 1998).
[Che89] M. Chen and P. Yan. A multiscaling approach based on morphological filtering. IEEE Trans. Pattern Anal. Mach. Intell. 11, 694-700 (July 1989).
[Cou62] R. Courant and D. Hilbert. Methods of Mathematical Physics. Wiley, New York (1962).
[Cra92] M. G. Crandall, H. Ishii, and P.-L. Lions. User's guide to viscosity solutions of second order partial differential equations. Bull. Amer. Math. Soc. 27, 1-67 (July 1992).
[Dan80] P. E. Danielsson. Euclidean distance mapping. Comput. Graph. Image Process. 14, 227-248 (1980).
[Dor94] L. Dorst and R. van den Boomgaard. Morphological signal processing and the slope transform. Signal Process. 38, 79-98 (July 1994).
[Dud84] D. E. Dudgeon and R. M. Mersereau. Multidimensional Digital Signal Processing. Prentice Hall, Englewood Cliffs, NJ (1984).
[Fal94] M. Falcone, T. Giorgi, and P. Loreti. Level sets of viscosity solutions: some applications to fronts and rendez-vous problems. SIAM J. Appl. Math. 54, 1335-1354 (October 1994).
[Hei94] H. J. A. M. Heijmans. Morphological Image Operators. Academic Press, Boston (1994).
[Hei97] H. J. A. M. Heijmans and P. Maragos. Lattice calculus and the morphological slope transform. Signal Process. 59, 17-42 (1997).
[Hor86] B. K. P. Horn. Robot Vision. MIT Press, Cambridge, MA (1986).
[Hua94] C. T. Huang and O. R. Mitchell. A Euclidean distance transform using grayscale morphology decomposition. IEEE Trans. Pattern Anal. Mach. Intell. 16, 443-448 (April 1994).
[Kim90] B. Kimia, A. Tannenbaum, and S. Zucker. Toward a computational theory of shape: an overview. In Proc. European Conf. on Computer Vision (France, April 1990).
[Kim96] R. Kimmel, N. Kiryati, and A. M. Bruckstein. Sub-pixel distance maps and weighted distance transforms. J. Mathemat. Imaging Vision 6, 223-233 (1996).
[Koe84] J. J. Koenderink. The structure of images. Biolog. Cybern. 50, 363-370 (1984).
[Lax73] P. D. Lax. Hyperbolic Systems of Conservation Laws and the Mathematical Theory of Shock Waves. SIAM, Philadelphia (1973).
[Lev70] G. Levi and U. Montanari. A grey-weighted skeleton. Inform. Control 17, 62-91 (1970).
[Mar89] P. Maragos. Pattern spectrum and multiscale shape representation. IEEE Trans. Pattern Anal. Mach. Intell. 11, 701-716 (July 1989).
[Mar94] P. Maragos. Morphological systems: slope transforms and max-min difference and differential equations. Signal Process. 38, 57-77 (July 1994).
[Mar96] P. Maragos. Differential morphology and image processing. IEEE Trans. Image Process. 5, 922-937 (June 1996).
[Mar98] P. Maragos. Morphological signal and image processing. In The Digital Signal Processing Handbook (V. Madisetti and D. Williams, eds.). CRC Press, Boca Raton, FL (1998).
[Mar00] P. Maragos and M. A. Butt. Curve evolution, differential morphology, and distance transforms applied to multiscale and eikonal problems. Fundamenta Informaticae 41, 91-129 (2000).
[Mar99] P. Maragos and F. Meyer. Nonlinear PDEs and numerical algorithms for modeling levelings and reconstruction filters. In Scale-Space Theories in Computer Vision (Proc. Intl. Conf. Scale-Space'99), pp. 363-374. Lecture Notes in Computer Science Vol. 1682. Springer-Verlag (1999).
[Mar90] P. Maragos and R. W. Schafer. Morphological systems for multidimensional signal processing. Proc. IEEE 78, 690-710 (April 1990).
[Mar82] D. Marr. Vision. Freeman, San Francisco (1982).
[Mat75] G. Matheron. Random Sets and Integral Geometry. Wiley, New York (1975).
[Mat93] J. Mattioli. Differential relations of morphological operators. In Proc. Intl. Workshop on Mathematical Morphology and Its Application to Signal Processing (J. Serra and P. Salembier, eds.). Univ. Polit. Catalunya, Barcelona, Spain (1993).
[Mey92] F. Meyer. Integrals and gradients of images. In Image Algebra and Morphological Image Processing III, Proc. SPIE Vol. 1769, pp. 200-211 (1992).
[Mey94] F. Meyer. Topographic distance and watershed lines. Signal Process. 38, 113-125 (July 1994).
[Mey98] F. Meyer. The levelings. In Mathematical Morphology and Its Applications to Image and Signal Processing (H. Heijmans and J. Roerdink, eds.), pp. 199-206. Kluwer Acad. Publ. (1998).
[Mey90] F. Meyer and S. Beucher. Morphological segmentation. J. Visual Commun. Image Representation 1(1), 21-45 (1990).
[Nac96] P. F. M. Nacken. Chamfer metrics, the medial axis and mathematical morphology. J. Mathemat. Imaging Vision 6, 235-248 (1996).
[Naj94] L. Najman and M. Schmitt. Watershed of a continuous function. Signal Process. 38, 99-112 (July 1994).
[Osh88] S. Osher and J. Sethian. Fronts propagating with curvature-dependent speed: algorithms based on Hamilton-Jacobi formulations. J. Comput. Phys. 79, 12-49 (1988).
[Pnu94] Y. Pnueli and A. M. Bruckstein. DigiDürer, a digital engraving system. Visual Comput. 10, 277-292 (1994).
[Pre93] F. Preteux. On a distance function approach for gray-level mathematical morphology. In Mathematical Morphology in Image Processing (E. R. Dougherty, ed.). Marcel Dekker, New York (1993).
[Roc70] R. T. Rockafellar. Convex Analysis. Princeton Univ. Press (1970).
[Ros66] A. Rosenfeld and J. L. Pfaltz. Sequential operations in digital picture processing. J. ACM 13, 471-494 (October 1966).
[Ros68] A. Rosenfeld and J. L. Pfaltz. Distance functions on digital pictures. Pattern Recog. 1, 33-61 (1968).
[Rou92] E. Rouy and A. Tourin. A viscosity solutions approach to shape-from-shading. SIAM J. Numer. Anal. 29(3), 867-884 (June 1992).
[Rut68] D. Rutovitz. Data structures for operations on digital images. In Pictorial Pattern Recognition (G. C. Cheng et al., eds.), pp. 105-133. Thompson, Washington, DC (1968).
[Sal95] P. Salembier and J. Serra. Flat zones filtering, connected operators, and filters by reconstruction. IEEE Trans. Image Process. 4, 1153-1160 (August 1995).
[Sap93] G. Sapiro, R. Kimmel, D. Shaked, B. Kimia, and A. Bruckstein. Implementing continuous-scale morphology via curve evolution. Pattern Recog. 26(9), 1363-1372 (1993).
[Sch83] M. Schröder. The eikonal equation. Math. Intelligencer 1, 36-37 (1983).
[Ser82] J. Serra. Image Analysis and Mathematical Morphology. Academic Press, New York (1982).
[Ser88] J. Serra (ed.). Image Analysis and Mathematical Morphology, Vol. 2. Academic Press, New York (1988).
[Set96] J. A. Sethian. Level Set Methods. Cambridge Univ. Press (1996).
[Shi91] F. Y. C. Shih and O. R. Mitchell. Decomposition of gray-scale morphological structuring elements. Pattern Recog. 24, 195-203 (1991).
[Ste80] S. R. Sternberg. Language and architecture for parallel image processing. In Pattern Recognition in Practice (E. S. Gelsema and L. N. Kanal, eds.). North Holland, New York (1980).
[Tek98] H. Tek and B. B. Kimia. Curve evolution, wave propagation, and mathematical morphology. In Mathematical Morphology and Its Applications to Image and Signal Processing (H. Heijmans and J. Roerdink, eds.), pp. 115-126. Kluwer, Boston (1998).
[Ver90] P. Verbeek and B. Verwer. Shading from shape, the eikonal equation solved by grey-weighted distance transform. Pattern Recog. Lett. 11, 681-690 (1990).
[Ver91] B. J. H. Verwer. Distance transforms: metrics, algorithms, and applications. Ph.D. thesis, Tech. Univ. of Delft, The Netherlands (1991).
[Vin91a] L. Vincent. Exact Euclidean distance function by chain propagations. In Proc. Conf. on Computer Vision and Pattern Recognition, pp. 520-525 (1991).
[Vin91b] L. Vincent and P. Soille. Watersheds in digital spaces: an efficient algorithm based on immersion simulations. IEEE Trans. Pattern Anal. Mach. Intell. 13, 583-598 (June 1991).
[Vin93] L. Vincent. Morphological grayscale reconstruction in image analysis: applications and efficient algorithms. IEEE Trans. Image Process. 2, 176-201 (April 1993).
[Wit83] A. P. Witkin. Scale-space filtering. In Proc. Intl. Joint Conf. on Artificial Intelligence (Karlsruhe, Germany, 1983).
Coordinate Logic Filters: Theory and Applications in Image Analysis
BASIL G. MERTZIOS AND KONSTANTINOS D. TSIRIKOLIAS Department of Electrical and Computer Engineering Democritus University of Thrace Xanthi, Greece
11.1 Introduction
Logic operations have been used successfully for a variety of image processing tasks with binary images, such as removal of isolated points that usually represent noise, separation of multiple objects, extraction of depth maps and skeletons, shape smoothing, coding, compression, region filling, shape recognition, and restoration of shapes [Jam87]. Coordinate logic operations (CLOs) are logic operations (AND, OR, NOT, and XOR and their combinations) among the corresponding binary values of two or more signals or image pixels. In this chapter we present the coordinate logic (CL) filters [Mer93, Mer98, Tsi91, Tsi93], which constitute a novel class of nonlinear digital filters based on the execution of CLOs among the pixels of the image [Die71]. CL filters can execute the four basic morphological operations (erosion, dilation, opening, and closing), as well as the key tasks of successive filtering and management of the residues. Therefore, CL filters are suitable for the range of tasks and applications that are executed by morphological filters, achieving the same functionality. These applications include multidimensional filtering (noise
removal, lowpass and highpass filtering, image magnification), edge extraction and enhancement, shape description and recognition, region filling, shape smoothing, skeletonization, feature extraction using the pattern spectrum, multiscale shape representation and multiresolution approaches [Mar89], coding, fractal modeling, and special video effects. However, due to their different definition, in practice CL filters display a slightly different response to gray-level images. In the case of binary signals or images, CL filters coincide with the conventional morphological filters. Since CL filters use only CLOs, they are simple, fast, and very efficient in various 1D, 2D, or higher-dimensional digital image analysis and pattern recognition applications. The CLOs CAND (coordinate AND) and COR (coordinate OR) are analogous to the operations MIN and MAX, respectively [Nak78], while CXOR (coordinate XOR) does not correspond to any morphological operation. In gray-level images the use of the CXOR operation provides efficient edge detection and enhancement, since there is no need to compute the difference between the original and filtered images, as is required by morphological filters. The desired processing in CL filtering is achieved by executing only direct logic operations among the pixels of the given image. Considering the underlying concept of CLOs, CL filtering may be seen as independent parallel processing of the bit planes produced from the decomposition of the given image. Therefore, CL filters are characterized by inherent parallelism and are appropriate for high-speed real-time applications. CL filters can also be used to execute tasks in image processing that cannot be executed by morphological filters, such as implementing an approximation of the majority function [Ahm89] by the majority CL filter, fractal transformations and modeling, and the design of cellular automata.
In fact, a remarkable property of the CL filters is their direct relation to known fractal structures. For example, the CAND and COR operators are generators of Sierpinski structures. Using CL filters, very simple algorithms for fractal transformations and cellular automata may be designed [Bar88, Mer98], which may be implemented using special-purpose hardware and architectures based on logic circuits. CL filters satisfy the defining property of idempotency of morphological filters [Mar87, Mar89, Mar90, Nak78, Ser83, Ser92] for opening and closing and their combinations. However, they are nonincreasing filters, since the second defining property of increasingness that characterizes the morphological filters is not generally satisfied. In fact, the CL filters satisfy the defining properties of idempotency, duality, and extensivity of morphological filters [Mar90, Par96, Ser83, Ser92] for opening and closing as well as for their combinations. The fundamental operations of dilation and erosion, which correspond to topological max and min operations on pixel neighborhoods, are special cases of the rth ranked-order filters and imply a sorting procedure. From this point of view, the morphological operations, which in general use combinations of dilation and erosion, overlap with the order-statistic (OS) filters [Mar87] or may even be considered a class of OS filters [Kim86]. In contrast, CL filters do not involve any kind of
sorting, and their CLOs of dilation and erosion result in signal values that may not be included in the initial set of input signal values; therefore, CL filters do not overlap with any class of OS filters. A methodology has been developed for implementing morphological filters using CLOs. If the image gray levels are mapped to an appropriately selected set of decimal numbers, the CLOs of erosion and dilation act on this specific set exactly as the MIN and MAX operators of morphological filtering do, using very simple hardware structures. The only drawback of this approach is the large binary word lengths required for the assignment of the image gray quantization levels to this specific set of numbers; this is alleviated by exploiting the properties of CLOs and developing decomposition techniques that operate with smaller binary word lengths and with less demanding hardware structures. This approach may be extended to the implementation, using CLOs, of any nonlinear filter that falls in the general class of OS filters. A class of nonlinear filters that is based on Boolean operators is that of the generalized stack (GS) filters [Lin90, Mar87]. The difference between GS and CL filters is that the stack filters operate on signal levels, whereas CL filters operate on binary representations. The definition of GS filters is based on threshold decomposition and Boolean operators. Threshold decomposition maps the 2^n-level input signal (where n is the word length) into 2^n - 1 binary signals by thresholding the original signal at each of the allowable levels. The set of 2^n - 1 signals is then filtered by 2^n - 1 Boolean operators that are constrained to have the stacking property. The multilevel output is finally obtained as the sum of the 2^n - 1 binary output signals. In contrast, CL filters decompose the signal into n binary signals that are processed in parallel, and achieve the desired processing by executing only direct logic operations among the binary values of the given signal.
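The two decompositions can be contrasted directly. The sketch below (illustrative, not from the chapter) builds both for an n-bit 1-D signal and checks the respective reconstruction rules: the threshold signals sum back to the original, while the bit planes are recombined with weights 2^k.

```python
import numpy as np

def threshold_decomposition(x, n):
    # GS-filter view: 2^n - 1 binary signals; the t-th signal is 1 where x >= t
    return np.stack([(x >= t).astype(int) for t in range(1, 2 ** n)])

def bitplane_decomposition(x, n):
    # CL-filter view: n binary signals; the k-th plane holds bit k of each sample
    return np.stack([(x >> k) & 1 for k in range(n)])

x = np.array([5, 0, 7, 2])            # 3-bit samples
thr = threshold_decomposition(x, 3)   # 7 binary signals
bits = bitplane_decomposition(x, 3)   # only 3 binary signals
assert (thr.sum(axis=0) == x).all()   # threshold-decomposition reconstruction
assert (sum(bits[k] * 2 ** k for k in range(3)) == x).all()
```

The contrast in signal count (2^n - 1 versus n) is exactly the parallelism advantage claimed for CL filters.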
The CL operators are defined and their properties are derived in Sec. 11.2, and Sec. 11.3 presents the basic CL filters that execute the dilation, erosion, opening, and closing operations, as well as their filter structures. Section 11.4 reviews the properties of duality for dilation-erosion and opening-closing, of idempotency for opening and closing, and of extensivity for the dilation, erosion, opening, and closing of CL filters. Section 11.5 introduces the proposed scheme for performing morphological operations using CLOs and presents a simple hardware implementation of the basic MIN and MAX operations using this scheme. Section 11.6 covers image analysis and feature extraction applications, including edge extraction, calculation of the pattern spectrum, noise removal, fractal transformation, and the design of cellular automata. Concluding remarks are given in Sec. 11.7.
11.2 Coordinate Logic Operations on Digital Signals

The underlying idea in coordinate logic image processing is the execution of CLOs among gray-level pixels. These operations are executed among the corresponding binary bits of equal position of the considered pixels, without counting the carry
bits. The fundamental properties of logic operations also hold for CLOs since they constitute an extension of Boolean algebra [Die71]. For our discussion, we assume that A and B are two decimal numbers with n bits in their binary representations, given by

A = [a1, a2, ..., an],    B = [b1, b2, ..., bn].

The following definitions refer to the coordinate operators that are used by CL filters.

Definition 11.1. The coordinate logic operation ⊙ of two numbers A and B in the decimal system, acting on their binary sequences, is given by

C = [c1, c2, ..., cn] = A ⊙ B,    (11.1)

where ⊙ denotes the CL operation corresponding to the logic operation ∘. The ith bit ci is defined as the output of the logic operation ∘ among the corresponding ith bits ai and bi of the operands, that is,

ci = ai ∘ bi,    i = 1, 2, ..., n.    (11.2)

The operation ∘ may be the logical OR, AND, or NOT, or a function of them. A distinct characteristic of the coordinate operators is that they do not generate any carry bits, as occurs in regular arithmetic operations among binary sequences.

Definition 11.2. The coordinate logic operation CNOT on a number A results in a decimal number C with an n-bit representation:

C = [c1, c2, ..., cn] = CNOT A,    (11.3)
where the ith bit ci is the logical NOT of the corresponding bit ai of the number A, that is,

ci = NOT ai,    i = 1, 2, ..., n.    (11.4)

It is seen that the CAND and COR operations result in values that are not included in the initial range of the values of the pixels. This characteristic does not constitute a problem in the proposed image processing feature extraction procedure since, after each successive filtering, only the population of the remaining pixels is important, not their gray-level values. However, for the other image processing tasks presented in this chapter, this characteristic is either ignored by the combinations of CL operators or has a beneficial effect (e.g., in fractal transformation). Let A and B be two positive integer decimal numbers and let the CAND and COR operations on the numbers A and B result in the two decimal numbers

E = A CAND B = [e1, e2, ..., en],    (11.5)
D = A COR B = [d1, d2, ..., dn].    (11.6)
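On nonnegative integers, the CLOs defined above are exactly the bitwise operations built into most programming languages; the following mapping is an illustration of ours, not part of the chapter.

```python
def cand(a, b):          # coordinate AND, Eq. (11.5)
    return a & b

def cor(a, b):           # coordinate OR, Eq. (11.6)
    return a | b

def cxor(a, b):          # coordinate XOR
    return a ^ b

def cnot(a, n):          # coordinate NOT for an n-bit word, Eq. (11.3)
    return ~a & ((1 << n) - 1)

# Example with n = 4: A = 12 = [1100], B = 10 = [1010]
assert cand(12, 10) == 8   # [1000]
assert cor(12, 10) == 14   # [1110]
assert cxor(12, 10) == 6   # [0110]
assert cnot(12, 4) == 3    # [0011]
```

Because no carries are generated, these operations act independently on every bit plane, which is precisely the parallelism exploited by CL filters.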
Table 11.1: Fundamental properties of coordinate logic operations.

Fundamental laws:
    0 CAND A = 0,    0 COR A = A
    A CAND A = A,    A COR A = A    (idempotence laws)
    (2^n - 1) CAND A = A,    (2^n - 1) COR A = 2^n - 1
    A CAND CNOT(A) = 0,    A COR CNOT(A) = 2^n - 1

Commutative laws:
    A CAND B = B CAND A,    A COR B = B COR A

Associative laws:
    A CAND (B CAND C) = (A CAND B) CAND C
    A COR (B COR C) = (A COR B) COR C

Distributive laws:
    A CAND (B COR C) = (A CAND B) COR (A CAND C)
    A COR (B CAND C) = (A COR B) CAND (A COR C)

De Morgan's laws:
    CNOT(A1 CAND A2 CAND ... CAND An) = CNOT(A1) COR CNOT(A2) COR ... COR CNOT(An)
    CNOT(A1 COR A2 COR ... COR An) = CNOT(A1) CAND CNOT(A2) CAND ... CAND CNOT(An)

Absorption laws:
    A COR (B CAND A) = A,    A CAND (B COR A) = A
    (A COR B) CAND (A COR C) = A COR (B CAND C)
The fundamental properties of logic operations also hold for CLOs since they constitute an extension of Boolean algebra [Die71]. These properties are summarized in Table 11.1. Theorems 11.1 and 11.2 below provide useful relations among the decimal numbers A, B and the decimal numbers D, E obtained from A, B using the CL operations given in Eqs. (11.5) and (11.6).

Theorem 11.1. Let E and D be decimal numbers as defined by Eqs. (11.5) and (11.6), respectively. Then

0 ≤ E ≤ min(A, B),    (11.7)
max(A, B) ≤ D ≤ 2^n - 1,    (11.8)

where n is the word length. In other words, the COR and CAND of A and B represent a measure of the maximum and of the minimum functions of A and B, respectively.

Proof. The proof results directly by taking into account the definitions of AND and OR, and from the fact that no carry bits are encountered in the CLOs. Note that 2^n - 1 is the maximum number represented by a word length of n bits. Theorem 11.1 can be extended to the case of more than two numbers. □

Theorem 11.2. Let E and D be decimal numbers as defined by Eqs. (11.5) and (11.6), respectively. The sum of E and D is equal to the sum of A and B [Mer98], that is,

E + D = (A COR B) + (A CAND B) = A + B.    (11.9)
Theorem 11.3. Let A and B be decimal numbers with A < B. Then it holds that

E = Σ_{i=1}^{n} ei 2^i = A CAND B = A − Σ_{i=1}^{n} δ(ai − 1) δ(bi) 2^i ≤ A,    (11.10a)
D = Σ_{i=1}^{n} di 2^i = A COR B = B + Σ_{i=1}^{n} δ(ai − 1) δ(bi) 2^i ≥ B,    (11.10b)

where δ(·) is the delta (Dirac) function.

Proof. For any bit position i of A, B at which ai = 1 and bi = 0, it holds that ai ⊕ bi = 1 (where ⊕ denotes XOR) and δ(ai − 1)δ(bi) = 1; at all other bit positions, δ(ai − 1)δ(bi) = 0. Hence, ei = ai − δ(ai − 1)δ(bi) and di = bi + δ(ai − 1)δ(bi), resulting in Eqs. (11.10). □

Corollary 11.1. It follows from Theorem 11.2 that

A − E = D − B = K = Σ_{i=1}^{n} δ(ai − 1) δ(bi) 2^i,    (11.11)

which means that CAND and COR differ from the min and max operations, respectively, by an equal quantity K. Therefore, the application of CAND and COR instead of the min and max operations, respectively, to two numbers attenuates or amplifies the result by the same quantity K.

Theorem 11.4. Let A and B be decimal numbers, with A < B. Then

E = A CAND B = min(A, B) = A,    (11.12a)
D = A COR B = max(A, B) = B,    (11.12b)

holds if and only if for each ai = 1 the corresponding bi = 1, or equivalently δ(ai − 1)δ(bi − 1) = 1.

Proof. If bi = 1 for all bit positions at which ai = 1, then for those positions di = 1 and ei = 1. If ai = 0 and bi = 1, then di = 1 and ei = 0. If ai = 0 and bi = 0, then di = 0 and ei = 0. Therefore, in all cases we have di = bi and ei = ai. □

Theorem 11.5. Let Ai, i = 1, 2, ..., N, be decimal numbers. Then

0 ≤ E = A1 CAND A2 ... CAND AN = CAND{Ai, i = 1, 2, ..., N} ≤ min{Ai},    (11.13a)
max{Ai} ≤ D = A1 COR A2 ... COR AN = COR{Ai, i = 1, 2, ..., N} ≤ 2^n − 1.    (11.13b)

In Eq. (11.13a), note that E = min{Ai} if and only if all Ai have ones at all the bits where min{Ai} has ones, while in Eq. (11.13b) D = max{Ai} if and only if all Ai have zeros at all the bits where max{Ai} has zeros.
Proof. It can be proved easily by extending Theorems 11.1 and 11.4 to the case of CLOs among more than two numbers. □

Corollary 11.2. Let A and B be decimal numbers with A < B. If for each ai = 1 the corresponding bi = 0, that is, if δ(ai − 1)δ(bi − 1) = 0 for all i, then

E = A CAND B = 0,    D = A COR B = A + B ≤ 2^n − 1.    (11.14)

Proof. If bi = 0 for all bit positions at which ai = 1, then ei = 0 for all i, and therefore E = 0. Then, using Theorems 11.2 and 11.1, it follows that D = A + B ≤ 2^n − 1. □

Corollary 11.3. Let A and B be decimal numbers with E = A CAND B and D = A COR B. Then the following propositions hold:

1. If E > 0, then D < A + B.
2. If the sum of A and B is A + B > 2^n − 1, then E > 0.

Proof.

1. It follows from Eq. (11.9) that A + B − D = E > 0, that is, D < A + B.
2. If A + B > 2^n − 1, from Eq. (11.9) it follows that E = A + B − D > (2^n − 1) − D ≥ 0. □
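The bounds and identities of Theorems 11.1-11.5 are easy to spot-check numerically, keeping in mind that CAND and COR are bitwise AND and OR on integers; the sketch below (illustrative, not from the chapter) exercises them on random 8-bit operands.

```python
import random
from functools import reduce

random.seed(0)
n = 8
for _ in range(500):
    xs = [random.randrange(2 ** n) for _ in range(4)]
    A, B = xs[0], xs[1]
    E, D = A & B, A | B                      # CAND, COR
    assert 0 <= E <= min(A, B)               # Theorem 11.1, Eq. (11.7)
    assert max(A, B) <= D <= 2 ** n - 1      # Theorem 11.1, Eq. (11.8)
    assert E + D == A + B                    # Theorem 11.2, Eq. (11.9)
    assert A - E == D - B                    # Corollary 11.1: equal offset K
    cand_all = reduce(lambda u, v: u & v, xs)
    cor_all = reduce(lambda u, v: u | v, xs)
    assert 0 <= cand_all <= min(xs)          # Theorem 11.5, Eq. (11.13a)
    assert max(xs) <= cor_all <= 2 ** n - 1  # Theorem 11.5, Eq. (11.13b)
print("all CLO identities verified")
```

The sum identity of Theorem 11.2 holds because each bit position contributes to exactly one of E and D when the operand bits differ, and to both when they agree.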
11.3 Derivation of the Coordinate Logic Filters

We denote a two-dimensional digital signal by a 2D set G = {g(i, j), i = 1, 2, ..., M, j = 1, 2, ..., N}, where M and N are the finite dimensions of the signal in the horizontal and vertical directions, respectively. The corresponding filtered output signal is

F = {f(i, j), i = 1, 2, ..., M, j = 1, 2, ..., N}.    (11.15)

In what follows, for simplicity, we denote the input and output images as G and F, respectively. At this point we derive the four basic CL filtering operations: dilation, erosion, opening, and closing. First, we decompose the given gray-level image G into a set of binary images Sk = {sk(i, j); i = 1, 2, ..., M, j = 1, 2, ..., N}, k = 0, 1, ..., n − 1, according to the decomposition of the (i, j) pixel as follows:

g(i, j) = Σ_{k=0}^{n−1} sk(i, j) 2^k,    i = 1, 2, ..., M,    j = 1, 2, ..., N,    (11.16)

where sk(i, j), k = 0, 1, ..., n − 1, are the binary components of the decimal pixel values g(i, j), i = 1, 2, ..., M, j = 1, 2, ..., N.
11.3.1 Coordinate Logic Dilation
CL dilation of the image G by the structuring element B is denoted by GD(g(i, j)), or GD for simplicity, and is defined by

F = GD = COR g(i, j) ∈ B = Σ_{k=0}^{n−1} (sk(i, j))D 2^k,    i = 1, 2, ..., M,    j = 1, 2, ..., N,    (11.17)

where (sk(i, j))D denotes the dilation operation on the binary value sk(i, j) by the structuring element B, given by (sk(i, j))D = OR (sk(i, j)) ∈ B. Moreover, note that f(i, j), i = 1, 2, ..., M, j = 1, 2, ..., N, is in the range

max(g(i, j) ∈ B) ≤ f(i, j) ≤ 2^n − 1.    (11.18)

11.3.2 Coordinate Logic Erosion
CL erosion of the image G by the structuring element B is denoted by GE and is defined by

F = GE = CAND g(i, j) ∈ B = Σ_{k=0}^{n−1} (sk(i, j))E 2^k,    i = 1, 2, ..., M,    j = 1, 2, ..., N,    (11.19)

where (sk(i, j))E denotes the erosion operation on the binary value sk(i, j) by the structuring element B, given by (sk(i, j))E = AND (sk(i, j)) ∈ B. Moreover, note that f(i, j), i = 1, 2, ..., M, j = 1, 2, ..., N, is in the range

0 ≤ f(i, j) ≤ min(g(i, j) ∈ B).    (11.20)
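A direct implementation of CL dilation and erosion follows from Eqs. (11.17) and (11.19): because bitwise OR and AND act on all bit planes at once, there is no need to decompose into bit planes and recombine explicitly. The sketch below is illustrative (a 3x3 square structuring element is assumed); it pads with 0 for dilation and with 2^n − 1 for erosion, the identity elements of COR and CAND, so the borders do not bias the result.

```python
import numpy as np

def cl_filter_3x3(img, op, pad_value):
    """Combine each pixel's 3x3 neighborhood with the bitwise operation
    `op`: np.bitwise_or yields CL dilation, np.bitwise_and CL erosion."""
    h, w = img.shape
    p = np.pad(img, 1, constant_values=pad_value)
    out = np.full((h, w), pad_value, dtype=img.dtype)  # identity of `op`
    for di in (0, 1, 2):
        for dj in (0, 1, 2):
            out = op(out, p[di:di + h, dj:dj + w])
    return out

def cl_dilate(img, n_bits=8):
    return cl_filter_3x3(img, np.bitwise_or, 0)

def cl_erode(img, n_bits=8):
    return cl_filter_3x3(img, np.bitwise_and, 2 ** n_bits - 1)
```

On binary images these coincide with ordinary morphological dilation and erosion; on gray-level images they differ from MIN/MAX by the offset K of Corollary 11.1.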
K = 1 − σG²/σ²(n)  if σ²(n) > σG²,
K = 0  otherwise,    (12.18)
where σ²(n) is the variance of the data in the window and σG² is the variance of the noise. In uniform or gradually varying regions of the image, σ(n) is similar to σG, so K is low. Conversely, in the presence of edges, σ(n) is larger than σG, so K is high. Notice that the above considerations have been expressed in the form of fuzzy reasoning. Another useful local feature deals with the difference between a pixel's luminance and the median xmed of the pixel values in the window:

Ei(n) = |xi(n) − xmed(n)| / σG.    (12.19)
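As a concrete illustration, the two local features can be computed per window as follows. The piecewise form used for K is one plausible reading of Eq. (12.18) (an adaptive gain that grows toward 1 when the local variance exceeds the noise variance), and the helper name is ours, not the chapter's.

```python
import numpy as np

def local_features(window, sigma_g):
    """Compute the adaptive gain K and the per-pixel feature E_i for one
    filter window. The form of K is an assumed reconstruction of
    Eq. (12.18); E follows Eq. (12.19)."""
    x = window.ravel().astype(float)
    var = x.var()
    K = 1.0 - sigma_g ** 2 / var if var > sigma_g ** 2 else 0.0
    E = np.abs(x - np.median(x)) / sigma_g
    return K, E
```

In a flat window the variance is at the noise floor, so K = 0; across an edge the variance dominates the noise variance and K approaches 1, matching the qualitative behavior described above.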
Finally, a third local feature is represented by the normalized distance Di(n) between any pixel and the central one. The goal is to reduce the importance of pixels that are far from the central element of the window. The local features mentioned can be used as the inputs of a fuzzy system that processes them nonlinearly by means of fuzzy rules. A typical rule can be expressed as follows: IF K is large AND Ei is small AND Di is small, THEN wi is large. Good results have been obtained using a 27-rule fuzzy system for Gaussian noise cancellation [Tag95a]. The qth rule in this system (q = 1, 2, ..., 27) is formally defined as follows:

Rule q:    IF (K, F1,q) AND (Ei, F2,q) AND (Di, F3,q), THEN (wi, Gq),    (12.20)
FABRIZIO RUSSO
where F_{p,q} (p = 1, 2, 3) is a fuzzy set described by the triangular-shaped membership function

μ_{F_{p,q}}(u_p) = { 1 − 2|u_p − a_{p,q}| / b_{p,q}   for −b_{p,q} ≤ 2(u_p − a_{p,q}) ≤ b_{p,q},
                    0                                 otherwise,                              (12.21)
and G_q is a fuzzy singleton defined on the interval [0, 1], because a resulting weight ranging from zero to unity is required (0 ≤ w_i ≤ 1). According to the inference scheme described in the previous section, the output of the fuzzy system is determined using the following relation:

w_i = [ Σ_{q=1}^{27} v_q λ_q ] / [ Σ_{q=1}^{27} λ_q ],   (12.22)
where λ_q is the degree of activation of the qth rule and v_q is the support point of G_q. The tuning of fuzzy sets, that is, the search for the optimal fuzzy set parameters, is performed by using a training image and a least mean-squares (LMS) algorithm. To derive a suitable expression for this purpose, we evaluate λ_q using the product instead of the minimum operator. A filter adopting a different inference scheme has also been proposed [Tag96]. Cancellation of mixed noise distributions (such as mixed Gaussian and impulse noise) can be performed by splitting the weight w_i(n) into different components [Mun96]:

y(n) = [ Σ_{i=0}^{N} w_i^{(I)}(n) w_i^{(G)}(n) x_i(n) ] / [ Σ_{i=0}^{N} w_i^{(I)}(n) w_i^{(G)}(n) ],   (12.23)
where w_i^{(I)}(n) and w_i^{(G)}(n) are weight components dealing with impulse and Gaussian noise, respectively. These components can be evaluated by fuzzy systems dealing with appropriate local features. We finally observe that FWM filters can adopt weights to combine the outputs y_1, y_2, ..., y_Q of different nonlinear techniques. The output of this class of filters (here called fuzzy combination filters) is given by the following relationship:

y(n) = [ Σ_{k=1}^{Q} w_k(n) y_k(n) ] / [ Σ_{k=1}^{Q} w_k(n) ].   (12.24)
An adaptive filter for mixed noise removal combines the outputs of five (Q = 5) classical operators, such as midpoint, mean, median, and identity filters and a small-window median filter [Tag95b]. The weights are evaluated by a fuzzy system dealing with three local features. Another technique addresses image enhancement by combining the outputs of three (Q = 3) fuzzy filters designed to reduce Gaussian noise, remove outliers, and enhance edges [Cho97]. The different outputs are combined depending on the value of a local feature dealing with the luminance differences and spatial distances between the central point and the neighboring pixels.
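As an illustration of Eqs. (12.20)–(12.22), the following sketch evaluates a FWM weight from the three local features using product activation (the variant used for LMS tuning) and singleton defuzzification. The rule parameters in the usage below are hypothetical, and the rule list is far shorter than the 27-rule system of [Tag95a]:

```python
def tri(u, a, b):
    # Triangular membership of Eq. (12.21), clipped at zero.
    return max(0.0, 1.0 - 2.0 * abs(u - a) / b)

def fwm_weight(features, rules):
    # features: the triple (K, E_i, D_i).
    # rules: list of ((a1, b1), (a2, b2), (a3, b3), v), where (a, b) are the
    # membership parameters and v the singleton support point of G_q in [0, 1].
    num = den = 0.0
    for *sets, v in rules:
        lam = 1.0
        for u, (a, b) in zip(features, sets):
            lam *= tri(u, a, b)   # product activation of the qth rule
        num += v * lam            # numerator / denominator of Eq. (12.22)
        den += lam
    return num / den if den > 0 else 0.0
```

The returned w_i is then used in the weighted-mean output of the FWM filter.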
CHAPTER 12: NONLINEAR FILTERS BASED ON FUZZY MODELS
Figure 12.3: Neighborhood and pixel indexes.
12.4 FIRE Filters
FIRE (Fuzzy Inference Ruled by Else-action) filters are special fuzzy systems based on IF-THEN-ELSE reasoning [Rus92, Rus93]. Unlike FWM filters, FIRE filters directly yield the noise correction as the output of the inference process. Hence, FIRE filters belong to the class of direct approaches mentioned in Sec. 12.1. Typically, a FIRE filter adopts directional rules that deal with the luminance differences Δx_1, Δx_2, ..., Δx_N between the central pixel and its neighbors. The fuzzy rule base evaluates a positive or a negative correction Δy that aims at removing the noise (THEN action). If no rule is satisfied, the central pixel is left unchanged (ELSE action). Different noise statistics can be addressed by means of appropriate choices of the fuzzy sets, rules, and aggregation mechanisms [Rus95, Rus97].
12.4.1 FIRE Filters for Uniform Noise
The ΔY-FIRE filter is an example of a FIRE operator designed to deal with uniformly distributed noise [Rus96]. The filter adopts directional rules that aim at preserving the image edges during the noise removal. A repeated application of the operator does not increase the detail blur. As a result, an effective noise cancellation can be obtained. The filter operates on the neighborhood depicted in Fig. 12.3. As mentioned, the input variables are the luminance differences Δx_i(n) = x_i(n) − x(n). By using fuzzy rules, the operator nonlinearly maps these inputs to the output variable Δy(n). This represents the correction term that, when added to x(n), yields the resulting luminance value y(n) = x(n) + Δy(n). Fuzzy rules deal with two fuzzy sets labeled positive (PO) and negative (NE). The triangular-shaped membership function μ_PO is defined by the following relationship:

μ_PO(u) = { 1 − |u − c| / (2c)   for −c ≤ u ≤ 3c,
            0                    otherwise.         (12.25)
The membership function μ_NE is symmetrically defined: μ_NE(u) = μ_PO(−u). If the image is corrupted by uniformly distributed noise in the interval [−A_n, A_n], the
Figure 12.4: Pixel patterns W_0, W_1, ..., W_16.
parameter c is set as follows: c = A_n. The directional rules deal with the patterns of pixels W_i (i = 0, ..., 16) represented in Fig. 12.4. To perform a positive or a negative correction, we define two symmetrical smoothing rules for each pattern. For example, the following rules are associated with the pattern W_j:

IF (Δx_1, A_{1,j}) AND ... AND (Δx_N, A_{N,j}) THEN (Δy, PO),   (12.26)

IF (Δx_1, A*_{1,j}) AND ... AND (Δx_N, A*_{N,j}) THEN (Δy, NE),   (12.27)
where

A_{i,j} = { PO   for x_i ∈ W_j,
            NE   for x_i ∉ W_j,      (12.28)

A*_{i,j} = { NE   for x_i ∈ W_j,
             PO   for x_i ∉ W_j.     (12.29)
The degrees of activation λ_j and λ*_j of the above rules are respectively given by

λ_j = f_AG(q, μ_{A_{1,j}}(Δx_1), ..., μ_{A_{N,j}}(Δx_N)),   (12.30)

λ*_j = f_AG(q, μ_{A*_{1,j}}(Δx_1), ..., μ_{A*_{N,j}}(Δx_N)),   (12.31)
where f_AG denotes a fuzzy aggregation function whose behavior is either the minimum operator or the arithmetic mean:

f_AG(q, μ_1, μ_2, ..., μ_N) = { min_i {μ_i}             if q = 1,
                               (1/N) Σ_{i=1}^{N} μ_i    if q = 0.    (12.32)
The output Δy of the fuzzy operator is finally computed as

Δy = c ( max_j {λ_j} − max_j {λ*_j} ).   (12.33)
Since the filter possesses a good detail-preserving behavior, it can be repeatedly applied to the image data to increase the noise cancellation (multipass filtering). In particular, an effective smoothing action can be obtained by varying the fuzzy aggregation scheme in Eq. (12.32) during the multipass process. The minimum operator (q = 1) defined by Eq. (12.32) can be used at the beginning of the multipass filtering, when a strong detail-preserving action is required. The arithmetic mean (q = 0) can be adopted at the end of the same process to smooth noisy pixels still present on the edges of the image.
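The core inference of Eqs. (12.25)–(12.33) for one window position can be sketched as follows; the pattern sets passed in the usage below are illustrative, not the seventeen patterns of Fig. 12.4:

```python
def mu_po(u, c):
    # Triangular PO membership, Eq. (12.25): peak at u = c, support [-c, 3c].
    return max(0.0, 1.0 - abs(u - c) / (2.0 * c))

def mu_ne(u, c):
    # NE is the mirror image of PO: mu_NE(u) = mu_PO(-u).
    return mu_po(-u, c)

def fire_correction(dx, patterns, c, q=1):
    # dx: luminance differences x_i - x; patterns: list of index sets W_j.
    agg = (lambda v: min(v)) if q == 1 else (lambda v: sum(v) / len(v))
    best_pos = best_neg = 0.0
    for W in patterns:
        # Rules (12.26)/(12.27): PO (resp. NE) for pixels in W_j, swapped outside.
        pos = [mu_po(dx[i], c) if i in W else mu_ne(dx[i], c) for i in range(len(dx))]
        neg = [mu_ne(dx[i], c) if i in W else mu_po(dx[i], c) for i in range(len(dx))]
        best_pos = max(best_pos, agg(pos))   # max_j lambda_j
        best_neg = max(best_neg, agg(neg))   # max_j lambda*_j
    return c * (best_pos - best_neg)         # Eq. (12.33)
```

The correction is added to the central pixel, y = x + Δy; in a multipass scheme q = 1 would be used for the first passes and q = 0 for the last ones.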
Figure 12.5: Pixel indexes in the window.
12.4.2 FIRE Filters for Mixed Noise
To address mixed noise, a FIRE filter typically combines rules for different noise distributions by adopting a hierarchical structure. The filter operates on the neighborhood represented in Fig. 12.5. Input and output variables are defined as in the previous section. The filter adopts four piecewise linear fuzzy sets labeled large positive (LP), medium positive (MP), medium negative (MN), and large negative (LN). The membership functions μ_LP, μ_MP, μ_MN, and μ_LN of these sets are graphically represented in Fig. 12.6 (0 < c < L − 1; 0 < a < b).

for r > λ_b, the expected number of minority pixels per unit area approaches the average value with increasing r.
CHAPTER 13: DIGITAL HALFTONING
Figure 13.14: Top: pair correlation with principal wavelength λ_b. Bottom: RAPSD with principal frequency f_b of the ideal blue-noise pattern. (Reproduced with permission from [Lau98]. © 1998 IEEE.)

3. The average number of minority pixels within the radius r increases sharply near λ_b.

The resulting pair correlation for blue noise is therefore of the form indicated in Fig. 13.14 (top), in which R(r) shows a strong inhibition of minority pixels near r = 0, marked (a); decreasing correlation of minority pixels with increasing r [lim_{r→∞} R(r) = 1], marked (b); and a frequent occurrence of the interpoint distance λ_b, the principal wavelength, indicated by a series of peaks at integer multiples of λ_b, marked (c). The principal wavelength is indicated in the figure by a diamond located along the horizontal axis.
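For the ideal blue-noise model, the principal wavelength depends only on the gray level g and the minimum inter-pixel distance D of the display, λ_b = D/√g for g ≤ 1/2 and D/√(1 − g) otherwise, and the principal frequency f_b is its reciprocal. A small helper (function names are mine):

```python
import math

def principal_frequency(g, D=1.0):
    # Blue-noise principal frequency f_b for gray level g (0 < g < 1);
    # D is the minimum inter-pixel distance of the display.
    return math.sqrt(g) / D if g <= 0.5 else math.sqrt(1.0 - g) / D

def principal_wavelength(g, D=1.0):
    # lambda_b is the reciprocal of f_b.
    return 1.0 / principal_frequency(g, D)
```

Note that principal_frequency(0.5) ≈ 0.7071, the upper limit of the radial-frequency axes in the figures of this chapter.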
Spectral Statistics

Turning to the spectral domain, the spectral characteristics of blue noise in terms of P(f_ρ) are shown in Fig. 13.14 (bottom) and can be described by three unique features: little or no low frequency spectral components, marked (a); a flat, high frequency (blue-noise) spectral region, marked (b); and a spectral peak at the cutoff frequency f_b, the blue-noise principal frequency, marked (c), such that

f_b = { √g / D         for 0 < g ≤ 1/2,
        √(1 − g) / D   for 1/2 < g ≤ 1.

The principal frequency is indicated in the figure by a diamond located along the horizontal axis. Note that P(f_ρ) is plotted in units of σ² = g(1 − g), the
DANIEL L. LAU AND GONZALO R. ARCE
Figure 13.15: Error diffusion algorithm.
variance of an individual pixel in φ_g. The sharpness of the spectral peak at the blue-noise principal frequency is affected by the separation between minority pixels, which should have some variation. The wavelengths of this variation should not be significantly longer than λ_b, since this adds low frequency spectral components to the corresponding dither pattern φ_g [Uli88], causing φ_g to appear more white than blue.

Figure 13.16: (a) Floyd and Steinberg, (b) Jarvis, Judice, and Ninke, and (c) Stucki error filters for error diffusion (• marks the current pixel):

(a)          •    7/16
      3/16  5/16  1/16

(b)                •    7/48  5/48
      3/48  5/48  7/48  5/48  3/48
      1/48  3/48  5/48  3/48  1/48

(c)                •    8/42  4/42
      2/42  4/42  8/42  4/42  2/42
      1/42  2/42  4/42  2/42  1/42
13.3.2 Error Diffusion
In error diffusion halftoning (Fig. 13.15), the output pixel y(n) is determined by adjusting and thresholding the input pixel x(n) such that

y(n) = { 1   for [x(n) − x_e(n)] > 0,
         0   otherwise,

where x_e(n) is the diffused quantization error accumulated during previous iterations,

x_e(n) = Σ_{i=1}^{M} b_i · y_e(n − i),   (13.1)

with y_e(n) = y(n) − [x(n) − x_e(n)] and Σ_{i=1}^{M} b_i = 1. Using vector notation, Eq. (13.1) becomes x_e(n) = b^T y_e(n), where b = [b_1, b_2, ..., b_M]^T and y_e(n) = [y_e(n − 1), y_e(n − 2), ..., y_e(n − M)]^T.

First proposed by Floyd and Steinberg [Flo76], the original error weights b_i are shown in Fig. 13.16, along with those proposed later by Jarvis, Judice, and Ninke [Jar76] and by Stucki. The resulting halftoned images, produced by error
diffusion using these weights along with a serpentine left-to-right and then right-to-left raster scan, are shown in Fig. 13.17.

Figure 13.17: Grayscale image halftoned using error diffusion with (a) Floyd and Steinberg, (b) Jarvis, Judice and Ninke, and (c) Stucki filter weights employing a serpentine raster scan.

Figure 13.18 illustrates the spatial and spectral characteristics of the three proposed error diffusion filters. While these spatial and spectral characteristics approximate the ideal blue-noise model, Ulichney [Uli87] showed that with some variation, error diffusion can be an improved generator of blue noise, with resulting patterns having spatial and spectral characteristics much closer to those of the ideal blue-noise pattern. Possible variations include using randomized weight positions and/or a perturbed threshold, a process that is equivalent to adding low variance white noise to the original input image [Kno92]. Another variation of particular significance involves perturbing the filter weights instead of the threshold. This perturbation of filter weights is accomplished by first pairing weights of comparable value, and then, for each pair of weights, adding a scaled random value to one and subtracting it from the other. To prevent negative values, we set the maximum noise amplitude (100%) to the value of the smaller weight in each pair. Using the Floyd and Steinberg filter weights, perturbing each of the pairs (7/16, 5/16) and (3/16, 1/16) creates a good blue-noise process, with the addition of 50% noise to each pair optimizing the tradeoff between graininess and stable texture [Uli87].
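The weights of Fig. 13.16(a) combined with the serpentine scan described above can be sketched as follows (a straightforward reference implementation, not the book's code; the input is assumed normalized to [0, 1] with threshold 1/2):

```python
import numpy as np

def floyd_steinberg(img):
    # Serpentine-scan Floyd-Steinberg error diffusion; img values in [0, 1].
    x = img.astype(float).copy()
    h, w = x.shape
    out = np.zeros((h, w))
    for i in range(h):
        cols = range(w) if i % 2 == 0 else range(w - 1, -1, -1)
        sgn = 1 if i % 2 == 0 else -1   # mirror the mask on right-to-left rows
        for j in cols:
            out[i, j] = 1.0 if x[i, j] >= 0.5 else 0.0
            err = x[i, j] - out[i, j]
            # Distribute the quantization error with the 7/16, 3/16, 5/16, 1/16 weights.
            if 0 <= j + sgn < w:
                x[i, j + sgn] += err * 7 / 16
            if i + 1 < h:
                if 0 <= j - sgn < w:
                    x[i + 1, j - sgn] += err * 3 / 16
                x[i + 1, j] += err * 5 / 16
                if 0 <= j + sgn < w:
                    x[i + 1, j + sgn] += err * 1 / 16
    return out
```

Error falling off the image edges is simply dropped in this sketch, so gray levels are preserved only approximately near the borders.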
13.3.3 Blue-Noise Dither Arrays
Since the concept of blue noise was first introduced in 1987, many alternative techniques for generating visually pleasing blue-noise halftone patterns have been introduced. Two techniques of particular importance are the direct binary search algorithm [Ana92] and halftoning by means of blue-noise dither arrays [Sul91]. The direct binary search algorithm iteratively swaps binary pixels to minimize the difference between the original image and a low pass filtered version of the current binary pattern. While the computational complexity is much higher than that of error diffusion, the resulting images are far superior in quality.
Figure 13.18: Spatial and spectral statistics using error diffusion for intensity levels 1/2, 1/4, 1/8, and 1/16, with (a) Floyd and Steinberg, (b) Jarvis, Judice and Ninke, and (c) Stucki filter weights, and (d) Floyd-Steinberg weights with 50% perturbation.

Dither array halftoning is a technique in which the binary image is generated by comparing, on a pixel-by-pixel basis, the original continuous-tone image and a dither array or mask, as in AM halftoning. By designing these masks to have blue-noise characteristics, visually pleasing FM patterns can be generated with the absolute minimum of computational complexity. For a demonstration, Fig. 13.19 shows a blue-noise dither array and the corresponding halftoned image.
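The pixel-by-pixel comparison described above amounts to thresholding the image against a tiled dither array; a minimal sketch (function name is mine; image and mask are assumed normalized to [0, 1]):

```python
import numpy as np

def dither_halftone(img, mask):
    # Compare the continuous-tone image against the dither array of
    # thresholds, tiled periodically to cover the whole image.
    h, w = img.shape
    mh, mw = mask.shape
    tiled = np.tile(mask, (h // mh + 1, w // mw + 1))[:h, :w]
    return (img > tiled).astype(np.uint8)
```

Because each output pixel depends only on its own threshold, the comparison can run fully in parallel, which is the source of the minimal computational cost noted above.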
13.4 Green-Noise Dithering

Just as blue noise is the high frequency component of white noise, green noise is the mid-frequency component. Like blue noise, it benefits from an aperiodic, uncorrelated structure without low frequency graininess, but unlike blue noise, green-noise patterns
Figure 13.19: (a) Blue-noise dither array and (b) resulting halftoned image.
exhibit clustering. The result is a frequency content that lacks the high frequency component characteristic of blue noise; hence the term "green." The objective of using green noise is to combine the maximum-dispersion attributes of blue noise with the clustering of AM halftone patterns, and to do so at varying degrees, as shown in Fig. 13.20. Noting Fig. 13.21, the motivation for green noise is to offer varying levels of robustness, with the most robust patterns having tone reproduction curves close to those of AM patterns and the least robust having curves close to those of blue noise. That is, green noise can sacrifice spatial resolution for pattern robustness. Since different printers have different characteristics with respect to the printed dot, green noise can be tuned to offer the highest spatial resolution achievable for a given device.

Figure 13.20: Grayscale image halftoned using green noise with increasing amounts of clustering, with (a) having the smallest clusters and (c) having the largest.
Figure 13.21: Measured input versus output reflectance curves for an ink jet printer using AM halftoning with cells of size 16 × 16, FM halftoning, and green-noise halftoning.
13.4.1 Spatial and Spectral Characteristics
Green noise, when applied to an image of constant gray level g, arranges the minority pixels of the resulting binary image such that the minority pixels form clusters of average size M pixels separated by an average distance λ_g (Fig. 13.22). Point process statisticians have long described clustering processes such as those seen in green noise by examining the cluster process in terms of two separate processes: (1) the parent process, which describes the locations of clusters,¹ and (2) the daughter process, which describes the shapes of clusters. In AM halftoning, clusters are placed along a regular lattice, and therefore variations in AM patterns occur in the cluster shape. In FM halftoning, the cluster shape is deterministic, a single pixel; it is the location of clusters that is of interest in characterizing FM patterns. Green-noise patterns, having variation in both cluster shape and cluster location, require an analysis that looks at both the parent and daughter processes. Looking first at the parent process Φ_p, φ_p represents a single sample of the parent process such that φ_p = {x_i : i = 1, 2, ..., N_c}, where N_c is the total number of clusters. For the daughter process Φ_d, φ_d represents a single sample cluster of Φ_d such that φ_d = {y_j : j = 1, 2, ..., M}, where M is the number of minority

¹The location of a cluster refers to the centroid of all points within the cluster.
pixels in the cluster. We first define the translation, or shift in space, T_x(B) of a set B = {y_i : i = 1, 2, 3, ...} by x, relative to the origin, as

T_x(B) = {y_i + x : i = 1, 2, 3, ...},

and define φ_{d_i} as the ith sample cluster for i = 1, 2, ..., N_c. A sample φ_G of the green-noise halftone process is defined as

φ_G = Σ_{x_i ∈ φ_p} T_{x_i}(φ_{d_i}) = Σ_{x_i ∈ φ_p} {y_{j,i} + x_i : j = 1, 2, ..., M_i},

the sum of N_c translated clusters. The overall operation is to replace each point of the parent sample φ_p of process Φ_p with its own cluster φ_d of process Φ_d. To derive a relationship between the total number of clusters, the size of the clusters, and the gray level of a binary dither pattern, we define I_g as the binary dither pattern resulting from halftoning a continuous-tone discrete-space monochrome image of constant gray level g, and I_g[n] as the binary pixel of I_g with pixel index n. From the definition of Φ(B) as the total number of points of Φ in B, φ_G(I_g) is the scalar quantity representing the total number of minority pixels in I_g, and φ_p(I_g) is the total number of clusters in I_g such that φ_p(I_g) = N_c. The intensity ℐ, being the expected number of minority pixels per unit area, can now be written as

ℐ = φ_G(I_g)/N = { g       for 0 ≤ g ≤ 1/2,
                   1 − g   for 1/2 < g ≤ 1.   (13.2)

increases (h > 0) or decreases (h < 0) the likelihood of a minority pixel if the previous outputs were also minority pixels. Equation (13.5) can also be written in vector notation as
x_h(n) = h a^T y(n),   (13.6)

where a = [a_1, a_2, ..., a_N]^T and y(n) = [y(n − 1), y(n − 2), ..., y(n − N)]^T. The calculation of the parameters y_e(n) and x_e(n) remains unchanged from the original error diffusion algorithm. From Eqs. (13.5) and (13.6), calculation of the binary output pixel y(n) can be summarized as

y(n) = { 1   for [x(n) − b^T y_e(n) + h a^T y(n)] > 0,
         0   otherwise.
Figure 13.24: Error diffusion with the output-dependent feedback algorithm.
Figure 13.25: An arrangement of two hysteresis and two error diffusion coefficients (each equal to 1/2) for error diffusion with output-dependent feedback.
Figure 13.26: Grayscale images halftoned using error diffusion with output-dependent feedback with hysteresis parameter h equal to (a) 0.5, (b) 1.0, and (c) 1.5.
Using the arrangement of Fig. 13.25 for two hysteresis and two error filter coefficients, Fig. 13.26 shows the resulting halftoned images using hysteresis values h = 0.5, 1.0, and 1.5, respectively. The corresponding spatial and spectral characteristics are shown in Fig. 13.27. As with error diffusion, improved results can be achieved through alternative filter arrangements, perturbed filter weights, and perturbed thresholds.
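A one-dimensional sketch of the feedback mechanism of Eqs. (13.5) and (13.6), using single-tap error and hysteresis filters (b = [1], a = [1]) rather than the two-tap arrangement of Fig. 13.25, and assuming a signal in [0, 1] with threshold 1/2:

```python
import numpy as np

def hysteresis_error_diffusion(img, h):
    # 1-D error diffusion with output-dependent feedback: the threshold
    # input is x - b^T ye + h * a^T y, here with scalar b = a = 1.
    x = img.astype(float)
    y = np.zeros(len(x))
    ye_prev = 0.0   # previous diffused quantization error
    y_prev = 0.0    # previous binary output (hysteresis term)
    for i in range(len(x)):
        u = x[i] - ye_prev + h * y_prev
        y[i] = 1.0 if u >= 0.5 else 0.0
        ye_prev = y[i] - (x[i] - ye_prev)   # quantization error to diffuse
        y_prev = y[i]
    return y
```

With h > 0 a previous minority pixel raises the threshold input, so minority pixels tend to recur and form clusters; h = 0 recovers plain error diffusion.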