Preface
This book has two principal aims: to teach scientific programming and to
introduce stochastic modelling. Stochastic modelling in particular, and mathematical modelling in general, are intimately linked to scientific programming
because the numerical techniques of scientific programming enable the practical application of mathematical models to real-world problems. In the context
of stochastic modelling, simulation is the numerical technique that enables us
to analyse otherwise intractable models.
Simulation is also the best way we know of developing statistical intuition.
This book assumes that users have completed or are currently undertaking
a first-year university-level calculus course. The material is suitable for first-
and second-year science/engineering/commerce students and master's-level students in applied disciplines. No prior knowledge of programming or probability
is assumed.
It is possible to use the book for a first course on probability, with an emphasis on applications facilitated by simulation. Modern applied probability and
statistics are numerically intensive, and we give an approach that integrates
programming and probability right from the start.
We chose the programming language R because it is both an interactive statistical environment and a fully fledged programming language.
We do not describe statistical techniques as implemented in R (though many
of them are admittedly quite remarkable), but rather show how to turn algorithms into code. Our intended audience is those who want to make tools, not
just use them.
Complementing the book is a package, spuRs, containing most of the code
and data we use. Instructions for installing it are given in the first chapter. In
the back of the book we also provide an index of the programs developed in
the text and a glossary of R commands.
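For readers who want to start immediately, installation typically amounts to the following. This is a minimal sketch assuming spuRs is available from CRAN and you have an internet connection; Chapter 1 gives the authoritative instructions.

```r
# Install the spuRs package from CRAN (requires an internet connection),
# then load it into the current R session.
install.packages("spuRs")
library(spuRs)
```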
How to use this book
This book has grown out of the notes prepared for a first year course consisting
of 36 lectures, 12 one-hour tutorials, and 12 two-hour lab classes. However,
it now contains more material than would fit in such a course, which permits
its use for a variety of course structures, for example to meet prerequisite
requirements for follow-on subjects. We found the lab classes to be particularly
important pedagogically, as students learn programming through their
own experimentation. Instructors may straightforwardly compile lab classes
by drawing on the numerous examples and exercises in the text, and these are
supplemented by the programming projects contained in Chapter 24, which
are based on assignments we gave our students.
Core content: The following chapters contain our core material for a course
on scientific programming and simulation.
Part I: Core knowledge of R and programming concepts. Chapters 1–6.
Part II: Thinking about mathematics from a numerical point of view: applying Part I concepts to root finding and numerical integration. Chapters 9–11.
Part III: Essentials of probability, random variables, and expectation required to understand simulation. Chapters 14–16 plus the uniform distribution.
Part IV: Stochastic modelling and simulation: random number generation, Monte-Carlo integration, case studies and projects. Sections 20.1–20.2, Chapter 21, Sections 23.1–23.2, and Chapter 24.
Additional stochastic material: The core outlined above only uses discrete
random variables, and for estimation only uses the concept of a sample average
converging to a mean. Chapters 17 and 18 add continuous random variables,
the Central Limit Theorem, and confidence intervals. Chapter 19 discusses both discrete and continuous time Markov chains. Sections 20.3–20.5 and Chapter
22 then look at simulating continuous random variables and variance reduction.
With some familiarity with continuous random variables, the remaining case
studies, Sections 23.3–23.4, become accessible.
Note that some of the projects in Chapter 24 use continuous random variables,
but they can easily be modified to use discrete random variables instead.
Additional programming and numerical material: For the core material basic
plotting of output is sufficient, but for those wanting to produce more
professional graphics we provide Chapter 7. Chapter 8, on further programming,
acts as a bridge to more specialised texts, for those who wish to pursue
programming more deeply.
Chapter 12 deals with univariate and multivariate optimisation, and Chapter 13 with systems of ordinary differential equations. These chapters assume a basic familiarity with vector and differential calculus. This material is self-contained,
though there are some exercises and examples elsewhere in the book that incorporate optimisation. In particular, the curve fitting example 18.1.2 uses the optim function, and curve fitting also appears in Exercise 19.13.13 and student project 24.2. However,
if you are prepared to use optimise and optim as black boxes, then these curve fitting examples/exercises are quite
accessible without reading the optimisation chapter.
We also mention student project 24.5 (the pipe spiders of Brunswick), which guides students through a stochastic optimisation problem using an evolutionary algorithm. This project is self-contained and does not rely on Chapter 12 (and is always very popular with our own students).
Chapter outlines
1: Setting up. Here we describe how to obtain and install R, and the package
spuRs which complements the book.
2: R as a calculating environment. This chapter shows you how to use R to
do arithmetic calculations; create and manipulate variables, vectors, and matrices;
work with logical expressions; call and get help on built-in R functions;
and to understand the workspace.
3: Basic programming. This chapter introduces a set of basic programming
structures that are the building blocks of many programs. Some structures are
common to numerous programming languages, for example if, for and while
statements. Others, such as vector-based programming, are more specialised,
but are arguably just as important for efficient R coding.
4: Input and output. This chapter describes some of the infrastructure that R
provides for importing data for subsequent analysis, and for displaying and
saving results of that analysis. More details on the construction of graphics
are available in Chapter 7, and we provide more information about importing
data in Chapter 6.
5: Programming with functions. This chapter extends Chapter 3 to include
user-defined functions. We cover the creation of functions, the rules that they
must follow, and how they relate to the environments from which they are
called. We also present some tips on the construction of efficient functions,
with special reference to how they are treated in R.
6: Sophisticated data structures. In this chapter we study R’s more sophisticated
data structures—lists and dataframes—which simplify data representation,
manipulation, and analysis. The dataframe is like a matrix but extended
to allow for different data modes in different columns, and the list is a general
data storage object that can house pretty much any other kind of R object.
We also introduce the factor, which is used to represent categorical objects.
7: Better graphics. This chapter provides a deeper exposition of the graphical
capabilities of R, building on Chapter 4. We explain the individual pieces that
make up the default plot. We discuss the graphics parameters that are used
to fine-tune individual graphs and multiple graphics on a page. We show how
to save graphical objects in various formats. Finally, we demonstrate some
graphical tools for the presentation of multivariate data (lattice graphs), and
3D-graphics.
8: Further programming. This chapter briefly mentions some more advanced
aspects of programming in R. We introduce the management of and interaction
with packages. We present details about how R arranges the objects that
we create within the workspace, and within functions that we are running.
We provide further suggestions for debugging your own functions. Finally, we
present some of the infrastructure that R provides for object-oriented programming,
and for executing code that has been compiled from another computer
language, for example, C.
9: Numerical accuracy and program efficiency. In this chapter we consider
technical details about how computers operate, and their ramifications for
programming practice, particularly within R. We look at how computers represent
numbers, and the effect that this has on the accuracy of computation
results. We also discuss the time it takes to perform a computation, and programming
techniques for speeding things up. Finally we consider the effects
of memory limitations on computation efficiency.
10: Root-finding. This chapter presents a suite of different techniques for finding
roots. We cover fixed-point iteration, the Newton-Raphson method, the
secant method, and the bisection method.
11: Numerical integration. This chapter introduces numerical integration. The
problem with integration is that often a closed form of the antiderivative is not
available. Under such circumstances we can try to approximate the integral
using computational methods. We cover the trapezoidal rule, Simpson’s rule,
and adaptive quadrature.
12: Optimisation. This chapter covers the problem of finding the maximum
or minimum of a possibly multivariate function. We introduce the Newton
method and the golden-section method in the context of a univariate function,
and steepest ascent/descent and Newton’s method for multivariate functions.
We then provide some further information about the optimisation tools that
are available in R.
13: Systems of ordinary differential equations. We cover the Euler, midpoint,
and fourth-order Runge-Kutta (RK4) schemes for solving systems of first-order ordinary differential equations (initial value problems only). The numerical efficiency of the different schemes is compared experimentally, and
it is shown how to improve the RK4 scheme by using an adaptive step size,
analogous to the adaptive quadrature seen in Chapter 11.
14: Probability. In this chapter we introduce mathematical probability, which
allows us to describe and think about uncertainty in a precise fashion. We cover
the probability axioms and conditional probability. We also cover the Law of
Total Probability, which can be used to decompose complicated probabilities
into simpler ones that are easier to compute, and Bayes’ theorem, which is
used to manipulate conditional probabilities in very useful ways.
15: Random variables. In this chapter we introduce the concept of a random
variable. We define discrete and continuous random variables and consider
various ways of describing their distributions, including the distribution function,
probability mass function, and probability density function. We define
expectation, variance, independence, and covariance. We also consider transformations
of random variables and derive the Weak Law of Large Numbers.
16: Discrete random variables. In this chapter we study some of the most
important discrete random variables, and summarise the R functions relating
to them. We cover the Bernoulli, binomial, geometric, negative binomial, and
the Poisson distribution.
17: Continuous random variables. This chapter presents the theory, applications
of, and R representations of, a number of continuous random variables.
We cover the uniform, exponential, Weibull, gamma, normal, χ², and t distributions.
18: Parameter estimation. This chapter covers point and interval estimation.
We introduce the Central Limit Theorem, normal approximations, asymptotic
confidence intervals and Monte-Carlo confidence intervals.
19: Markov chains. This chapter covers both discrete and continuous time
Markov chains. We cover transition and rate matrices, classification of states,
limiting behaviour (including balance equations), Kolmogorov forward and
backward equations, finite absorbing chains (including expected absorption
times and probabilities) and expected hitting times. We cover techniques for
defining the state space, including lumping states and supplementary variables, and methods for simulating both discrete and continuous time chains.
20: Simulation. In this chapter we simulate uniformly distributed random variables
and discrete random variables, and describe the inversion and rejection
methods for simulating continuous random variables. We also cover several
techniques for simulating normal random variables.
21: Monte-Carlo integration. This chapter covers simulation-based approaches
to integration. We cover the hit-and-miss method, and the more efficient
Monte-Carlo integration method. We also give some comparative results on
the convergence rate of these two techniques compared with the trapezoidal
and Simpson's rules, which we covered in Chapter 11.
22: Variance reduction. This chapter introduces several sampling-based innovations
to the problem of estimation. We cover antithetic sampling, control
variates, and importance sampling. These techniques can vastly increase the
efficiency of simulation exercises when judiciously applied.
23: Case studies. In this chapter we present three case studies, on epidemics,
inventory, and seed dispersal (including an application of object-oriented coding).
These are extended examples intended to demonstrate some of our simulation
techniques.
24: Student projects. This chapter presents a suite of problems that can be
tackled by students. They are less involved than the case studies in the preceding
chapter, but more substantial than the exercises that we have included
in each chapter.
Caveat computator
R is under constant review. The core programmers schedule a major release
and a minor release every year. Releases involve many changes and additions,
most of which are small, but some of which are large. However, there is no
guarantee of total backward compatibility, so new releases can break code that
makes assumptions about how the environment should work.
For example, while we were writing the first edition of this book, the upgrade
from version 2.7.1 to 2.8.0 changed the default behaviour of var: where
previously it signalled an error if any of its inputs were NA, it now returns
NA. Happily, we had time to rewrite the material that had presumed an error
would be returned.
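To illustrate the change, here is a small sketch of how var behaves in recent versions of R:

```r
# Since R 2.8.0, var() propagates missing values rather than stopping
# with an error when its input contains an NA.
x <- c(1, 2, NA)
var(x)               # returns NA
var(x, na.rm = TRUE) # drops the NA first: variance of c(1, 2) is 0.5
```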
We conclude that R changes, and we note that this book was written for
version 3.0.1. The spuRs package will include a list of errata.
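If you suspect a version difference is behind unexpected behaviour, you can check which version of R you are running from within R itself:

```r
# Report the version of R that is currently running
getRversion()
# Compare against the version this book was written for
getRversion() >= "3.0.1"
```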
Bibliography/further reading
For those wishing to further their study of scientific programming and simulation,
here are some texts that the authors have found useful.
The R language
W.N. Venables and B.D. Ripley, S Programming. Springer, 2000.
W.N. Venables and B.D. Ripley, Modern Applied Statistics with S, Fourth
Edition. Springer, 2002.
J.M. Chambers and T.J. Hastie (Editors), Statistical Models in S.
Brooks/Cole, 1992.
J. Maindonald and J. Braun, Data Analysis and Graphics Using R: An
Example-Based Approach, Second Edition. Cambridge University Press, 2006.
Scientific programming/numerical techniques
W. Cheney and D. Kincaid, Numerical Mathematics And Computing, Sixth
Edition. Brooks/Cole, 2008.
M.T. Heath, Scientific Computing: An Introductory Survey, Second Edition.
McGraw-Hill, 2002.
W.H. Press, S.A. Teukolsky, W.T. Vetterling and B.P. Flannery, Numerical
Recipes, 3rd Edition: The Art of Scientific Computing. Cambridge University
Press, 2007.
C.B. Moler, Numerical Computing with MATLAB. Society for Industrial and
Applied Mathematics, 2004.
Stochastic modelling and simulation
A.M. Law and W.D. Kelton, Simulation Modeling and Analysis, Third Edition.
McGraw-Hill, 1999.
M. Pidd, Computer Simulation in Management Science, Fifth Edition. Wiley,
2004.
S.M. Ross, Applied Probability Models with Optimization Applications. Dover,
1992.
D.L. Minh, Applied Probability Models. Brooks/Cole, 2001.