Introduction to Scientific Programming and Simulation Using R, Second Edition

Owen Jones, Robert Maillardet, and Andrew Robinson 

Preface

This book has two principal aims: to teach scientific programming and to introduce stochastic modelling. Stochastic modelling in particular, and mathematical modelling in general, are intimately linked to scientific programming because the numerical techniques of scientific programming enable the practical application of mathematical models to real-world problems. In the context of stochastic modelling, simulation is the numerical technique that enables us to analyse otherwise intractable models.

Simulation is also the best way we know of developing statistical intuition.

This book assumes that users have completed or are currently undertaking a first-year university level calculus course. The material is suitable for first and second year science/engineering/commerce students and masters level students in applied disciplines. No prior knowledge of programming or probability is assumed.

It is possible to use the book for a first course on probability, with an emphasis on applications facilitated by simulation. Modern applied probability and statistics are numerically intensive, and we give an approach that integrates programming and probability right from the start.

We chose the programming language R because of its programming features. We do not describe statistical techniques as implemented in R (though many of them are admittedly quite remarkable), but rather show how to turn algorithms into code. Our intended audience is those who want to make tools, not just use them.

Complementing the book is a package, spuRs, containing most of the code and data we use. Instructions for installing it are given in the first chapter. In the back of the book we also provide an index of the programs developed in the text and a glossary of R commands.

How to use this book

This book has grown out of the notes prepared for a first year course consisting of 36 lectures, 12 one-hour tutorials, and 12 two-hour lab classes. However, it now contains more material than would fit in such a course, which permits its use for a variety of course structures, for example to meet prerequisite requirements for follow-on subjects. We found the lab classes to be particularly important pedagogically, as students learn programming through their own experimentation. Instructors may straightforwardly compile lab classes by drawing on the numerous examples and exercises in the text, and these are supplemented by the programming projects contained in Chapter 24, which are based on assignments we gave our students.

Core content: The following chapters contain our core material for a course on scientific programming and simulation.
Part I:    Core knowledge of R and programming concepts. Chapters 1–6.
Part II:   Thinking about mathematics from a numerical point of view: applying Part I concepts to root finding and numerical integration. Chapters 9–11.
Part III:  Essentials of probability, random variables, and expectation required to understand simulation. Chapters 14–16 plus the uniform distribution.
Part IV:   Stochastic modelling and simulation: random number generation, Monte-Carlo integration, case studies and projects. Sections 20.1–20.2, Chapter 21, Sections 23.1–23.2, and Chapter 24.

Additional stochastic material: The core outlined above uses only discrete random variables, and for estimation uses only the concept of a sample average converging to a mean. Chapters 17 and 18 add continuous random variables, the Central Limit Theorem, and confidence intervals. Chapter 19 discusses both discrete and continuous time Markov chains. Sections 20.3–20.5 and Chapter 22 then look at simulating continuous random variables and variance reduction. With some familiarity with continuous random variables, the remaining case studies, Sections 23.3–23.4, become accessible.

Note that some of the projects in Chapter 24 use continuous random variables, but they can easily be modified to use discrete random variables instead.

Additional programming and numerical material: For the core material basic plotting of output is sufficient, but for those wanting to produce more professional graphics we provide Chapter 7. Chapter 8, on further programming, acts as a bridge to more specialised texts, for those who wish to pursue programming more deeply.

Chapter 12 deals with univariate and multivariate optimisation, and Chapter 13 with systems of ordinary differential equations. These chapters assume a basic familiarity with vector and differential calculus. This material is self-contained, though there are some exercises and examples elsewhere in the book that incorporate optimisation. In particular, the curve fitting example 18.1.2 uses the optim function, and curve fitting also appears in Exercise 19.13.13 and student project 24.2. However, if you are prepared to use optimise and optim as black boxes, then these curve fitting examples/exercises are quite accessible without reading the optimisation chapter.

We also mention student project 24.5 (the pipe spiders of Brunswick), which guides students through a stochastic optimisation problem using an evolutionary algorithm. This project is self-contained and does not rely on Chapter 12 (and is always very popular with our own students).

Chapter outlines

1: Setting up. Here we describe how to obtain and install R, and the package spuRs which complements the book.

2: R as a calculating environment. This chapter shows you how to use R to do arithmetic calculations; create and manipulate variables, vectors, and matrices; work with logical expressions; call and get help on built-in R functions; and understand the workspace.
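
For example, a few of the operations this chapter covers can be sketched as follows (the particular values are ours, purely for illustration):

```r
x <- c(1, 2, 3, 4)          # create a numeric vector
x * 2                       # elementwise arithmetic: 2 4 6 8
sum(x > 2)                  # logical expression: how many elements exceed 2? (2)
A <- matrix(1:4, nrow = 2)  # a 2 x 2 matrix, filled column by column
A %*% A                     # matrix multiplication
help(matrix)                # get help on a built-in function
```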

3: Basic programming. This chapter introduces a set of basic programming structures that are the building blocks of many programs. Some structures are common to numerous programming languages, for example if, for and while statements. Others, such as vector-based programming, are more specialised, but are arguably just as important for efficient R coding.
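
As a small illustration of the contrast between the two styles (our example, not one from the text), both of the following compute the sum of the first ten squares:

```r
# Loop-based version using a for statement
total <- 0
for (i in 1:10) {
  total <- total + i^2
}

# Vectorised version: square the whole vector, then sum it
vec_total <- sum((1:10)^2)

total      # 385
vec_total  # 385
```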

4: Input and output. This chapter describes some of the infrastructure that R provides for importing data for subsequent analysis, and for displaying and saving results of that analysis. More details on the construction of graphics are available in Chapter 7, and we provide more information about importing data in Chapter 6.

5: Programming with functions. This chapter extends Chapter 3 to include user-defined functions. We cover the creation of functions, the rules that they must follow, and how they relate to the environments from which they are called. We also present some tips on the construction of efficient functions, with especial reference to how they are treated in R.
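
A minimal illustrative example of a user-defined function with a default argument (the chapter develops the topic properly):

```r
# A user-defined function; b defaults to 1 if not supplied
hypotenuse <- function(a, b = 1) {
  sqrt(a^2 + b^2)
}

hypotenuse(3, 4)  # 5
hypotenuse(1)     # uses the default b = 1, giving sqrt(2)
```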

6: Sophisticated data structures. In this chapter we study R’s more sophisticated data structures—lists and dataframes—which simplify data representation, manipulation, and analysis. The dataframe is like a matrix but extended to allow for different data modes in different columns, and the list is a general data storage object that can house pretty much any other kind of R object. We also introduce the factor, which is used to represent categorical objects.
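
A quick sketch of the three structures mentioned (illustrative values only):

```r
# A list can hold objects of different kinds
lst <- list(numbers = 1:3, label = "example")

# A dataframe allows different modes in different columns
df <- data.frame(name = c("a", "b", "a"), value = c(1.5, 2.5, 3.5))

# A factor represents categorical data
f <- factor(df$name)
levels(f)  # "a" "b"
```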

7: Better graphics. This chapter provides a deeper exposition of the graphical capabilities of R, building on Chapter 4. We explain the individual pieces that make up the default plot. We discuss the graphics parameters that are used to fine-tune individual graphs and multiple graphics on a page. We show how to save graphical objects in various formats. Finally, we demonstrate some graphical tools for the presentation of multivariate data (lattice graphs) and 3D graphics.

8: Further programming. This chapter briefly mentions some more advanced aspects of programming in R. We introduce the management of and interaction with packages. We present details about how R arranges the objects that we create within the workspace, and within functions that we are running. We provide further suggestions for debugging your own functions. Finally, we present some of the infrastructure that R provides for object-oriented programming, and for executing code that has been compiled from another computer language, for example, C.

9: Numerical accuracy and program efficiency. In this chapter we consider technical details about how computers operate, and their ramifications for programming practice, particularly within R. We look at how computers represent numbers, and the effect that this has on the accuracy of computation results. We also discuss the time it takes to perform a computation, and programming techniques for speeding things up. Finally we consider the effects of memory limitations on computation efficiency.

10: Root-finding. This chapter presents a suite of different techniques for finding roots. We cover fixed-point iteration, the Newton-Raphson method, the secant method, and the bisection method.
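
To give a taste of the material, here is a sketch of the Newton-Raphson iteration x_{n+1} = x_n - f(x_n)/f'(x_n); the function name and tolerances are our own, not necessarily those used in the chapter:

```r
newton <- function(f, df, x0, tol = 1e-9, max_iter = 100) {
  x <- x0
  for (i in 1:max_iter) {
    step <- f(x) / df(x)   # Newton-Raphson step
    x <- x - step
    if (abs(step) < tol) return(x)
  }
  stop("Newton-Raphson failed to converge")
}

# Root of f(x) = x^2 - 2 starting from x0 = 1: converges to sqrt(2)
newton(function(x) x^2 - 2, function(x) 2 * x, 1)
```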

11: Numerical integration. This chapter introduces numerical integration. The problem with integration is that often a closed form of the antiderivative is not available. Under such circumstances we can try to approximate the integral using computational methods. We cover the trapezoidal rule, Simpson’s rule, and adaptive quadrature.
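
As a foretaste, the composite trapezoidal rule can be sketched in a few lines (our own implementation, not the book's):

```r
# Composite trapezoidal rule on [a, b] with n subintervals
trapezoid <- function(f, a, b, n = 100) {
  x <- seq(a, b, length.out = n + 1)
  y <- f(x)
  h <- (b - a) / n
  h * (sum(y) - (y[1] + y[n + 1]) / 2)
}

trapezoid(function(x) x^2, 0, 1)  # close to the exact value 1/3
```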

12: Optimisation. This chapter covers the problem of finding the maximum or minimum of a possibly multivariate function. We introduce the Newton method and the golden-section method in the context of a univariate function, and steepest ascent/descent and Newton’s method for multivariate functions. We then provide some further information about the optimisation tools that are available in R.

13: Systems of ordinary differential equations. We cover the Euler, midpoint, and fourth-order Runge-Kutta (RK4) schemes for solving systems of first-order ordinary differential equations (initial value problems only). The numerical efficiency of the different schemes is compared experimentally, and we show how to improve the RK4 scheme by using an adaptive step size, analogous to the adaptive quadrature seen in Chapter 11.

14: Probability. In this chapter we introduce mathematical probability, which allows us to describe and think about uncertainty in a precise fashion. We cover the probability axioms and conditional probability. We also cover the Law of Total Probability, which can be used to decompose complicated probabilities into simpler ones that are easier to compute, and Bayes’ theorem, which is used to manipulate conditional probabilities in very useful ways.

15: Random variables. In this chapter we introduce the concept of a random variable. We define discrete and continuous random variables and consider various ways of describing their distributions, including the distribution function, probability mass function, and probability density function. We define expectation, variance, independence, and covariance. We also consider transformations of random variables and derive the Weak Law of Large Numbers.

16: Discrete random variables. In this chapter we study some of the most important discrete random variables, and summarise the R functions relating to them. We cover the Bernoulli, binomial, geometric, negative binomial, and Poisson distributions.
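
R's built-in functions for these distributions follow the d/p/r naming scheme; a few illustrative calls:

```r
dbinom(2, size = 5, prob = 0.5)   # P(X = 2) for X ~ Binomial(5, 0.5), i.e. 10/32
pgeom(3, prob = 0.2)              # P(X <= 3) for a geometric random variable
rpois(5, lambda = 4)              # five random draws from a Poisson(4) distribution
```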

17: Continuous random variables. This chapter presents the theory and applications of a number of continuous random variables, together with their representations in R. We cover the uniform, exponential, Weibull, gamma, normal, χ2, and t distributions.

18: Parameter estimation. This chapter covers point and interval estimation. We introduce the Central Limit Theorem, normal approximations, asymptotic confidence intervals and Monte-Carlo confidence intervals.

19: Markov chains. This chapter covers both discrete and continuous time Markov chains. We cover transition and rate matrices, classification of states, limiting behaviour (including balance equations), Kolmogorov forward and backward equations, finite absorbing chains (including expected absorption times and probabilities) and expected hitting times. We cover techniques for defining the state space, including lumping states and supplementary variables, and methods for simulating both discrete and continuous time chains.
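
To give the flavour of simulating a discrete time chain, here is a sketch using a two-state transition matrix of our own choosing (not an example from the text):

```r
set.seed(7)
# Transition matrix P: row i gives the distribution of the
# next state when the current state is i
P <- matrix(c(0.9, 0.1,
              0.2, 0.8), nrow = 2, byrow = TRUE)
n <- 10000
state <- integer(n)
state[1] <- 1
for (t in 2:n) {
  state[t] <- sample(1:2, size = 1, prob = P[state[t - 1], ])
}
mean(state == 1)  # long-run proportion in state 1, close to the stationary probability 2/3
```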

20: Simulation. In this chapter we simulate uniformly distributed random variables and discrete random variables, and describe the inversion and rejection methods for simulating continuous random variables. We also cover several techniques for simulating normal random variables.
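
For instance, the inversion method rests on the fact that if U ~ U(0,1), then F^{-1}(U) has cdf F. A sketch for the exponential distribution (our own example, with an arbitrary rate):

```r
set.seed(42)
lambda <- 2
u <- runif(10000)          # uniformly distributed random variables
x <- -log(1 - u) / lambda  # inverse of the exponential cdf applied to u
mean(x)                    # should be close to the true mean 1/lambda = 0.5
```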

21: Monte-Carlo integration. This chapter covers simulation-based approaches to integration. We cover the hit-and-miss method, and the more efficient Monte-Carlo integration method. We also give some comparative results on the convergence rates of these two techniques compared with the trapezoidal rule and Simpson’s rule, which we covered in Chapter 11.
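
Both approaches can be sketched in a few lines. Our example (not from the text) estimates the integral of x^2 over [0,1], whose true value is 1/3:

```r
set.seed(1)
n <- 100000
u <- runif(n)

mc <- mean(u^2)        # Monte-Carlo: average of f evaluated at uniform points
v <- runif(n)
hm <- mean(v <= u^2)   # hit-and-miss: proportion of random points under the curve

c(mc, hm)              # both close to 1/3
```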

22: Variance reduction. This chapter introduces several sampling-based innovations to the problem of estimation. We cover antithetic sampling, control variates, and importance sampling. These techniques can vastly increase the efficiency of simulation exercises when judiciously applied.

23: Case studies. In this chapter we present three case studies, on epidemics, inventory, and seed dispersal (including an application of object-oriented coding). These are extended examples intended to demonstrate some of our simulation techniques.

24: Student projects. This chapter presents a suite of problems that can be tackled by students. They are less involved than the case studies in the preceding chapter, but more substantial than the exercises that we have included in each chapter.

Caveat computator

R is under constant review. The core programmers schedule a major release and a minor release every year. Releases involve many changes and additions, most of which are small, but some of which are large. However, there is no guarantee of total backward compatibility, so new releases can break code that makes assumptions about how the environment should work.

For example, while we were writing the first edition of this book, the upgrade from version 2.7.1 to 2.8.0 changed the default behaviour of var to return NA, rather than an error, when any of the input was NA. Happily, we had time to rewrite the material that presumed an error would be returned.
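
In current versions of R this is still the default behaviour, and missing values can be stripped explicitly:

```r
var(c(1, 2, NA))                 # NA: missing values propagate by default
var(c(1, 2, NA), na.rm = TRUE)   # 0.5: NAs removed before computing the variance
```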

We conclude that R changes, and we note that this book was written for version 3.0.1. The spuRs package will include a list of errata.

Bibliography/further reading

For those wishing to further their study of scientific programming and simulation, here are some texts that the authors have found useful.

The R language
W.N. Venables and B.D. Ripley, S Programming. Springer, 2000.
W.N. Venables and B.D. Ripley, Modern Applied Statistics with S, Fourth Edition. Springer, 2002.
J.M. Chambers and T.J. Hastie (Editors), Statistical Models in S. Brooks/Cole, 1992.
J. Maindonald and J. Braun, Data Analysis and Graphics Using R: An Example-Based Approach, Second Edition. Cambridge University Press, 2006.

Scientific programming/numerical techniques
W. Cheney and D. Kincaid, Numerical Mathematics And Computing, Sixth Edition. Brooks/Cole, 2008.
M.T. Heath, Scientific Computing: An Introductory Survey, Second Edition. McGraw-Hill, 2002.
W.H. Press, S.A. Teukolsky, W.T. Vetterling and B.P. Flannery, Numerical Recipes: The Art of Scientific Computing, Third Edition. Cambridge University Press, 2007.
C.B. Moler, Numerical Computing with Matlab. Society for Industrial and Applied Mathematics, 2004.

Stochastic modelling and simulation
A.M. Law and W.D. Kelton, Simulation Modeling and Analysis, Third Edition. McGraw-Hill, 1999.
M. Pidd, Computer Simulation in Management Science, Fifth Edition. Wiley, 2004.
S.M. Ross, Applied Probability Models with Optimization Applications. Dover, 1992.
D.L. Minh, Applied Probability Models. Brooks/Cole, 2001.

   

 

Home page - Introduction to Scientific Programming and Simulation Using R -- page last modified: 23 July, 2014 -- Maintained by rjmail at unimelb.edu.au