# [EX1] Examples of Results and Their Use

### Summary Results for All Countries

Consider the whole EU/EEA in year 2040. Open the file under 2040 and copy its contents into a statistical or spreadsheet program. The first row shows which columns belong to which group by sex (M = males, F = females) and age. In this case 5-year age-groups are available.

There are 40 columns, each with 3,000 elements. We will do two things. First, we will produce a histogram for the predictive distribution of the total population. Second, we will compute statistics for the age-dependency ratio.

1. To get the total population we have to sum the 40 columns row-wise; call them C1, C2, ..., C40. Call the resulting vector S, so S = C1 + C2 + ... + C40. It may be a good idea to divide the numbers by 1,000,000 to have the results in millions, so we substitute S := S/1,000,000. The figure below shows what the histogram of S then looks like.
2. Let us define the age-dependency ratio, call it A, as the ratio of the population in ages 0-19 and 65+ to the population in ages 20-64. Let Y be the young, O the old, and W those in the middle. We first compute Y = (C1 + C2 + C3 + C4) + (C21 + C22 + C23 + C24) to get the young males and females; W = (C5 + ... + C13) + (C25 + ... + C33) to get those in the middle; and O = (C14 + ... + C20) + (C34 + ... + C40) to get the old. Here the parentheses are just for clarity. We can then compute the age-dependency ratio simply as A = (Y + O)/W. Typical statistics one might want to compute are the mean = 0.9092, median = 0.9054, standard deviation = 0.0639, first quartile = 0.8716, and third quartile = 0.9424.
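The two steps above can be sketched in Python with NumPy. The random stand-in data and the loading details are assumptions (the real 3,000 × 40 table would be read from the 2040 results file); only the column arithmetic follows the text.

```python
import numpy as np

# Stand-in for the real data: in practice, load the 3,000 x 40 table from
# the 2040 results file (file name and format are assumptions), e.g.
#   sims = np.loadtxt("eu_eea_2040.csv", delimiter=",", skiprows=1)
rng = np.random.default_rng(42)
sims = rng.normal(loc=13e6, scale=1e6, size=(3000, 40))

# Step 1: total population S = C1 + C2 + ... + C40, rescaled to millions.
S = sims.sum(axis=1) / 1_000_000
counts, edges = np.histogram(S, bins=30)  # tabulated histogram of S

# Step 2: age-dependency ratio. The column labels C1..C40 are 1-based,
# so e.g. C1-C4 becomes the 0-based slice sims[:, 0:4].
Y = sims[:, 0:4].sum(axis=1) + sims[:, 20:24].sum(axis=1)    # ages 0-19
W = sims[:, 4:13].sum(axis=1) + sims[:, 24:33].sum(axis=1)   # ages 20-64
O = sims[:, 13:20].sum(axis=1) + sims[:, 33:40].sum(axis=1)  # ages 65+
A = (Y + O) / W

# Summary statistics of the predictive distribution of A
print(np.median(A), np.percentile(A, [25, 75]))
```

With the real data, `S` can be passed to any histogram plotting routine, and the printed quantiles of `A` correspond to the median and quartiles quoted above.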

Some practical words of caution:

It is typically not meaningful to study a few of the smallest and a few of the largest simulated values. The models used to derive the predictive distributions are approximations, and while they perform well in most cases, there can be rare parameter combinations that produce population paths we would not consider realistic, even when accounting for their small probability. The larger the number of simulation rounds, the greater the chance that such outlying values are observed. By imposing more constraints on the models such values could be eliminated, but this would add complexity, and it has not been done.

It is typically better to summarize simulation results with medians, quartiles, deciles, or percentiles than with means or standard deviations, because the latter are sensitive to the outliers mentioned above. In contrast, even the first percentile is determined by the location of the 30th smallest observation in a simulation of 3,000 runs, so it is not influenced by the values taken by the 29 observations below it.

A tradition in statistics is to compute 95% confidence intervals. This derives historically from hypothesis testing in, e.g., agricultural experimentation, and is intended to guard against too quick an acceptance of weakly tested methods or findings. In those applications one can often (by spending more money) get more precise data, if the interval is too wide and more accuracy is needed. In forecasting, we are dealing with uncertainty that is an order of magnitude greater, and there seems to be relatively little we can do about it. We have found it more practicable to present 80% prediction intervals, or even 67% or 50% prediction intervals to give the user of a forecast an idea of how things might deviate from the point forecast.
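A prediction interval of the kind described can be read directly off the ordered simulated values. A minimal sketch, assuming a vector of 3,000 simulated values (the stand-in random draws below would in practice be S or A from the steps above):

```python
import numpy as np

# Stand-in for 3,000 simulated values of some quantity of interest;
# the distribution parameters here are illustrative assumptions only.
rng = np.random.default_rng(7)
values = rng.normal(loc=0.91, scale=0.06, size=3000)

# An 80% prediction interval runs from the 10th to the 90th percentile;
# the median serves as the point forecast. Narrower 67% or 50% intervals
# are obtained the same way with [16.5, 83.5] or [25, 75].
lo, hi = np.percentile(values, [10, 90])
point = np.median(values)
print(lo, point, hi)
```

Because the interval endpoints are percentiles, they inherit the robustness to outliers discussed above, unlike intervals built from the mean and standard deviation.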

Figure. The Predictive Distribution of the Total Population of the EU/EEA Countries in 2040.