Eurostat and Statistics Finland Methodological Workshop on EU-SILC
Annotated program and abstracts
Helsinki, 7-8 November 2006
Starting time: Tuesday, 7 November at 15.00
The methodological workshop is held back to back with the Eurostat - Statistics Finland International conference on "Comparative EU statistics on income and living conditions: issues and challenges", Helsinki 6-7 November 2006.
The objective of the workshop is to discuss technical/methodological issues linked to EU-SILC in order to share best practices, and to inform users about the variety of practices in EU-SILC implementation. The methodological workshop will cover many aspects of the SILC process, namely: process quality, editing and imputation, gross/net conversion, estimation and calibration, variance estimation, coherence analysis and measurement.
For each session, there is a review paper with a presentation of 20 minutes focusing on a key aspect of the session, followed by short presentations of at most 10 minutes illustrating national implementations of the reviewed topics. Substantive discussions aiming to identify good practices and address open issues will take place at the end of each session.
Tuesday, 7 November 2006
0. Opening address - Anne Clémenceau - Eurostat (15.00 -15.15)
1. Session on Workflow/survey process/process quality for EU-SILC (Chair: Ian Dennis) (15.15-16.15)
Tea & Coffee break (16.15-16.45)
2. Session on Editing, imputation (including imputed rent), gross/net conversion (Chair: Martin Zeleny) (16.45 -18.00)
Wednesday, 8 November
3. Session on Estimation/calibration (Chair: Paloma Seoane) - (9.00 -10.10)
Tea & Coffee break (10.10-10.40)
4. Session on Variance estimation (Chair: Fabienne Montaigne) - (10.40 -11.40)
5. Session on Coherence analysis (Chair: Martin Bauer) - (11.40 -12.50)
Lunch break (12.50-14.00)
6. Session on Measurement - questionnaires (Chair: Jussi Simpura) - (14.00 -14.50)
7. Closing the workshop: Conclusions and way forward by Jean-Marc Museux (14.50 -15.00)
End of Workshop at 15.00
This paper gives an insight into the development of survey data in EU-SILC from the time a question is asked in a household and keyed into the CAPI-laptop to the final statistical analysis. Four years of EU-SILC in Austria as a survey and the requirement to produce a consistent micro-dataset, fully checked, imputed and weighted three times up to now have brought up numerous challenges. In describing the actual work process we try to convey how some of them were met. Special emphasis will be put on ways to streamline the data editing process and become more and more efficient each year whilst at the same time data quality can be improved.
2003 saw the first regular (cross-sectional) EU-SILC survey in Austria; the rotating panel was started in 2004. While in 2003 everything had to be done for the first time, from 2004 onwards these initial set-ups could be improved. An integrated system of data management was developed that aimed at fulfilling the following criteria: highest possible automation, full transparency, expandability. This system was implemented in SPSS code. To guide the user through the programmes, variable and syntax names follow a logic that is easily understood, and flag variables help to see what was done or has to be done to a certain variable. The major advantages of this data management system as described in the initial goals proved to be that it
All programmes are handled in a main control syntax. The programme modules each correspond to a certain stage in the survey and/or the data development process:
1. Checking raw data and feeding back inconsistencies to field work
Of course, each of these modules is again broken down into several sub-tasks, often at the level of income components. So the whole system basically consists of modules that can be changed and improved without disturbing the functioning of the other modules. The positive side effect is that, by standardizing the steps of data handling, more time and resources can be applied to actual improvements regarding the content.
This paper aims at giving an overview of this complex system of data management for EU-SILC, but also gives examples of good practices developed so far, especially in the fields of checking the raw data and editing income variables.
The contribution evaluates particular procedures within the organisation of the EU-SILC 2005 and 2006 surveys (preparation of fieldwork, data collection, data recording...). Coherence, as one of the main dimensions of quality, will be presented through a comparison of the data with available external sources at national level.
The presentation will describe the implementation of the EU-SILC survey in the Czech Republic from the point of view of the survey process - the different stages of the survey implementation and their organizational consequences. Attention will be paid to the interaction between project and process management in the light of the ongoing institutional reform within the Czech Statistical Office and its implications for the implementation of EU-SILC.
The basic requirement in EU-SILC (EU Statistics on Income and Living Conditions) concerning income variables is to record gross income in specified detail at the personal and income component level, but disposable income only as a set of three variables at the total household level. There may be severe practical difficulties for some Member States in collecting income data exactly in this form, whether the data are obtained from registers or directly from respondents in sample surveys. The objective of this paper is to develop, test and recommend procedures on how this problem may be overcome. Both in theory and in practice, this is a complex task. The modelling procedure for net-gross transformation can range from the very complex to the very simple.
At the sophisticated end of the spectrum, tax and social insurance contributions are imputed in as exact a manner as possible using a tax-contribution model, therefore requiring full information on the tax regime operating in a particular country during the reference period. In developing such a sophisticated approach, one possibility is for each Member State to use its own micro-simulation model to provide the output variables required to construct the EU income definition where they are not collected directly from the survey. If such models were to be built from scratch, they would require considerable effort in collating information about the tax regime in each country and repeating it annually. However, many countries have already developed their own models in order to explore the impact of different tax-benefit policies. Such models incorporate household micro-data from nationally representative sources and calculate disposable income for each household under alternative tax/benefit regimes. Although the aim of such models is to simulate the impact of policy changes and so the usual output consists of the micro-level change in household disposable incomes as a result of policy change, they could equally well be used to estimate tax/social insurance contributions under a single tax/benefit regime - as is required for EU-SILC. Euromod provides a well-known example of such modelling in an EU-wide context.
Complex models may not be feasible or necessary in all cases. There can be simpler models which take into account only the major elements of the tax regime, and provide sets of ratios or factors which can be applied to individual income components. Alternatively, one may pursue more purely statistical approaches. These are likely to be simpler and more uniform across countries. For instance, net/gross ratios or conversion factors may be determined empirically (statistically) as functions of the household's level and composition of income, household size and composition, and other characteristics available in the survey. These relationships may be determined on the basis of aggregate data, such as tax statistics or the national accounts. In principle, information for this purpose may also come from within the income survey itself.
At the other end of the spectrum a single ratio of tax and social insurance contributions paid to gross income may be applied to all income components. This, in the most simple form, was the approach used in ECHP (European Community Household Panel). The basis of the simple approach used in the ECHP was that the (net/gross) ratio can be expressed as a function only of the level of income. Furthermore, empirical data for determining this relationship came from the survey itself - based solely on income from employment, for which both gross and net amounts were collected in the survey. However, it has been clear that the very simple, almost 'crude', approach used for ECHP data will not suffice for EU-SILC. For ECHP, in contrast to EU-SILC, the target variables were net (rather than gross) income components. At the same time, income components in the ECHP questionnaire are presumed reported net of tax and other deductions, except for a few components. The most important of these is income from self-employment, which was taken to be reported gross of tax and social insurance payments. Hence, it was not the purpose of the ECHP to provide accurate gross income information. Rather, the objective was to provide a factor which could be used to convert the few components which were reported gross into net values, so that net income, total and by component, of the household could be estimated.
EU-SILC, on the other hand, requires accurate information on both net and gross incomes, and the latter by detailed component and at the person level. Hence it is necessary to adopt a more sophisticated approach for EU-SILC, using a more appropriate micro-simulation methodology.
This paper is the first report at an international scientific conference of features and applications of the Siena Micro-Simulation Model (SM2). It will focus on the standardised core of the system and explain how it can handle specific features of diverse fiscal systems and forms in which income of households and persons has been recorded.
The model SM2 is introduced in Section 2 and is described in the remaining sections by introducing complexities step-by-step. Section 3 introduces various terms, and describes the basic relationships between them by considering the model in the simplified situation of a person receiving income from a single source and taxed separately as a single-person tax unit. Section 4 gives a fuller description of the micro-simulation model in the more realistic situation involving more than one income component and multi-person tax units. A number of illustrations (from France, Italy and Spain) are provided. Section 5 deals with issues arising from diverse forms of the input data. The form (net, gross, etc.) in which the information has been collected may vary from one individual to another in the same survey. Section 6 introduces the additional complexity resulting from differences in how particular components of income are treated in the tax regime. An outstanding feature of SM2 is that these special features of the tax system can be captured within the general structure of the model simply by appropriately defining special types of 'deductions' and 'tax credits' for the component concerned. Section 7 presents a summary of main results from the micro-simulation system developed for application to country data under EU-SILC. The results presented concern the construction of EU-SILC Target Variables on income in gross and net forms, from the data collected in various forms. Up to that point, it is assumed that data are available for all income components in whatever form and the objective is to convert them to a homogeneous form such as that required for EU-SILC target variables; Section 8 covers the problem of treating imputation and microsimulation in conjunction when data are missing. Finally, Section 9 concludes the paper.
The aim can be to inform countries of the potential use of SM2 in the construction of EU-SILC target variables. This can be done in conjunction with the application in a particular country, such as Italy.
One of the medium-term aims of the survey is to provide gross income figures broken down into components. The Spanish questionnaires ask for both gross and net figures for all components in general. But since respondents are often unaware of their gross income, we have to build a model to convert net figures to gross for the various components.
In accordance with Annex II of the Commission Regulation on definitions, Eurostat allows Spain not to provide gross figures until the 2007 survey. With the aid of documents drawn by Eurostat, and national experts, INE has done the initial work to implement net-to-gross conversion of current monthly wages for the first survey (2004).
The procedure to calculate gross pay is based on an iterative method which uses the net pay figures reported by respondents. The methodology used can be checked against a wide range of records, because a percentage of respondents (particularly high for current wages) state both gross and net wages (in the 2004 survey, 44% of employees stated net and gross figures).
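The abstract does not spell out the iteration, so the following is only a minimal sketch of one way an iterative net-to-gross conversion can work. It assumes a purely hypothetical tax schedule (a flat social contribution rate plus progressive income tax brackets); the function names, rates and brackets are illustrative and are not INE's actual model. Starting from the reported net pay, the gross estimate is repeatedly increased by the remaining gap between the reported net and the net implied by the current gross estimate.

```python
def net_from_gross(gross, brackets, contribution_rate):
    """Net pay under a HYPOTHETICAL schedule: a flat social contribution,
    then progressive income tax on what remains.
    `brackets` is a list of (upper_limit, marginal_rate) in ascending order."""
    contributions = gross * contribution_rate
    taxable = gross - contributions
    tax = 0.0
    lower = 0.0
    for upper, rate in brackets:
        if taxable > lower:
            tax += (min(taxable, upper) - lower) * rate
        lower = upper
    return gross - contributions - tax


def gross_from_net(net_target, brackets, contribution_rate,
                   tol=0.01, max_iter=200):
    """Fixed-point iteration: add the remaining net gap to the gross estimate.
    Converges when the marginal deduction rate stays below 100%."""
    gross = net_target  # first guess: no deductions at all
    for _ in range(max_iter):
        gap = net_target - net_from_gross(gross, brackets, contribution_rate)
        if abs(gap) < tol:
            return gross
        gross += gap
    raise ValueError("net-to-gross iteration did not converge")


# Illustrative (invented) schedule: 6% contributions, three tax brackets.
BRACKETS = [(10000.0, 0.0), (30000.0, 0.2), (float("inf"), 0.4)]
RATE = 0.06

reported_net = net_from_gross(25000.0, BRACKETS, RATE)
estimated_gross = gross_from_net(reported_net, BRACKETS, RATE)
```

For respondents who state both net and gross figures, the stated gross can be compared with `estimated_gross` to check such a model, in the spirit of the validation described above.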
My presentation will preliminarily deal with imputed rent on the basis of the method used for the Income Distribution Statistics.
This is the first time that the Central Statistical Bureau of Latvia has imputed missing data in social statistics. Item non-response for the income components collected in the EU-SILC 2005 survey ranged from 1.38% to 36.11%. The suitability of multiple imputation, hot deck and other methods for the EU-SILC survey is discussed in this paper. The results of implementing the imputation methods will be presented.
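As an illustration of one of the methods named above, here is a minimal sketch of a random hot deck within imputation classes. It is not the procedure used by the Central Statistical Bureau of Latvia; the record layout, the class variable and the `imputed` flag are assumptions made for the example. A missing value is replaced by the value of a randomly chosen donor from the same class, and records in classes without donors are left missing.

```python
import random


def hot_deck_impute(records, target, class_key, seed=0):
    """Random hot deck: fill missing `target` values (None) with the value
    of a randomly drawn donor from the same imputation class."""
    rng = random.Random(seed)

    # Collect observed values (donors) per imputation class.
    donors = {}
    for rec in records:
        if rec[target] is not None:
            donors.setdefault(rec[class_key], []).append(rec[target])

    result = []
    for rec in records:
        new = dict(rec)          # do not modify the input records
        new["imputed"] = False   # flag variable documenting the treatment
        if rec[target] is None:
            pool = donors.get(rec[class_key])
            if pool:             # no donor in the class: value stays missing
                new[target] = rng.choice(pool)
                new["imputed"] = True
        result.append(new)
    return result


# Hypothetical records: income missing for two persons.
people = [
    {"id": 1, "activity": "employee", "income": 900.0},
    {"id": 2, "activity": "employee", "income": 1100.0},
    {"id": 3, "activity": "employee", "income": None},
    {"id": 4, "activity": "pensioner", "income": 400.0},
    {"id": 5, "activity": "student", "income": None},
]
completed = hot_deck_impute(people, target="income", class_key="activity")
```

A single hot-deck draw understates the uncertainty caused by imputation; repeating the draw several times is one route towards the multiple imputation also mentioned in the abstract.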
The objective of the presentation is to review, in a unified way, the whole weighting procedure for the standard integrated EU-SILC design, covering the initial sample and its cross-sectional as well as longitudinal development. The paper will also address the organisation of the weight variables in data files cumulating several years. It will highlight the open issues where some discussion is still needed, and it will list the estimation issues and propose ways forward.
The presentation concentrates on the estimation process of Finnish EU-SILC, emphasising the special characteristics due to the sampling design, rotation and calibration. Finnish EU-SILC is conducted in the framework of the Income Distribution Survey of Finland (IDS), which includes a two-wave panel structure. For the needs of some household-based surveys (including IDS/EU-SILC) a master sample of persons is selected. This sample is further processed in order to create dwelling units around the persons, with some additional useful characteristics created, e.g. a specific socio-economic & income classification of the dwelling unit. The new IDS/EU-SILC sample is selected from the master sample (two-phase sampling) using the foregoing classification as the stratification. The allocation emphasises well-earning households, farmers and entrepreneurs. The weighting process takes into account the sampling design and the non-response effect before the calibration is conducted. The integrative calibration (both household and person levels, separately for different waves) includes distributions on household size, province, type of municipality, gender & age classes based on the population and 15 different income variables. These actions have different consequences on the efficiency of estimation, depending on what kind of parameters are studied, e.g. the design effect of the estimator of the Gini coefficient is much better than the design effect for the indicator "at risk of poverty rate after social transfers".
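The abstract does not say which calibration method is used, so the following sketch only illustrates the general idea with raking (iterative proportional fitting), one common way of adjusting weights to known population margins. The variables, categories and target totals are invented for the example and are far simpler than the integrative household/person calibration described above.

```python
def rake(units, weights, targets, n_iter=50):
    """Raking: repeatedly scale the weights so that the weighted totals
    match the known margin of each calibration variable in turn.

    units   : list of dicts with the calibration variables of each unit
    weights : starting weights (e.g. design weights after non-response
              adjustment)
    targets : {variable: {category: population total}}
    """
    w = list(weights)
    for _ in range(n_iter):
        for var, margin in targets.items():
            # Current weighted total of each category of this variable.
            totals = {}
            for unit, wi in zip(units, w):
                totals[unit[var]] = totals.get(unit[var], 0.0) + wi
            # Scale every weight by target / current total of its category.
            w = [wi * margin[unit[var]] / totals[unit[var]]
                 for unit, wi in zip(units, w)]
    return w


# Hypothetical sample of four households and two known margins.
households = [
    {"size": "1", "region": "A"},
    {"size": "1", "region": "B"},
    {"size": "2+", "region": "A"},
    {"size": "2+", "region": "B"},
]
margins = {
    "size": {"1": 60.0, "2+": 40.0},
    "region": {"A": 50.0, "B": 50.0},
}
calibrated = rake(households, [1.0, 1.0, 1.0, 1.0], margins)
```

The margins must be mutually consistent (here both sum to 100) and every category must be present in the sample; otherwise the procedure cannot reproduce all targets.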
In this paper we summarize the process followed at INE-Spain (National Statistical Institute) to calculate the cross-sectional weights of wave 1 and 2, those available at the time of preparing this document.
The paper is organised as follows. Section 2 gives a brief description of the sample design of the Spanish EU-SILC. Section 3 discusses the weights of wave 1; as the process is similar to that of other household surveys, we do not explain it in detail. Section 4 reviews the calculation of the cross-sectional weights in wave 2. The last part of the article looks at some problems, alternatives and the work to be done in the near future.
Unlike in some other European countries, in Estonia EU-SILC was a completely new survey, and we have adopted the integrated rotational design proposed by Eurostat (see EU-SILC Doc 23/01). According to this design, the sample for any one year consists of 4 sub-samples. Any particular sub-sample remains in the survey for 4 years; each year one of the 4 sub-samples from the previous year is dropped and a new one added. In year one, 4 sub-samples are selected: one of them is dropped immediately after the first year, the second is retained for only 2 years, the third for 3 years, and only the fourth is retained for the full 4 years.
In Estonia, we experienced quite low response rates in 2004 and had to keep all four sub-samples in the 2005 sample to meet the minimum sample size criterion for the longitudinal component. For the same reason, none of these original sub-samples was dropped from the 2006 sample. In 2007 we plan to drop two of the sub-samples started in 2004, and the other two in 2008. After 2008 the rotation scheme will be standard: each year one sub-sample is dropped and one added. Due to the modified rotation scheme, some simple changes have to be made in the weighting procedure. We describe them for the cross-sectional component in 2005.
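The standard rotation described above can be written out as a small schedule. The sketch below is illustrative only: the labels `initial-k` and `new-y` are invented names, and the function simply lists which sub-samples are in the field each year when the four initial sub-samples stay for 1, 2, 3 and 4 years and every later sub-sample stays for 4 years.

```python
def rotation_schedule(n_years, n_subsamples=4):
    """Return {year: [labels of sub-samples in the sample that year]}
    for the standard integrated rotational design."""
    schedule = {}
    for year in range(1, n_years + 1):
        active = []
        # Initial sub-sample k is retained for k years (years 1..k).
        for k in range(1, n_subsamples + 1):
            if k >= year:
                active.append(f"initial-{k}")
        # A sub-sample entering in a later year stays for n_subsamples years.
        for entry in range(2, year + 1):
            if year - entry < n_subsamples:
                active.append(f"new-{entry}")
        schedule[year] = active
    return schedule


schedule = rotation_schedule(6)
for year, active in schedule.items():
    print(year, active)
```

Every year contains exactly four sub-samples, and from year five onwards all of them are regular sub-samples with a full four-year life.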
The main difference of our design in year 2005 from the standard integrated one is the number of sub-samples to be weighted independently and combined thereafter to form final weights.
The weighting procedure in general follows Eurostat recommendations (EU-SILC Doc 157/05) except for the calculation of household design weights from sub-sample household weights. The reason is that for 2005 we have 5 sub-samples representing the non-immigrant population of 2005, while the original rotation scheme expects 4. Thus, to obtain household design weights, the sub-sample household weights of households not consisting only of immigrants are divided by 5, not by 4 as in the standard weighting procedure.
The standard weighting procedure for the longitudinal component, and the cross-sectional weighting for later years, can be modified in a similar way. First, we weight each sub-sample independently using the standard procedure, and then we take into account the larger number of sub-samples when combining these weights to obtain household weights applicable to the whole sample.
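The combination step can be illustrated with a very small sketch. It assumes that every sub-sample has already been weighted independently to represent the same population, so that dividing each sub-sample household weight by the number of sub-samples makes the pooled sample represent that population once. The household identifiers and weights are invented, and households consisting only of immigrants, which the text treats separately, are not covered.

```python
def combine_subsample_weights(subsamples):
    """Pool independently weighted sub-samples.

    subsamples : list of dicts {household id: sub-sample household weight},
                 each dict summing to the same population total.
    Returns one dict of household weights for the whole sample, obtained by
    dividing every weight by the number of sub-samples."""
    n_subsamples = len(subsamples)
    combined = {}
    for sub in subsamples:
        for household_id, weight in sub.items():
            combined[household_id] = weight / n_subsamples
    return combined


# Five hypothetical sub-samples of two households each; every sub-sample
# is weighted to a population of 1000 households.
subsamples_2005 = [
    {f"s{s}-h{h}": 500.0 for h in (1, 2)} for s in range(1, 6)
]
household_weights = combine_subsample_weights(subsamples_2005)
```

With four sub-samples the same function divides by 4, which corresponds to the standard procedure; only the number of sub-samples passed in changes.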
The paper intends to present the approach which was implemented by Eurostat to produce variance estimates for the EU-SILC social indicators (at-risk-of-poverty rate, at-risk-of-poverty threshold, Gini coefficient...). After linearizing the indicators, variance calculations have been performed by the software Poulpe, a SAS macro-based application developed by France's National Statistical Office (INSEE). Some issues in connection to the variance results are addressed. At the end, the results are benchmarked against alternative variance estimation methods (Bootstrap, Jackknife and Ultimate Cluster).
In this presentation we summarize the method used at INE-Spain to calculate the sampling errors of the Laeken indicators of the EU-SILC 1st Wave.
At the National Statistical Institute of Spain there was no experience in calculating sampling errors for such complex estimates. Regarding resampling techniques, we had some experience in using Jackknife procedures to estimate the sampling errors of simpler estimates. Nevertheless, the Jackknife is known to perform poorly with non-smooth statistics such as quantiles, and the poverty and inequality indicators are directly or indirectly based on them. So we decided to use the Bootstrap methodology.
We present the method, problems encountered and some results.
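The abstract gives no details of the Bootstrap implementation, so the following is only a simplified sketch. It assumes the usual definition of the at-risk-of-poverty rate (share of the weighted population with income below 60% of the weighted median) and resamples units with replacement as if the sample were a simple random sample; a production version would resample primary sampling units within strata and recompute the weights in each replicate.

```python
import random


def weighted_median(values, weights):
    """Smallest value at which the cumulated weight reaches half the total."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    cumulated = 0.0
    for value, weight in pairs:
        cumulated += weight
        if cumulated >= half:
            return value
    return pairs[-1][0]


def at_risk_of_poverty_rate(incomes, weights, fraction=0.6):
    """Weighted share of units below `fraction` times the weighted median."""
    threshold = fraction * weighted_median(incomes, weights)
    poor = sum(w for y, w in zip(incomes, weights) if y < threshold)
    return poor / sum(weights)


def bootstrap_se(incomes, weights, n_reps=500, seed=1):
    """Naive bootstrap standard error: resample units with replacement and
    recompute the indicator (threshold included) in every replicate."""
    rng = random.Random(seed)
    n = len(incomes)
    estimates = []
    for _ in range(n_reps):
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(at_risk_of_poverty_rate(
            [incomes[i] for i in idx], [weights[i] for i in idx]))
    mean = sum(estimates) / n_reps
    variance = sum((e - mean) ** 2 for e in estimates) / (n_reps - 1)
    return variance ** 0.5


# Invented data, for illustration only.
incomes = [10.0, 20.0, 30.0, 40.0, 100.0]
weights = [1.0, 1.0, 1.0, 1.0, 1.0]
rate = at_risk_of_poverty_rate(incomes, weights)
se = bootstrap_se(incomes, weights, n_reps=200)
```

Because the poverty threshold is re-estimated in each replicate, the variability of the median feeds into the standard error, which is the feature that makes quantile-based indicators awkward for the delete-one Jackknife.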
The paper discusses the implementation of variance estimation methods in the EU-SILC survey: resampling methods (the dependent random group method and the jackknife) and linearization. It focuses on variance estimation for totals, ratios of two totals and the Gini coefficient; however, the methodology developed could also be used for other statistics. The program developed (in SPSS®) for variance estimation for an arbitrary sample, which allows the parameters of the methods applied to be adjusted, will also be presented.
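To make one of these methods concrete, here is a minimal sketch of a delete-one jackknife for a ratio of two weighted totals. It is not the SPSS program described in the abstract: it assumes a single stratum in which every sample unit is its own primary sampling unit, and the data are invented.

```python
def ratio_of_totals(y, x, w):
    """Estimated ratio of the weighted total of y to the weighted total of x."""
    return (sum(wi * yi for wi, yi in zip(w, y))
            / sum(wi * xi for wi, xi in zip(w, x)))


def jackknife_variance_ratio(y, x, w):
    """Delete-one jackknife variance of the ratio of two totals.

    Each replicate drops one unit. The remaining weights would be scaled by
    n / (n - 1), but that factor cancels in a ratio and is therefore omitted.
    """
    n = len(y)
    full = ratio_of_totals(y, x, w)
    squared_deviations = 0.0
    for j in range(n):
        keep = [i for i in range(n) if i != j]
        replicate = ratio_of_totals([y[i] for i in keep],
                                    [x[i] for i in keep],
                                    [w[i] for i in keep])
        squared_deviations += (replicate - full) ** 2
    return (n - 1) / n * squared_deviations


# Invented example: y is social benefits, x is total income.
benefits = [2.0, 1.0, 4.0, 0.0, 3.0]
income = [10.0, 12.0, 9.0, 15.0, 11.0]
design_weights = [100.0, 120.0, 80.0, 110.0, 90.0]
variance = jackknife_variance_ratio(benefits, income, design_weights)
```

With stratified multi-stage designs, whole primary sampling units are deleted within each stratum and the stratum contributions are summed, but the structure of the computation stays the same.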
Two Eurostat data collections, ESSPROS and EU-SILC provide information about social protection benefits received in the European countries, with two different tools: an accounting compilation and a household survey.
This paper first presents the characteristics of the two data collections. In a second step, it reports on the differences between the ESSPROS and EU-SILC definitions in terms of data comparability. The last part concerns the data comparison: establishing the methodological differences, examining the gap between the data in the two collections, and proposing corrections with a view to reconciliation. As country examples, detailed data comparisons are made for Denmark and Luxembourg.
The aim of the paper/presentation is to discuss central methodological questions concerning the coherence of the results of EU-SILC in Austria. EU-SILC started in Austria in 2003; 2004 was the first year of the longitudinal component. EU-SILC is the only socio-economic panel survey in Austria and the main data base for statistics on income, poverty and social exclusion. Its predecessor, the European Community Household Panel (ECHP), was a longitudinal survey running from 1995 to 2001 in Austria. One of the main problems faced in the ECHP was a declining sample due to panel attrition, ending with 2,700 households in Austria after seven years. EU-SILC was intended from the very beginning to build on the ECHP experience: high methodological standards and a rotational sample design should enhance representativity and comparability, in particular for income information. Within-country coherence seems to be a condition for comparability between countries.
Coherence of EU-SILC within Austria is analysed against other sources such as the LFS, wage statistics or National Accounts. The paper analyses the level of coherence for different groups of respondents as well as for the type and amount of incomes. The occurrence, causes, impact and possible solutions of unit and item non-response are discussed using selected problems as examples, such as the under- or overestimation of income for groups like civil servants or the self-employed. EU-SILC 2005 provides results from a cross-sectional as well as a longitudinal perspective.
Practical solutions like weighting and imputations are not considered in detail. The main focus is on how to assess the coherence of income and possible solutions in respect to the implementation of the survey and questionnaire in Austria.
EU-SILC is the first instrument of income and poverty statistics in Estonia that is internationally comparable and based on annual income. Up until the launch of EU-SILC, the income and poverty data were derived from the Household Budget Survey (HBS). This paper will seek to compare the two data sources regarding the total income, different income components and social exclusion indicators, as well as to propose some tentative hypotheses to explain the differences that emerge. The data pertaining to income year 2004 is used. While the comparisons were run for all of the components, the presentation will discuss only the ones that are either most common or illustrate some tendency the best.
EU-SILC and HBS differ in substantial terms regarding both the data collection methodology and the concept that is being measured. While in EU-SILC the income data is collected on component level by the means of a questionnaire, HBS derives information from diaries filled out by households themselves. Also, the income that is being measured is annual in EU-SILC and monthly in HBS.
Starting with wages and salaries (variable PY010N in EU-SILC), the number of persons receiving this type of income is 14% higher in EU-SILC than in HBS. This is most probably the result of temporary jobs, which are not spread evenly around the calendar year. Average wages are about 3% lower in HBS. Non-cash income from wage labour (variable PY020N), i.e. the value of company cars, exemplifies well the differences that can be caused by data collection methods. The number of persons receiving this type of income is more than 2 times higher in EU-SILC. The question is read out to interviewees in EU-SILC, whereas in HBS the question is in the diary and might not always be read by the households.
The review of other income components reveals that on no occasion is the number of persons or households receiving an income component higher in HBS. Instead it is either lower or the same as in EU-SILC. For some one-off incomes such as sickness benefits or tax on property this is to be expected. Average amounts tend to be slightly lower in EU-SILC, although for some components (wages and salaries, company cars, sickness benefits) the opposite is true.
When it comes to total income, the effect of lower occurrence of most income components in HBS seems to outweigh the effect of greater average values. The total household income was 90,100 kroons according to EU-SILC and 80,900 according to HBS. The same 10% difference appears in the case of income per household member. The Laeken indicators derived from both data sources also differ. The at-risk-of-poverty rate as well as the Gini coefficient is higher in HBS than in EU-SILC.
The reasons for the lack of coherence between the two data sources are probably manifold and not easily distinguished. It may be that irregular types of income are underrepresented in HBS. There is also little known about how exactly the households fill in the diaries in HBS - if it is done only at the end of the month, the smaller incomes may easily be forgotten; also, the person filling in the diary may be unaware of income received by other household members, and so on. Given that in EU-SILC the information is collected retrospectively, recall error cannot be ruled out either. Additionally, it must be pointed out that the higher-income households are less likely to non-respond to diaries in HBS. Also, the seasonal variance of income, which is caused by irregular income components and which is not present in EU-SILC, is a reason for the more unequal income distribution in HBS.
The paper consists of two main parts. The main part covers conclusions from the comparative analysis of EU-SILC and HDS results concerning poverty, especially the Laeken indicators. This will be preceded by a concise presentation of the methodological solutions applied by CSO in EU-SILC, in particular sample selection, the data weighting system and income variable imputation methods. Basic problems encountered when implementing EU-SILC will be pointed out and their possible impact on the quality of results discussed.
The French survey "Standard de vie", conducted by INSEE in 2006, is a survey on consensus modelled on the British Poverty and Exclusion Survey. Besides bringing new methodological insights and clarifying the impact of variations in the precise wording of the question, it leads to rather different conclusions from the PSE regarding the goods and services that are considered necessities. We try to spell out the consequences of these findings for the choice of the items that, in SILC, are used in the measurement of poverty in living conditions.
Last updated 23.11.2006