Courrier des statistiques N6 - 2021

In this sixth issue, the Courrier des statistiques (Statistics Courier) examines four data sources, two methods and one institution, while remaining open to the outside world, both in France and abroad.

With the 2021 redesign, the Labour Force Survey is modernising its data collection methods and complying with European requirements. Fidéli, a demographic file on dwellings and individuals, has become indispensable, particularly as a pivotal tool for social studies. The permanent demographic sample, with its extended possibilities, brings temporal depth to the analysis of individual trajectories. Finally, the RGCU, a gigantic database on professional careers, designed by the main pension scheme in France (Caisse nationale d’assurance vieillesse - CNAV), promises to become a valuable source for researchers.

But how can files be matched without a common identifier? The Directorate of Evaluation, Forecasting and Performance Monitoring (Direction de l’évaluation, de la prospective et de la performance - DEPP) presents its method, through its information system on the integration of young people into working life. Upstream, how can administrative databases be improved? To this end, Belgium has institutionalised and implemented an approach that favours preventive methods based on the analysis of anomalies.

The issue concludes by explaining how the National Council for Statistical Information (Conseil national de l’information statistique - CNIS) organises dialogue between users and producers of official statistics, to ensure the relevance of statistical outputs and to improve them.

Courrier des statistiques
Published on: 02/10/2023
Odile Rascol, Editor-in-Chief, INSEE
Courrier des statistiques - October 2023

Presentation of the issue

Odile Rascol, Editor-in-Chief, INSEE

Relaunched in 2018 in a new format, the Courrier des Statistiques (Statistics Courier) is now in its third year of existence and will soon reach 50 articles. We have been able to explore various subjects, methods and tools, as well as institutional or legal questions raised by official statistics, while taking care to remain open to the outside world, in France or abroad, in order to compare our practices and inform our thinking.

The journal cannot ignore a major change in recent years: while statisticians continue to organise and invest in data collection through surveys, they must also increasingly take advantage of a world where data already exist that they have not constructed. One may object that this has always been the case with administrative sources, but these are evolving and being enriched, as we saw in issue N1, notably through the Nominative Social Declaration (DSN). More generally, external data are now always part of the landscape.

The questioning of existing data sources, the way in which they are obtained, their degree of elaboration, their coverage and their timing, is systematic. This is a central concern in this issue, which is the sixth in the new series and which begins by presenting four data sources that are essential for statistical purposes.

It is only fitting that the Labour Force Survey kicks things off: the flagship survey of official statistics in France, around which other statistical operations revolve, remains an inexhaustible source for socio-economic studies year after year. However, it is not immutable: it adapts to a changing world. François Guillaumat-Tailliet and Chloé Tavan present the main lines of the 2021 redesign and the reasons behind it: compliance with European requirements, as well as the desire to develop the possibility for households to use the internet to respond to INSEE surveys. This set of significant changes required a lot of preparation and experimentation, as well as an extensive preliminary operation to estimate the breaks in series.

At some point, the journal had to open its columns to the promoters of Fidéli, who have often been cited in previous issues: the demographic file on dwellings and individuals is neither the result of a survey, nor of an administrative source, strictly speaking. As Pierre Lamarche and Stéfan Lollivier explain, Fidéli is a pure construction by statisticians for their own needs, a piece of work to ensure coherence and enrichment of administrative sources, particularly fiscal ones. The coherence, completeness and variety of information available are essential for its inclusion in the information system of French official statistics. Using several “raw” sources, the system compiles a single list of dwellings and a single list of individuals, located in their main dwelling, while aggregating their socio-demographic information. Fidéli makes it possible to carry out specific studies and to sample surveys, as well as to use matching to complement survey data with finely localised socio-demographic data, which multiplies its potential for social analysis.

The permanent demographic sample (échantillon démographique permanent - EDP) adds an extra string to our bow: temporal depth, the possibility of working on cohorts and of better understanding changes over time and from one generation to the next. It is already an old source, which traces no fewer than 3.7 million individual trajectories, including 200,000 traced for more than 50 years. Isabelle Robert-Bobée and Natacha Gualbert describe its current functions, as its content and scope have continued to expand over the years, and the main innovations, as the EDP has also had to adapt to changes in its environment. It has recently been enriched with socio-fiscal data, like Fidéli. The compilation of different sources is the originality of the EDP: while it makes it more complex to use for studies, in return it offers possibilities for analysing increasingly diverse trajectories.

Continuing the exploration of the vast universe of administrative sources, Christian Sureau and Richard Merlen discuss the Single General Career Register (répertoire général des carrières unique - RGCU), developed by the French National Pension Fund for Employees (Caisse nationale d’assurance vieillesse - CNAV). This gigantic database should, in time, enable pension organisations, companies, their employees and pensioners to share information, respecting the same concepts, on the different dimensions of the periods that make up a professional career (salaried activity, periods of unemployment, etc.). The quality of this register is ensured, inter alia, by a sophisticated multi-level data control mechanism. Initially a tool to help users and for administrative efficiency, this is a database with a bright future for statistical uses, due to its unparalleled richness. Like the EDP, it includes a temporal aspect: it goes back even further, to the 1930s. Unlike the other three sources, the RGCU is not yet available, and it will take several more months, until 2022, for an initial version to be made available to researchers via the Secure Access Data Centre (Centre d’accès sécurisé aux données - CASD).

In order to meet the evaluation needs of the public authorities, Ministerial Statistical Offices have data sources at their disposal, but linking them is not straightforward without a common identifier. Since the 1980s-1990s, this record linkage activity has given rise to a vast amount of academic literature, particularly in Canada, the Netherlands, Australia, the United States and Italy. With the multiplication of available administrative sources, interest in these methods and their application is growing within French official statistics.

The work carried out by the Directorate of Evaluation, Forecasting and Performance Monitoring (Direction de l’évaluation, de la prospective et de la performance - DEPP) is part of this trend. InserJeunes, a system relating to the professional integration of young people, is thus based on the matching of exhaustive administrative sources relating to the schooling of pupils and apprentices, the passing of exams, apprenticeship contracts and salaried contracts, and the Nominative Social Declaration (DSN). Loïc Midy’s article explains the steps required to successfully match these files based on indirect identifiers: normalisation, indexing, calculation of similarities, classification of pairs and evaluation/validation. In doing so, it highlights the complexity of the operation and the limits of “naive” approaches to the subject. The author also looks at the tools to use for the process, providing an overview of existing open source matching tools.
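To make the five steps named above more concrete, here is a minimal sketch in Python. It is not the DEPP's implementation: the field names (`first_name`, `last_name`, `birth_year`), the blocking key, the 0.85 threshold and the sample records are all hypothetical, and the similarity measure is simply `difflib.SequenceMatcher` from the standard library, chosen for brevity rather than because the article recommends it.

```python
import unicodedata
from collections import defaultdict
from difflib import SequenceMatcher


def normalise(text):
    """Step 1 (normalisation): strip accents, lowercase, keep letters only."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    letters = "".join(c if c.isalpha() else " " for c in no_accents.lower())
    return " ".join(letters.split())


def block_key(record):
    """Step 2 (indexing): only records sharing this key are ever compared."""
    return (record["birth_year"], normalise(record["last_name"])[:1])


def similarity(a, b):
    """Step 3 (similarities): mean string similarity of the names, in [0, 1]."""
    scores = [
        SequenceMatcher(None, normalise(a[f]), normalise(b[f])).ratio()
        for f in ("first_name", "last_name")
    ]
    return sum(scores) / len(scores)


def link(file_a, file_b, threshold=0.85):
    """Step 4 (classification): keep pairs whose similarity reaches the threshold."""
    index = defaultdict(list)
    for rec in file_b:
        index[block_key(rec)].append(rec)
    matches = []
    for rec_a in file_a:
        for rec_b in index.get(block_key(rec_a), []):
            score = similarity(rec_a, rec_b)
            if score >= threshold:
                matches.append((rec_a["id"], rec_b["id"], round(score, 3)))
    return matches


def evaluate(matches, true_pairs):
    """Step 5 (evaluation): precision and recall against hand-checked pairs."""
    found = {(a, b) for a, b, _ in matches}
    true_pos = len(found & true_pairs)
    precision = true_pos / len(found) if found else 0.0
    recall = true_pos / len(true_pairs) if true_pairs else 0.0
    return precision, recall


# Hypothetical records standing in for two files without a common identifier.
school = [
    {"id": "A1", "first_name": "Hélène", "last_name": "Lefèvre", "birth_year": 2002},
    {"id": "A2", "first_name": "Jean", "last_name": "Martin", "birth_year": 2001},
]
employment = [
    {"id": "B1", "first_name": "HELENE", "last_name": "LEFEVRE", "birth_year": 2002},
    {"id": "B2", "first_name": "Jeanne", "last_name": "Martin", "birth_year": 2001},
    {"id": "B3", "first_name": "Paul", "last_name": "Durand", "birth_year": 2001},
]

pairs = link(school, employment)
precision, recall = evaluate(pairs, {("A1", "B1")})
```

With the default threshold, this toy example also accepts the "Jean" / "Jeanne" pair, so precision falls to one half; raising the threshold to 0.95 keeps only the genuine match. This is one small illustration of why a single fixed cut-off on a simple string similarity is a "naive" approach, and why the evaluation step matters.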

Taking an interest in administrative data also means examining the quality of these data and not taking it for granted. From this point of view, Belgium is clearly ahead of the game, with in-depth work being carried out on anomalies in the data and how to remedy them by going back to the very source of the information: the benefit of this approach has even been recognised by a “royal decree” which imposes it on government bodies. Isabelle Boydens, Gani Hamiti and Rudy Van Eeckhout thus present an original prototype, called ATMS (Anomalies & Transactions Management System). It allows for the monitoring of anomalies and processing, in support of the so-called backtracking method: with a preventive approach to data quality, this method is intended to structurally improve quality at the source, and also makes the link with more traditional curative approaches (data quality tools).

Finally, the last article in this issue takes us in a much more institutional direction, with the National Council for Statistical Information (Conseil national de l’information statistique - CNIS). It echoes the articles on two other stakeholders in statistical governance in France, the French Official Statistics Authority and the Label Committee. Here, Isabelle Anxionnaz and Françoise Maurel describe the operating principles of the CNIS and reveal its mysteries. They remind us, in particular, of its crucial role in organising dialogue between producers and users of official statistics. They show that an institution is not synonymous with a fixed organisation: CNIS meetings and working groups produce, in complete transparency, a shared vision of statistical needs and the relevance of output. And beyond that, the recommendations are implemented in the Official Statistics programmes. Finally, the CNIS contributes, through its monitoring of changes in uses, to the continuous adaptation of the Official Statistical Service in terms of its production methods, their streamlining and their linking to existing sources.


See issue N3.

Nominative Social Declaration (Déclaration sociale nominative - DSN), see issue N1.

The framework of which we announced in issue N3 and which echoes other redesigns (see issue N2).

See issue N5.