Courrier des statistiques N1 - 2018

The first issue has already been published. It includes an article by the General Director of INSEE on the administrative structure of the French official statistical system based on his presentation delivered at the World Statistics Congress of the International Statistical Institute in 2017.
A four-part dossier then examines the use of administrative sources in statistics, with, among others, a presentation of the French electronic reporting system designed for employers known as the DSN (standing for déclaration sociale nominative, or Nominative Social Declaration) by the Director of the French Public Interest Group (GIP) ‘Modernisation of Social Declarations’. The focus then turns to another topic altogether: the implementation of the global system enabling unique identification of legal entities participating in financial transactions (known as the Legal Entity Identifier, or LEI) and the role played by INSEE in this area. Finally, the last article provides an informative overview of the notion of official statistics in its various facets at both the French and European levels.

Published on: 06/12/2018
Catherine Renne, Head of the Employment and Wage Statistics Division, INSEE
Courrier des statistiques, December 2018

Understanding the Nominative Social Declaration (DSN) for Better Statistical Measurement

INSEE has been compiling employment and wage statistics from various administrative declarations for more than half a century. Implementing the Nominative Social Declaration (DSN – "Déclaration Sociale Nominative") was not simply a case of merging the existing administrative declarations; it also led to the streamlining and simplification of data on the one hand, and to the rigorous definition of the framework within which information is exchanged between companies and the organisations receiving the DSN, on the other. It forced INSEE to carry out a major overhaul of its information system while at the same time calling into question the organisation of its work. The controls carried out upstream, during data collection, ensure, in return, that the declarations are received with a certain degree of quality and therefore with a certain value added. To get the most out of the DSN, statisticians need to be more familiar with the data collection and control phase and more involved in the new governance bodies of the DSN. Although the short-term priority has been to integrate the DSN into its information system, the next step will be for INSEE to build a sufficiently flexible and modular system that will make it possible to take full advantage of the potential of this new source of information.

The DSN, an essential source for statisticians

"Déclaration Unifiée de Cotisations Sociales", "Déclaration Mensuelle de Mouvements de main-d’Œuvre (DMMO)", "Relevé Mensuel de Missions", "Déclaration Annuelle de Données Sociales"… All these French social declarations that most employers were until recently obliged to fill in have now been replaced by the nominative social declaration (DSN – "Déclaration Sociale Nominative") (see article by E. Humbert-Bottin in the same issue).

Having been gradually implemented since 2013, the DSN is much more than a new administrative declaration; it establishes a new reporting logic between companies and recipient organisations (Buhl, 2014). In fact, the purpose of the DSN is not to serve a specific need, but rather to cover various uses.

Prior to the DSN, employers provided sometimes redundant data to a variety of stakeholders with different deadlines and formats. Now, they transmit information from their payroll software to a single point of deposit just once a month. This information is checked and then redistributed to the various recipient organisations, and it is then their responsibility to convert it into a suitable format. In a way, the DSN transfers part of the burden from companies to downstream information systems.

For INSEE, this generalisation of the DSN is gradually replacing the main sources that feed its employment and activity income information system (SIERA – "Système d’Information sur l’Emploi et les Revenus d’Activité"). In other words, the DSN is becoming "THE" major and indispensable source of information and it is essential that statisticians understand and master the various aspects of its processing.

When we talk about the processing of an administrative declaration, we generally think of the statistical processing operations that are carried out, in particular at INSEE. Yet the upstream phase of collecting and controlling information, which is largely the responsibility of external actors, is just as important. End-users are often not aware of this; however, it is precisely this step that ensures that they receive declarations with a certain degree of quality and therefore with a certain value added.

This issue is not specific to the DSN and can largely be applied to any other administrative declaration. The purpose of this article is to describe, using the illustrative example of the DSN, the quality management steps for processing an administrative declaration.

Streamlining and simplifying: two stated objectives of the DSN

With the DSN, companies report three times less data than they did when they completed their Annual Social Data Declarations (DADS – "Déclaration Annuelle de Données Sociales"). By adopting the "Tell us once" principle, the DSN limits the requested data to only basic information that cannot be obtained by any other means. For example, since January 2018, companies have no longer been required to provide their end-of-period employee numbers, as these can be recalculated directly by the recipient organisations using the individual employee information provided.

Data streamlining has also made it possible to pool the information collected for a large number of organisations. While the DSN no longer contains sections specific to a particular organisation, as was the case in the Annual Social Data Declarations (DADS), the fact that all this information is gathered in a single declaration guarantees the consistency of information intended for different uses. For example, with the DSN, we can be sure that employers and employees are identified in the same way regardless of the organisation receiving the information.

This streamlining objective could not have been achieved without rigorously defining the framework within which information is exchanged between companies and the organisations receiving the DSN. This framework, also known as the exchange standard, defines the information template, the message template and the kinematics of the exchanges. A description of these is contained in a document called a "technical manual", which is updated annually and assigned a version number. The information template defines the different fields to be filled in, specifies their semantics, i.e. how they are organized and what they relate to (company, contract, employee, etc.) and ensures that the same data are not requested several times. In addition, when employers submit their DSN, they do not send the information "in bulk", but instead according to a message template that orders and structures the information in a specific way.

The exchange kinematics (Box 1) describes the process of exchanging data between the various partner organisations. In a very basic way, the DSN collection process is based on two declaration portals: the GIP Net-DSN portal, for all companies, and the MSA portal, which is only for employers in the agricultural scheme. The information submitted in the monthly DSNs is first received and checked by ACOSS, which forwards it to the French National Old-Age Insurance Fund (CNAV – "Caisse Nationale d’Assurance Vieillesse") and filters it to distribute an extract of the message to insurance, provident and mutual institutions. The CNAV certifies the employee identification (NIR – "Numéro d’Inscription au Répertoire") via its National ID Management System (SNGI – "Système National de Gestion des Identifiants"), stores the DSNs and then redistributes them electronically to social protection organisations and public bodies, including INSEE.

Getting involved in data collection quality governance

The use of social declarations or administrative sources for statistical purposes is advocated by the European Statistics Code of Practice in order to reduce the statistical burden on those submitting the declarations. In France, this is made possible by Act 51-711 of 1951 on Statistical Obligation, Coordination and Confidentiality and was recently reaffirmed by Act 2016-1321 for a Digital Republic.

While French law facilitates INSEE’s access to administrative declarations, data are generally obtained after a long period of exchanges, preparation and formalisation with the central body. This preliminary phase of the process should not be underestimated as it determines the subsequent steps and the use that can be made of the data. This is all the more important given that the statistical uses of the DSN are not of the same nature as the administrative uses for which the new system was designed as a priority.

With regard to the DSN, INSEE concluded a tripartite agreement in 2016 with the CNAV and the Gip-MDS, which specifies the terms of delivery (delivery date, content and format of files, etc.). A service contract was also established with the CNAV, the content of which provides more technical details regarding the procedures for exchanging files. Finally, the processing operations envisaged by INSEE and the methods of storing the data were the subject of a declaration to the French Data Protection Authority (CNIL – "Commission Nationale de l’Informatique et des Libertés").

However, any changes to the technical manual also need to be carefully monitored in order for the DSN to be used correctly. This manual is likely to evolve each year to include new sections, as is the case with the 2019 version, which will include new sections specific to public service employees, and the 2020 version, which will include the Mandatory Declaration of Employment of Disabled Workers (DOETH – "Déclaration Obligatoire d’Emploi des Travailleurs Handicapés"). INSEE must then inform the Gip-MDS whether or not it wishes to receive these new sections.

INSEE is regularly informed of changes to the standard’s sections via the governance bodies, in particular the social data standardisation committee of which it is a member. It also has regular exchanges with the Gip-MDS, which is responsible for the operational management of the DSN project. As is the case for other agencies, INSEE may report any anomalies or errors that it identifies in the data in order to improve their quality. In a way, the DSN system functions as a virtuous circle, in which INSEE must gradually assume its full role.

Carefully controlled data collection

The use of administrative declarations is not always easy for statisticians, as they are not directly responsible for the underlying concepts or collection. However, they must understand the different stages involved in the production of the data they receive in order to assess their quality, on the one hand, and to avoid unnecessarily reproducing processing operations that have already been carried out upstream, on the other hand.

When INSEE receives the nominative social declarations each month, they have already been through a long series of automatic checks designed to secure the reporting process. This notion of control is so central to the collection system that companies are given a self-checking tool with which they can test the file containing their declaration before sending it. This tool is consistent with that of the reporting platform, which subsequently implements various types of controls.

Structure and syntax checks are used to assess the overall compliance of the files sent. For example, it is necessary to check that the variables declared respect a certain length and a certain format. Consistency checks are used to verify the consistency between different sections. For example, in address elements, entering the wording of a location requires the associated postal code to be entered. Checks against external reference systems are used to verify that the value taken by certain declared data belongs to a specific nomenclature (socio-professional category, economic activity of the establishment, postal code, etc.). Finally, the validity of each establishment identifier (or SIRET) declared in the DSN is verified, including the SIRET of the user establishment and that of the workplace when the latter has been declared to be an establishment. The declarants’ SIRET is verified as soon as it is entered in the identification portal.
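
As an illustration only, the three families of checks described above might be sketched as follows in Python. The field names (`siret`, `postal_code`, `locality`, `category_code`) and the small reference set are hypothetical and chosen for this example; the actual rules are those documented in the DSN technical manual.

```python
import re

# Placeholder nomenclature: these are not actual socio-professional codes.
REFERENCE_CATEGORY_CODES = {"1000", "2000", "3000"}


def check_declaration(record):
    """Return the list of errors found in one declared record (a dict)."""
    errors = []

    # Structure and syntax: declared variables must respect a length and a format.
    if not re.fullmatch(r"\d{14}", record.get("siret", "")):
        errors.append("siret: expected 14 digits")
    postal_code = record.get("postal_code", "")
    if postal_code and not re.fullmatch(r"\d{5}", postal_code):
        errors.append("postal_code: expected 5 digits")

    # Consistency between fields: a locality wording requires a postal code.
    if record.get("locality") and not postal_code:
        errors.append("postal_code: required when a locality is given")

    # External reference system: the value must belong to the nomenclature.
    if record.get("category_code") not in REFERENCE_CATEGORY_CODES:
        errors.append("category_code: not in the reference nomenclature")

    return errors
```

A record that passes all three families of checks yields an empty list; each failed check adds one message, which keeps the reason for a rejection traceable.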

All these controls are blocking, i.e. they fully reject the report until the errors are corrected. They represent the bulk of the automatic checks made by the DSN collection process. There are also non-blocking controls that are present for information purposes only. Not all DSN controls could be made blocking, as this would have risked preventing declarations from being submitted at all. The controls carried out on the DSN are merely flow controls; they cannot be used to check the consistency of the information with another source or the consistency of the data transmitted from one month or one year to the next. All these controls are documented and described precisely in the technical manual of the DSN exchange standard and have been the subject of concerted reflection with the various recipient organisations.
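
The distinction between blocking and non-blocking controls can be pictured with a minimal Python sketch. The `ControlResult` structure and the control identifiers are assumptions made for illustration and do not reproduce the collection platform's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class ControlResult:
    control_id: str  # hypothetical identifier of the control
    passed: bool
    blocking: bool


def is_accepted(results):
    """A declaration is rejected as soon as one blocking control fails."""
    return not any(r.blocking and not r.passed for r in results)


def information_messages(results):
    """Failed non-blocking controls are reported for information only."""
    return [r.control_id for r in results if not r.passed and not r.blocking]
```

In this sketch a declaration with a failed non-blocking control is still accepted, but the declarant is told which controls raised a message.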

Converting data to meet statistical needs

For statisticians, having a documented and controlled upstream source of information, such as the nominative social declaration, is of great value. However, their job is not to simply reproduce the information received; before disseminating it, they must still carry out controls and other processes in order to convert this raw material into statistical information relevant for economic analysis.

The completeness check, often misleadingly referred to as "collection gap detection", is one of the essential steps in statistical processing. With regard to the DSN, this is not intended to replace the Gip-MDS, which already checks the completeness of declaration collection by contacting companies that have not submitted their declarations on time. At this stage, it is necessary to verify that INSEE has received all the declarations that it should have received, on the one hand, and that it has properly integrated all the information received into its information system, on the other. The difficulty here lies in defining the notion of "expected declarations".

One way to proceed is to carry out "macro" plausibility checks to identify any abnormal changes in the number of declarations received. In general, this type of control is carried out upon receipt of the data, which makes it possible to react quickly if missing declarations are detected in accordance with the service contract established with the body issuing the declaration. Statisticians can also carry out more "micro" controls by comparing the declarations received during a given period with those of the previous period or against an external reference system. Before the DSN was set up, this external reference system was based on the various administrative declarations received at INSEE. Replacing the various declarations with a single declaration, the DSN, will, over time, make it difficult to build such a reference system. Indeed, it would be paradoxical to build an external reference system based on the data received in the DSN when it is precisely the completeness of this information that we are trying to control. Inter-temporal consistency checks will therefore be increasingly favoured as the DSN is scaled up.
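
The "macro" and "micro" completeness checks described above might look like the following Python sketch. The 5% tolerance and the use of establishment identifiers as the comparison key are assumptions chosen for the example, not INSEE's actual parameters.

```python
def abnormal_change(current_count, previous_count, tolerance=0.05):
    """Macro check: flag a relative change in the number of declarations
    received that exceeds the tolerance (hypothetical default of 5%)."""
    if previous_count == 0:
        return current_count != 0
    return abs(current_count - previous_count) / previous_count > tolerance


def missing_declarants(current_ids, previous_ids):
    """Micro check: identifiers present in the previous period but absent now."""
    return sorted(set(previous_ids) - set(current_ids))
```

An identifier returned by the micro check is only a candidate gap: the establishment may simply have closed or stopped employing staff, so the result still needs to be interpreted.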

Following this, statisticians will check the plausibility of the variables that they consider as giving structure to their information system. Detecting potential anomalies is crucial at this stage of the administrative declaration handling process. Several questions then arise: of these anomalies, how can we identify those that turn out to be errors and therefore need to be corrected? In other words, if social declarations contain hourly wages that are lower than the minimum wage or, conversely, extremely high hourly wages, is this a reporting error or a reflection of reality? Can these errors be processed automatically or do they require manual processing? Should all errors be corrected and, if not, what criteria should be used to define the processing priorities and where do we draw the line? Due to the multiplicity of sometimes conflicting uses, it is often difficult to set a single stopping criterion. Indeed, the criteria for stopping will vary in terms of severity depending on whether we want to publish individual data or aggregate estimates that we can fully accept are based on individual data that are not perfect. The data plausibility check therefore continues downstream of the administrative declaration processing chain. In the field of employment and wages, the process of validating data produced using administrative declarations is a long and iterative process that involves many Ministerial Statistical Departments, in addition to INSEE.
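
To make the hourly wage example concrete, here is a minimal Python sketch of a plausibility flag. The function name, the labels and the thresholds passed by the caller are hypothetical; the point is that the check only flags an anomaly and does not decide whether it is an error.

```python
def flag_hourly_wage(gross_pay, hours_paid, minimum_wage, upper_bound):
    """Classify an hourly wage as plausible or as an anomaly to be examined.

    minimum_wage and upper_bound are supplied by the caller; any values
    used with this sketch are illustrative, not official thresholds.
    """
    if hours_paid <= 0:
        return "no_hours"
    hourly_wage = gross_pay / hours_paid
    if hourly_wage < minimum_wage:
        return "below_minimum"
    if hourly_wage > upper_bound:
        return "very_high"
    return "plausible"
```

A record flagged "very_high" may be a reporting error or a genuinely exceptional salary, which is why the decision to correct it is left to a later step.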

Nevertheless, the main difficulty for statisticians when working with an administrative declaration is that it was not designed for their exclusive use. The declaration may therefore contain information that they do not need and, more significantly, may not be based on their statistical reference units. In the field of employment and activity income, for example, we focus on concepts such as jobs or periods of activity, yet the DSN provides information on employment contracts and pay periods, on the one hand, and on reports of events such as work stoppages, on the other (Box 2).

A change in the scale and temporality of statistical production

With the implementation of the DSN, there is much more at work than simply the gradual replacement of the administrative declarations that fed the INSEE information system. The institute must now find the resources to process, every month, a volume of information equivalent to that of the Annual Social Data Declarations (DADS). Data volume is becoming an important issue; it is a question of storing and processing just under a hundred gigabytes of data each month, or more than one terabyte per year. Seen in this light, the DSN could have been considered, solely from INSEE’s perspective, to be a technical constraint and treated as such, although this would have been an error.

Undoubtedly, the DSN is revolutionising information systems that have long been built around the various administrative declarations. For INSEE, this is a tremendous opportunity to rethink and document its production processes in order to improve efficiency and consistency. Even if the short-term priority has been to integrate the DSN, it is necessary to anticipate the demand on INSEE in the medium term, in the field of employment and activity income, in order to build a sufficiently flexible and modular system that will enable it to meet this demand.

The availability of monthly data now allows data to be processed on a ‘run-of-river’ basis, which should ultimately reduce production times. Having to process approximately 2 million reports containing the nominative social declarations of 20 million employees each month requires INSEE to be sparing with manual processing. The new processes put in place are therefore largely based on a capitalisation principle in order to avoid repeating the same processes each month. For example, if a process manager processes an anomaly in an employee’s socio-professional category in a given month, and the anomaly remains in the next month’s return, the system will automatically reproduce the processing performed in the previous month. The work carried out on the quality of job location will now be conducted within a single process within the information system (SIERA). Until now, this work was undertaken by different processes within and outside the SIERA. Finally, the processes will be re-engineered with the aim of improving the coherence between macro and micro data on the one hand and between statistics on changes and statistics on levels on the other hand.
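
The capitalisation principle can be illustrated with a short Python sketch: a correction decided once by a process manager is stored and replayed when the same anomaly reappears. The `CorrectionStore` class and its lookup key (employee, variable, reported value) are assumptions made for this example rather than a description of the SIERA applications.

```python
class CorrectionStore:
    """Keeps manual corrections so they can be reapplied in later months."""

    def __init__(self):
        self._corrections = {}

    def record(self, employee_id, variable, reported_value, corrected_value):
        """Store the correction decided by a process manager."""
        self._corrections[(employee_id, variable, reported_value)] = corrected_value

    def apply(self, employee_id, variable, reported_value):
        """Return (value, reused): the stored correction if the same anomaly
        reappears, otherwise the reported value unchanged."""
        key = (employee_id, variable, reported_value)
        if key in self._corrections:
            return self._corrections[key], True
        return reported_value, False
```

Keying the correction on the reported value means it is replayed only while the employer keeps declaring the same anomalous value; a changed declaration goes back through the normal checks.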

Pirénés: an ambitious re-engineering project

In 2012, INSEE decided to launch a series of projects to re-engineer the SIERA processes, the first of which, the Pirénés project, aims to host and process the nominative social declaration. The production process was organised around applications, each fulfilling a specific role.

First of all, this requires the capacity to host approximately 2 million files each month. Once the data in those files have been transcribed into "statistical language", the first step for INSEE is to integrate them according to its own data model. This may entail simply renaming the variables or a more complex conversion. This so-called "mapping" operation, which involves about a hundred variables, means the subsequent phases of the process never need to use the source data, just these new variables. This process step is based on the ARC application. Originally, the ARC application was designed to perform a number of compliance checks, but experience has shown that most of them were unnecessary as they were already performed by the DSN collection process.
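
A minimal Python sketch of such a "mapping" step is given below. The source and target variable names and the converters are hypothetical; they only show how a renaming and a conversion can be declared in one table so that later phases work on the new variables alone.

```python
from datetime import date

# Hypothetical mapping table: source field -> (statistical variable, converter).
MAPPING = {
    "source_employer_id": ("employer_siret", str.strip),
    "source_gross_pay": ("gross_pay", float),
    "source_contract_start": ("contract_start", date.fromisoformat),
}


def to_statistical_record(source_record):
    """Rename and convert the mapped fields; unmapped fields are dropped."""
    return {
        target: convert(source_record[source])
        for source, (target, convert) in MAPPING.items()
        if source in source_record
    }
```

Declaring the mapping as data rather than code means a new version of the technical manual can, in this sketch, be absorbed by editing the table.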

Once the data have been integrated into the information system, processing is carried out on each monthly DSN in the ARTEMIS application. At this stage, it is a case of checking the quality of variables such as identification of employers or identification, place of residence and socio-professional category of employees (PCS). The principle is to determine, for each unit of information, whether or not there is an anomaly. With regard to identification of employers and municipality of residence of employees, the information reported is compared with that contained in reference systems (statistical unit reference system SIRUS, administrative register SIRENE and Official Geographical Code). The number of anomalies detected over a month is very low compared with the number of reports received, namely around one in ten thousand for the employer’s SIRET and around one in one thousand for the code of the employee’s municipality of residence. For the socio-professional category (PCS), the code declared by the employer is compared to the one calculated by the SICORE tool, based in particular on the wording used for the declared occupation. In 90% of cases, the socio-professional category code reported by the employer is confirmed by the automatic coding tool, which means that the majority of employers report consistent socio-professional category codes and wording. Nevertheless, there are approximately 1.5 million anomalies that may potentially need to be processed each month, which is physically impossible. Cost-quality trade-offs were therefore made in the first instance, which meant only the differences in codification relating to the first two positions of the code were addressed. Nevertheless, this still represents 40% of the anomalies to be processed. Given that this is still a large number, the cases to be processed are grouped into homogeneous clusters.
Process managers then establish a processing rule at the cluster level that will be automatically applied to all cases to be processed in the cluster. These rules are capitalised by the application and will be applied automatically when processing the declarations for the following months, which should ultimately reduce the number of cases to be handled by process managers. The gains generated by this capitalisation process will be reused to address discrepancies in the last two positions of the PCS code.
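
The comparison restricted to the first two positions of the code, followed by grouping into clusters, might be sketched as follows in Python. The clustering key used here (the two-character prefixes of both codes plus the normalised occupation wording) is an assumption for illustration; the article does not specify how the homogeneous clusters are built, and the codes in any example are placeholders.

```python
from collections import defaultdict


def needs_review(declared_code, computed_code, positions=2):
    """Only a discrepancy in the first positions of the code is retained."""
    return declared_code[:positions] != computed_code[:positions]


def cluster_anomalies(cases):
    """Group the cases to be processed so that one rule can be set per cluster.

    cases: iterable of (case_id, declared_code, computed_code, wording).
    """
    clusters = defaultdict(list)
    for case_id, declared, computed, wording in cases:
        if needs_review(declared, computed):
            key = (declared[:2], computed[:2], wording.strip().lower())
            clusters[key].append(case_id)
    return dict(clusters)
```

A rule established for one key can then be applied to every case in that cluster, which is what keeps the manual workload manageable.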

The next step is to analyse the internal inconsistencies in the DSNs for the same year between the workload variables and those related to compensation elements, once the 12 months of aggregate declarations have ended. Here again, cost-quality trade-offs had to be made and only the most important anomalies are submitted for expert analysis by process managers. The processes for detecting and correcting anomalies are carried out in the DIANE application. Anomaly detection methods are sometimes relatively complex and some are based on econometric models.

At this stage of the processing, it is still necessary to reconstruct the information according to the statistical unit of interest (the job) and to calculate a number of statistical variables. This is the purpose of the THESEE application. This application also makes it possible to produce a file on all employees, i.e. by including information from information systems on the employment and salaries of civil servants on the one hand, and employees of private employers on the other hand.

After two years of use, what conclusions can we draw and what prospects does the DSN offer?

The implementation of the DSN gives new opportunities for the production of statistics on employment and activity income. Users’ expectations in this area are high, as evidenced by the March 2016 report of the National Council for Statistical Information (CNIS – "Conseil National de l’Information Statistique") on the diversity of forms of employment. However, it is premature to set a deadline by which INSEE will be able to exploit the full potential of this new source of information. In addition, we will need to wait until 2022, once all public employers have been entered into the DSN, to have almost exhaustive coverage of the employer field. It is, however, possible to make an initial assessment at this stage.

As is often the case in projects of this size, transition costs were underestimated. INSEE had to deal with an external project, over which it had no scheduling control. The ramp-up of the DSN was slower than initially planned, which meant that a large number of Annual Social Data Declarations (DADS) still had to be fed into the information system and the two administrative declarations had to be combined.

INSEE has had to learn to operate in an environment that is not its own, with actors who are used to working according to certain formalities based on the notion of exchange standards and with requirements in terms of responsiveness. In a way, the DSN has shaken up the temporality of our work. Nevertheless, INSEE must continue to develop upstream exchanges with these actors in order to take full advantage of the virtuous circle that the external DSN project has been able to establish. A similar collaborative working method should be set up, this time downstream of the process, with the actors of the Official Statistical System (SSP - "Service Statistique Public") or of the peri-SSP who are also recipients of the DSN, with whom INSEE shares common concerns. For example, using the DSN to simplify the questionnaire used for certain statistical surveys is a matter for both INSEE, in terms of its Labour Cost and Structure Survey (ECMOSS - "Enquête Coût de la Main d’Œuvre"), and DARES, in the case of its Labour Force Activity and Employment Conditions Survey (ACEMO – "Activité et Conditions d'Emploi de la Main-d'Œuvre"), in particular.

Internally at INSEE, it is now a question of setting up industrialised, documented and secure processes. The processing of the DSN at INSEE must be based on a quality approach that requires a detailed description of the processes, the implementation of a quality assurance framework and a regular review of the processes. This is all the more important given that the DSN project has not yet been completed and a new simplification project is being launched. It should allow organisations providing replacement income (pensions, unemployment benefits, etc.) to do so in turn according to the same principles as the DSN, i.e. according to an exchange standard. Consequently, it should then be easy to apply the approach implemented at INSEE for the processing of the DSN for statistical purposes to the processing of this new administrative declaration.

Box 1. The DSN's exchange kinematics

 

Box 2. The employment contract: a central concept in the DSN

 

In the case of the DSN, it is the NEODeS standard: Norme d’Échange Optimisée des Données Sociales, Optimised Social Data Exchange Standard.

ACOSS: Agence centrale des organismes de sécurité sociale, Central Agency of Social Security Bodies.

Gip-MDS: a Public Interest Group for the modernisation of social declarations.

The SIRET number is an identifier of establishments in the Business register identification system named SIRENE.

Pirénés: Projet Informatique de Refonte sur l’Emploi et les Salaires, Re-engineering Project on Employment and Wages.

ARC: Accueil Réception Contrôle, Receipt and Control of Data.

ARTEMIS: Application de Reprise et de Traitements Élémentaire de l’eMploI et des Salaires, Application for Basic Processing of Employment and Wages.

SICORE: Système Informatique de COdage des Réponses aux Enquêtes, Computer System for the Coding of Survey Responses.

DIANE: Dispositif Informatique pour l’Agrégation et la Normalisation de l’Emploi, IT System for Employment Aggregation and Harmonisation.

THESEE: Traitement HarmonisÉ des Salaires Et de l’Emploi, Harmonised Salary and Employment Processing.

DARES: Direction de l’Animation de la Recherche, des Études et des Statistiques, Statistical Department of the Ministry of Labour, Employment, Vocational Training and Social Dialogue.

Further reading

Bonnet O., Cordier-Villoing M., Deroyon T., Djiriguian J. and Sakarovitch B., « 5 324 euros de l’heure : outlier ou footballeur ? Méthodes d’apprentissage non supervisé pour la détection d’anomalies : application au cas de la Déclaration sociale nominative », 13e Journées de méthodologie statistique de l’Insee (JMS 2018), June 2018.

Gazier B., Picart C. and Minni C., « La diversité des formes d’emploi », Rapport du Cnis, No. 142, July 2016.

Millin K., « Reconstitution des mouvements de main-d’œuvre depuis 1993 », Document d’études Dares, No. 221, June 2018.

Buhl J.L., « La Déclaration sociale nominative (DSN) et l’accès aux droits », Regards, No. 46, EN3S, 2014.

Rivière P., « Approche coût-qualité pour l’amélioration des processus de production statistique », Courrier des statistiques, No. 105-106, pp. 65-75, June 2003.

Rouppert B., « Modélisation du processus de traitement d’une source administrative à des fins statistiques », Document de travail Insee, No. C2005/02, 2005.

To find out more about the DSN:

Gip-MDS website

DSN website

Technical manual of the NEODeS standard