Labour cost and structure of earnings annual survey 2023
Statistical processing
Frequency of data collection
Annual
Data collection
There are three ways of collecting data. Historically, questionnaires have been sent to establishments by post. Large companies for which several establishments are surveyed can provide a computerised response, via a spreadsheet file. Finally, since Ecmo 2016, the Internet collection method, via the INSEE business survey response portal, has been offered to a sub-sample of establishments for which the "employee" section of the questionnaire sent to them covers few employees.
Data collection period
The Ecmo 2023 collection took place from May to December 2024.
Collection mode
- By post
- By Internet
Survey unit
Local unit (of an enterprise)
Sampling method
The sample is selected using a two-stage design, stratified at each stage. First, establishments are drawn and then employees within these establishments. The sampling rate is approximately 3.5% for the establishment level and 0.9% for employees. The stratification used is designed to optimise the accuracy of the main indicator (hourly wages) according to the main breakdowns required by law (by sector of activity, company size, region in particular).
The establishments are selected from the 'all employees' database built from social declarations (DSN), matched against the Sirus register. The sample for year N is drawn from the data as at 31/12/N-1.
The establishments surveyed are asked to answer a questionnaire on their establishment and questionnaires on identified employees (from 1 to 24 depending on the case). The sample of employees is differentiated by status (managerial/non-managerial).
The Ecmoss establishment samples are included in the negative co-ordination of samples across the surveys undertaken at INSEE, which limits how often the same establishment is surveyed.
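The two-stage design described above can be sketched as follows. This is only an illustration under stated assumptions, not the production procedure: the toy frame, the uniform 3.5% first-stage rate, the per-status allocation in ALLOCATION and the helper names draw_by_stratum and employees_of are all hypothetical, and the actual stratification and allocation are optimised for the accuracy of hourly wages.

```python
import random
from collections import defaultdict

def draw_by_stratum(units, key, n_for, rng):
    """Simple random sampling without replacement inside each stratum."""
    strata = defaultdict(list)
    for unit in units:
        strata[key(unit)].append(unit)
    drawn = []
    for label in sorted(strata):
        members = strata[label]
        n = min(len(members), n_for(label, len(members)))
        drawn.extend(rng.sample(members, n))
    return drawn

rng = random.Random(2023)

# Toy sampling frame: 800 establishments, 200 in each sector x size stratum.
establishments = [
    {"id": i,
     "sector": "industry" if i % 2 else "services",
     "size": "small" if i % 4 < 2 else "large"}
    for i in range(800)
]

# Stage 1: establishments, stratified by sector and size class (assumed rate).
stage1 = draw_by_stratum(
    establishments,
    key=lambda e: (e["sector"], e["size"]),
    n_for=lambda label, size: max(1, round(0.035 * size)),
    rng=rng,
)

def employees_of(est):
    # Hypothetical payroll: 30 employees, every fifth one managerial.
    return [{"est": est["id"], "emp": j,
             "status": "managerial" if j % 5 == 0 else "non-managerial"}
            for j in range(30)]

# Assumed allocation per status; the survey asks for 1 to 24 employees.
ALLOCATION = {"managerial": 2, "non-managerial": 4}

# Stage 2: employees within each sampled establishment, stratified by status.
stage2 = []
for est in stage1:
    stage2.extend(draw_by_stratum(
        employees_of(est),
        key=lambda w: w["status"],
        n_for=lambda label, size: ALLOCATION[label],
        rng=rng,
    ))

print(len(stage1), "establishments,", len(stage2), "employees drawn")
```

Passing the random generator explicitly keeps the draw reproducible, which matters when a sample has to be re-created for auditing.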
Sample size
Around 18,000 establishments are surveyed, representing 165,000 employees.
Data collection documents
The Ecmo 2023 questionnaires are provided in the French section.
Data validation
The results are analysed and compared with other indicators disseminated by INSEE, particularly in the context of the quality report sent to Eurostat.
Data compilation
The tables sent to Eurostat always use two successive annual surveys. They are therefore based on observations surveyed in the survey year (N) as well as observations surveyed in the year before (N-1). The earnings variables observed in year N-1 are updated ("aged") to be representative of year N.
Adjustment
Enrichment with administrative data
At the end of collection, the file of respondents is enriched with information from the 'all employees' database (BTS), which draws mainly on administrative sources (DSN). This recovers the activity of the establishment, the employee's occupation, administrative data on remuneration, paid working hours, etc. This enrichment from administrative sources is central to the survey process: it completes the survey data, and it makes it possible to check the consistency between administrative and survey information and to decide whether an adjustment is needed.
Scoping
The enrichment phase also serves to identify 'out-of-scope' cases, in order to distinguish them from non-respondents. Two kinds are identified: out-of-scope establishments (mainly those that have ceased activity since the sampling frame was drawn up) and out-of-scope employees (those belonging to out-of-scope establishments or having left the establishment since the sampling frame was drawn up).
Clearance and correction of non-response
For the 'employee' questionnaires:
Dares is responsible for the adjustment of employee questionnaires. The central variables of the survey (gross salary and number of hours paid) are checked mainly using individual data from the "all employees" database (BTS). The main principles of the adjustment operations are as follows:
- The value collected by the questionnaire is kept even if it is inconsistent with the BTS value, as long as the answers given to the different questions of the questionnaire are consistent with each other;
- When outliers or missing values are detected, or when inconsistencies are found within the questionnaire or with the BTS data, some variables are adjusted by deterministic imputation from the BTS variables and others by modelling (statistical imputation). Whatever the source (questionnaire or enrichment), the earnings data are considered more reliable than the data on durations; it is therefore the durations that are modified in case of inconsistency.
After these adjustments, Dares calculates a first set of 'employee' weights corrected for total non-response, by reallocating the weights of the non-responding units to the respondents belonging to the same sampling stratum.
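The reallocation of non-respondent weights within a stratum can be sketched as below. This is a minimal illustration, assuming each unit carries a stratum label, an initial weight and a response flag; the field names and the function reweight_for_nonresponse are hypothetical, and the sketch does not handle a stratum left without any respondent.

```python
from collections import defaultdict

def reweight_for_nonresponse(units):
    """Return {id: adjusted weight} for respondents.

    Within each stratum, respondent weights are scaled up so that they
    carry the total weight of the stratum (respondents + non-respondents).
    """
    total = defaultdict(float)
    responding = defaultdict(float)
    for u in units:
        total[u["stratum"]] += u["weight"]
        if u["responded"]:
            responding[u["stratum"]] += u["weight"]
    adjusted = {}
    for u in units:
        if u["responded"]:
            factor = total[u["stratum"]] / responding[u["stratum"]]
            adjusted[u["id"]] = u["weight"] * factor
    return adjusted

# Toy data: one non-respondent in stratum A, none in stratum B.
sample = [
    {"id": 1, "stratum": "A", "weight": 10.0, "responded": True},
    {"id": 2, "stratum": "A", "weight": 10.0, "responded": True},
    {"id": 3, "stratum": "A", "weight": 10.0, "responded": True},
    {"id": 4, "stratum": "A", "weight": 10.0, "responded": False},
    {"id": 5, "stratum": "B", "weight": 50.0, "responded": True},
    {"id": 6, "stratum": "B", "weight": 50.0, "responded": True},
]
new_weights = reweight_for_nonresponse(sample)
```

In stratum A the three respondents each take over a third of the missing unit's weight (factor 40/30), while stratum B is unchanged, so the sum of weights over respondents equals the sum over the whole initial sample.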
INSEE then carries out a second set of adjustments on the employee data, to meet the constraints imposed by Eurostat. In particular, a special treatment is carried out for employees on fixed-term contracts, for whom Eurostat wants a number of paid hours to be provided.
This expert work yields, for each survey year, the adjusted, non-calibrated database which, after the calibration carried out at INSEE, becomes the annual database for national dissemination.
For the 'establishment' questionnaires:
The adjustment of the 'establishment' questionnaires particularly concerns the Ecmo format, where the establishment part is essential for responding to Eurostat. First, respondents are distinguished according to their level of response: they may answer for the establishment or, when the information is not known at establishment level, for the enterprise. The establishments table is then cleaned by removing establishments that are in total non-response or treated as such. Establishments that do not respond to a whole block of variables or to certain so-called "key" variables, such as those relating to costs or the wage bill, are treated as total non-response. As with the employee questionnaires, initial weights are calculated by reallocating the weights of the non-responding units uniformly to the responding units belonging to the same sampling stratum. Finally, missing or incorrectly filled-in responses to the other, non-"key" questions are adjusted by imputation, in particular by hot-deck.
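As an illustration of hot-deck imputation, the sketch below fills a missing value with the value reported by a randomly chosen donor from the same imputation class. It is a generic random hot-deck, assuming classes defined by sector; the variable training_hours, the class definition and the function hot_deck are hypothetical, and the source does not specify which hot-deck variant or which classes are used.

```python
import random

def hot_deck(records, variable, class_key, rng):
    """Fill missing values (None) from a random donor in the same class."""
    donors = {}
    for r in records:
        if r[variable] is not None:
            donors.setdefault(class_key(r), []).append(r[variable])
    completed = []
    for r in records:
        r = dict(r)  # work on a copy, the input stays untouched
        if r[variable] is None:
            pool = donors.get(class_key(r))
            if pool:  # a class without any donor is left missing
                r[variable] = rng.choice(pool)
                r["imputed"] = True
        completed.append(r)
    return completed

rng = random.Random(0)
establishments = [
    {"id": 1, "sector": "industry", "training_hours": 120},
    {"id": 2, "sector": "industry", "training_hours": None},
    {"id": 3, "sector": "industry", "training_hours": 80},
    {"id": 4, "sector": "services", "training_hours": 40},
    {"id": 5, "sector": "services", "training_hours": None},
    {"id": 6, "sector": "construction", "training_hours": None},
]
completed = hot_deck(establishments, "training_hours",
                     lambda r: r["sector"], rng)
```

Establishment 2 receives either 120 or 80, establishment 5 receives 40, and establishment 6 stays missing because its class has no donor. Flagging imputed values keeps them distinguishable from collected ones in later checks.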
Treatment of influential units
Influential units are given a specific treatment. It controls the "influence" of individuals who, because of their response combined with a high weight, and without their response being erroneous, lead to estimates of the statistics of interest that remain unbiased but are potentially much less precise in the domains to which they belong. This is done by applying a winsorisation technique (Kokic and Bell's method), which reduces the weight of the influential individual without losing the information in their response, and thereby improves accuracy.
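The idea of reducing a weight rather than altering a response can be sketched as follows. This is a simplified illustration, not the procedure applied in the survey: it uses one common form of winsorisation in which a value y above a cutoff K is replaced by K + (y - K) / w, and it takes the cutoff as given, whereas Kokic and Bell's method is concerned with how such cutoffs are chosen. The function name and the figures are hypothetical.

```python
def winsorised_weight(y, w, cutoff):
    """Weight w* such that w* * y equals w times the winsorised value.

    Values at or below the cutoff keep their weight. Above the cutoff,
    the value is winsorised as cutoff + (y - cutoff) / w, and the
    reduction is carried by the weight so the response y is kept as is.
    """
    if y <= cutoff:
        return w
    y_star = cutoff + (y - cutoff) / w
    return w * y_star / y

# An employee with a very high value and a high weight (toy figures).
w_new = winsorised_weight(y=1000.0, w=50.0, cutoff=200.0)
print(w_new, w_new * 1000.0)  # weight after treatment, weighted contribution
```

With these figures the weight drops from 50 to about 10.8, so the unit contributes about 10,800 to the weighted total instead of 50,000. The adjusted weight never falls below 1, which means the unit always continues to represent at least itself.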
Calibration on margins
- For each annual survey, the variables taken from the survey are calibrated to the margins of the total population in paid employment taken from the BTS, according to a number of criteria (social category, gender, geographical location, etc.).
- After concatenation of the annual files for the delivery to Eurostat, the whole set is calibrated a second time on the margins relative to the year of validity of the survey.
Each of the adjustments to the margins is carried out using the Calmar procedure.
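Calibration on categorical margins can be illustrated with a raking ratio (iterative proportional fitting) sketch. Raking ratio is one of the calibration methods available in Calmar, but the source does not say which method is used here, so this is an assumption; the function rake, the toy units and the margin totals are hypothetical.

```python
def rake(units, margins, n_iter=50, tol=1e-10):
    """Adjust weights so that weighted counts match known margins.

    units:   list of dicts with a 'weight' and categorical variables
    margins: {variable: {category: population total}}
    """
    weights = [u["weight"] for u in units]
    for _ in range(n_iter):
        max_shift = 0.0
        for var, targets in margins.items():
            # Current weighted total of each category of this variable.
            current = {c: 0.0 for c in targets}
            for u, w in zip(units, weights):
                current[u[var]] += w
            # Rescale every weight by target / current for its category.
            for i, u in enumerate(units):
                ratio = targets[u[var]] / current[u[var]]
                weights[i] *= ratio
                max_shift = max(max_shift, abs(ratio - 1.0))
        if max_shift < tol:  # all margins already matched
            break
    return weights

units = [
    {"gender": "F", "category": "managerial", "weight": 10.0},
    {"gender": "F", "category": "other", "weight": 10.0},
    {"gender": "M", "category": "managerial", "weight": 10.0},
    {"gender": "M", "category": "other", "weight": 10.0},
]
margins = {
    "gender": {"F": 30.0, "M": 20.0},
    "category": {"managerial": 15.0, "other": 35.0},
}
calibrated = rake(units, margins)
```

After calibration the weighted counts reproduce both sets of margins at once (30 women and 20 men, 15 managerial and 35 other), while the weights stay as close as the method allows to the initial ones.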
Delivery to Eurostat
The tables sent to Eurostat always use two successive annual surveys. On the concatenated base, final adjustments are made to satisfy the constraints imposed by Eurostat. These constraints include strict limits on several variables (working time and the valuation of overtime, for example) and the absence of partial non-response (individuals with certain variables missing are deleted).