INSTANCE – the Italian seismic dataset for machine learning

Figure 14. Example of randomly selected earthquake waveforms of the broadband HH channels contained in INSTANCE. Each row contains three randomly selected traces drawn according to the following criteria: (a-c) earthquakes 2 ≤ M < 3 (66.8 % of the total of the HH channels); (d-f) earthquakes 3 ≤ M < 4 (13.5 %); (g-i) earthquakes M ≥ 4 (2.0 %); (j-l) earthquakes trace_E_snr_db ≥ 10 and path_ep_distance < 100 km (55.0 %); (m-o)

Michelini, A., Cianetti, S., Gaviano, S., Giunchi, C., Jozinovic, D., and Lauciani.
Earth System Science Data Discussions,


Italian earthquake waveform data are collected here in a dataset suited for machine learning (ML) applications. The dataset consists of nearly 1.2 million three-component (3C) waveform traces from about 50,000 earthquakes and more than 130,000 3C noise waveform traces, for a total of about 43,000 hours of data and an average of 21 3C traces per event. The earthquake list is based on the Italian seismic bulletin of the “Istituto Nazionale di Geofisica e Vulcanologia” for the period between January 2005 and January 2020, and it includes events in the magnitude range between 0.0 and 6.5. The waveform data were recorded primarily by the Italian National Seismic Network (network code IV) and include both weak-motion (HH, EH channels) and strong-motion (HN channels) recordings. All waveform traces are 120 s long, are sampled at 100 Hz, and are provided both in counts and in ground motion units after deconvolution of the instrument transfer functions. The waveform dataset is accompanied by metadata consisting of more than 100 parameters that provide comprehensive information on the earthquake source, the recording stations, the trace features, and other derived quantities. This rich set of metadata allows users to target the data selection for their own purposes. Many of these metadata can be used as labels in ML analysis or for other studies. The dataset, assembled in HDF5 format, is available (Michelini et al., 2021).
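
As an illustration of metadata-driven selection, the following is a minimal sketch in plain Python, not the dataset's own tooling. It applies two of the criteria quoted in the Figure 14 caption to a few made-up metadata rows and derives the number of samples per trace component from the stated 120 s length and 100 Hz sampling rate. The names `trace_E_snr_db` and `path_ep_distance` come from the caption; the `magnitude` and `trace_id` keys, the helper functions, and all row values are assumptions introduced only for this example.

```python
# Trace geometry stated in the abstract.
SAMPLING_RATE_HZ = 100
TRACE_LENGTH_S = 120


def samples_per_trace(rate_hz=SAMPLING_RATE_HZ, length_s=TRACE_LENGTH_S):
    """Number of samples in one component of a trace."""
    return rate_hz * length_s


def select_traces(rows, min_mag=None, max_mag=None,
                  min_snr_db=None, max_dist_km=None):
    """Return the rows that satisfy every criterion that is not None.

    Lower bounds are inclusive and upper bounds exclusive, matching the
    caption's notation (e.g. 2 <= M < 3, distance < 100 km).
    """
    selected = []
    for row in rows:
        if min_mag is not None and row["magnitude"] < min_mag:
            continue
        if max_mag is not None and row["magnitude"] >= max_mag:
            continue
        if min_snr_db is not None and row["trace_E_snr_db"] < min_snr_db:
            continue
        if max_dist_km is not None and row["path_ep_distance"] >= max_dist_km:
            continue
        selected.append(row)
    return selected


# Hypothetical rows: "magnitude" and "trace_id" are assumed key names,
# and the values are invented for illustration only.
rows = [
    {"trace_id": "a", "magnitude": 2.4,
     "trace_E_snr_db": 14.0, "path_ep_distance": 35.0},
    {"trace_id": "b", "magnitude": 3.2,
     "trace_E_snr_db": 6.5, "path_ep_distance": 120.0},
    {"trace_id": "c", "magnitude": 4.6,
     "trace_E_snr_db": 22.0, "path_ep_distance": 80.0},
]

# Criterion (a-c): earthquakes with 2 <= M < 3.
small = select_traces(rows, min_mag=2.0, max_mag=3.0)

# Criterion (j-l): SNR on the E component >= 10 dB, epicentral distance < 100 km.
clean_near = select_traces(rows, min_snr_db=10.0, max_dist_km=100.0)

print(samples_per_trace())                  # 12000
print([r["trace_id"] for r in small])       # ['a']
print([r["trace_id"] for r in clean_near])  # ['a', 'c']
```

With the real metadata the same filters would more naturally be written against a table library, but the logic of combining magnitude, signal-to-noise, and distance thresholds would be the same.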
