INSTANCE – The Italian seismic dataset for machine learning

[Figure: Italian seismic stations used for waveform extraction]
[Figure: Earthquake locations]

Creative Commons license: Attribution 4.0 International (CC BY 4.0)

INSTANCE is a dataset of seismic waveform data and associated metadata suited for machine-learning-based analysis. It includes:

  • 54,008 earthquakes for a total of 1,159,249 3-channel waveforms;
  • 132,330 3-channel noise waveforms;
  • 115 metadata fields for each waveform, providing information on station, trace, source, path and quality;
  • 19 networks;
  • 620 seismic stations.

How to cite the journal article

Michelini, A., Cianetti, S., Gaviano, S., Giunchi, C., Jozinović, D., and Lauciani, V., INSTANCE – the Italian seismic dataset for machine learning, Earth Syst. Sci. Data, 13 (12), 5509–5544, doi:10.5194/essd-13-5509-2021.

How to cite the dataset

INSTANCE The Italian Seismic Dataset For Machine Learning, Alberto Michelini, Spina Cianetti, Sonja Gaviano, Carlo Giunchi, Dario Jozinović & Valentino Lauciani, Seismic Waveforms And Associated Metadata published 2021 in Istituto Nazionale di Geofisica e Vulcanologia (INGV)

Data sources


To get the full dataset you have to download:

A sample dataset of approximately 1.7 GB is also provided so that potentially interested users can evaluate whether INSTANCE fulfills their needs without downloading the whole dataset. The sample dataset contains 10,000 events and 1,000 noise waveforms together with the associated metadata.

  • Sample dataset version 3 (1.7 GB bz2 file, 2.74 GB after decompression). Fixed the spectral acceleration values, which were wrongly expressed in %g.
  • Sample dataset version 2 (1.7 GB bz2 file, 2.74 GB after decompression). Fixed the metadata parameter name source_mt_scalar_moment_Nm.
  • Sample dataset version 1 (1.7 GB bz2 file, 2.74 GB after decompression)
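
Since the sample dataset is distributed as a bz2 file that grows from about 1.7 GB to 2.74 GB, it may be convenient to decompress it in a streaming fashion rather than loading it into memory. The following is a minimal sketch using only the Python standard library. The file names are hypothetical placeholders, not the actual names of the INSTANCE archives, and the sketch assumes a plain bz2-compressed file; if the decompressed output turns out to be a tar archive, it would still need to be unpacked separately.

```python
import bz2
import shutil
from pathlib import Path


def decompress_bz2(src: Path, dst: Path, chunk_size: int = 1024 * 1024) -> int:
    """Stream-decompress the bz2 file `src` into `dst`.

    Data is copied in chunks of `chunk_size` bytes, so memory use stays
    small regardless of the archive size. Returns the size of `dst` in bytes.
    """
    with bz2.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout, chunk_size)
    return dst.stat().st_size


if __name__ == "__main__":
    # Hypothetical file names: replace them with the file actually downloaded.
    archive = Path("instance_sample.bz2")
    output = Path("instance_sample")
    if archive.exists():
        size = decompress_bz2(archive, output)
        print(f"Wrote {size / 1e9:.2f} GB to {output}")
    else:
        print(f"{archive} not found; download the sample dataset first.")
```

Make sure at least 2.74 GB of free disk space is available before running it, in addition to the space taken by the downloaded archive.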

Additional material can be found on GitHub.