Demonstration of a common issue encountered in template matching¶

Templates are selected from Weiqiang Zhu's PhaseNet catalog.

Data requirements¶

Download the seismic data at: https://doi.org/10.5281/zenodo.15097180

Download the PhaseNet earthquake catalog with the following three commands:

curl -o adloc_events.csv https://storage.googleapis.com/quakeflow_share/Ridgecrest/adloc/adloc_events.csv
curl -o adloc_picks.csv https://storage.googleapis.com/quakeflow_share/Ridgecrest/adloc/adloc_picks.csv
curl -o adloc_stations.csv https://storage.googleapis.com/quakeflow_share/Ridgecrest/adloc/adloc_stations.csv

Installing `fast_matched_filter`¶

This example uses fast_matched_filter, a Python wrapper for C and CUDA-C routines. The C and CUDA-C libraries have to be compiled. Read the instructions at https://ebeauce.github.io/FMF_documentation/introduction.html.

In [1]:

Copied!





import os
import fast_matched_filter as fmf
import glob
import numpy as np
import matplotlib.pyplot as plt
import obspy as obs
import pandas as pd
import os
import fast_matched_filter as fmf
import glob
import numpy as np
import matplotlib.pyplot as plt
import obspy as obs
import pandas as pd

In [2]:

Copied!





# path variables and file names
DIR_WAVEFORMS = "/home/ebeauce/SSA_EQ_DETECTION_WORKSHOP/data" # REPLACE THIS PATH WITH WHEREVER YOU DOWNLOADED THE DATA
DIR_CATALOG = "../picks_phasenet/"

STATION_FILE = "adloc_stations.csv"
EVENT_FILE = "adloc_events.csv"
PICK_FILE = "adloc_picks.csv"
# path variables and file names
DIR_WAVEFORMS = "/home/ebeauce/SSA_EQ_DETECTION_WORKSHOP/data" # REPLACE THIS PATH WITH WHEREVER YOU DOWNLOADED THE DATA
DIR_CATALOG = "../picks_phasenet/"

STATION_FILE = "adloc_stations.csv"
EVENT_FILE = "adloc_events.csv"
PICK_FILE = "adloc_picks.csv"

Load PhaseNet catalog¶

Here, we read the catalog of the 2019 Ridgecrest sequence made with PhaseNet. Information is divided into three files:

a station metadata file,
an event metadata file (the catalog per se),
a pick database, which contains all the P- and S-wave picks found by PhaseNet.

In [3]:

Copied!

station_meta = pd.read_csv(os.path.join(DIR_CATALOG, STATION_FILE))
station_meta
station_meta = pd.read_csv(os.path.join(DIR_CATALOG, STATION_FILE))
station_meta

Out[3]:

	network	station	location	instrument	component	latitude	longitude	elevation_m	depth_km	provider	station_id	station_term_time_p	station_term_time_s	station_term_amplitude
0	CI	CCC	NaN	BH	ENZ	35.524950	-117.364530	670.0	-0.6700	SCEDC	CI.CCC..BH	0.266086	0.500831	0.049399
1	CI	CCC	NaN	HH	ENZ	35.524950	-117.364530	670.0	-0.6700	SCEDC	CI.CCC..HH	0.295428	0.518465	0.191475
2	CI	CCC	NaN	HN	ENZ	35.524950	-117.364530	670.0	-0.6700	SCEDC	CI.CCC..HN	0.296263	0.541148	0.064485
3	CI	CLC	NaN	BH	ENZ	35.815740	-117.597510	775.0	-0.7750	SCEDC	CI.CLC..BH	-0.231963	-0.415271	-0.331371
4	CI	CLC	NaN	HH	ENZ	35.815740	-117.597510	775.0	-0.7750	SCEDC	CI.CLC..HH	-0.168743	-0.390045	-0.140313
5	CI	CLC	NaN	HN	ENZ	35.815740	-117.597510	775.0	-0.7750	SCEDC	CI.CLC..HN	-0.175671	-0.388116	-0.249066
6	CI	DTP	NaN	BH	ENZ	35.267420	-117.845810	908.0	-0.9080	SCEDC	CI.DTP..BH	-0.305881	-0.602459	-0.503411
7	CI	DTP	NaN	HH	ENZ	35.267420	-117.845810	908.0	-0.9080	SCEDC	CI.DTP..HH	-0.263705	-0.564867	-0.437951
8	CI	DTP	NaN	HN	ENZ	35.267420	-117.845810	908.0	-0.9080	SCEDC	CI.DTP..HN	-0.244383	-0.538990	-0.500516
9	CI	JRC2	NaN	BH	ENZ	35.982490	-117.808850	1469.0	-1.4690	SCEDC	CI.JRC2..BH	0.011361	-0.080285	-0.039941
10	CI	JRC2	NaN	HH	ENZ	35.982490	-117.808850	1469.0	-1.4690	SCEDC	CI.JRC2..HH	0.053539	-0.052748	0.068213
11	CI	JRC2	NaN	HN	ENZ	35.982490	-117.808850	1469.0	-1.4690	SCEDC	CI.JRC2..HN	0.059764	-0.045991	-0.007637
12	CI	LRL	NaN	BH	ENZ	35.479540	-117.682120	1340.0	-1.3400	SCEDC	CI.LRL..BH	-0.295604	-0.540857	0.033788
13	CI	LRL	NaN	HH	ENZ	35.479540	-117.682120	1340.0	-1.3400	SCEDC	CI.LRL..HH	-0.268381	-0.513955	0.146876
14	CI	LRL	NaN	HN	ENZ	35.479540	-117.682120	1340.0	-1.3400	SCEDC	CI.LRL..HN	-0.266329	-0.503088	0.045499
15	CI	LRL	2C	HN	ENZ	35.479540	-117.682120	1340.0	-1.3400	SCEDC	CI.LRL.2C.HN	0.000000	0.000000	0.000000
16	CI	MPM	NaN	BH	ENZ	36.057990	-117.489010	1839.0	-1.8390	SCEDC	CI.MPM..BH	-0.011825	-0.098089	-0.518793
17	CI	MPM	NaN	HH	ENZ	36.057990	-117.489010	1839.0	-1.8390	SCEDC	CI.MPM..HH	0.009095	-0.081896	-0.459349
18	CI	MPM	NaN	HN	ENZ	36.057990	-117.489010	1839.0	-1.8390	SCEDC	CI.MPM..HN	0.011870	-0.052277	-0.532764
19	CI	Q0072	01	HN	ENZ	35.609617	-117.666721	695.0	-0.6950	SCEDC	CI.Q0072.01.HN	0.000000	0.000000	0.000000
20	CI	SLA	NaN	BH	ENZ	35.890950	-117.283320	1174.0	-1.1740	SCEDC	CI.SLA..BH	0.066893	0.118500	-0.081634
21	CI	SLA	NaN	HH	ENZ	35.890950	-117.283320	1174.0	-1.1740	SCEDC	CI.SLA..HH	0.089833	0.128589	-0.042928
22	CI	SLA	NaN	HN	ENZ	35.890950	-117.283320	1174.0	-1.1740	SCEDC	CI.SLA..HN	0.093526	0.168238	-0.150381
23	CI	SRT	NaN	BH	ENZ	35.692350	-117.750510	667.0	-0.6670	SCEDC	CI.SRT..BH	0.148171	0.653411	-0.253527
24	CI	SRT	NaN	HH	ENZ	35.692350	-117.750510	667.0	-0.6670	SCEDC	CI.SRT..HH	0.183868	0.641707	-0.208859
25	CI	SRT	NaN	HN	ENZ	35.692350	-117.750510	667.0	-0.6670	SCEDC	CI.SRT..HN	0.175475	0.674587	-0.295869
26	CI	TOW2	NaN	BH	ENZ	35.808560	-117.764880	685.0	-0.6850	SCEDC	CI.TOW2..BH	0.268083	0.816279	0.028038
27	CI	TOW2	NaN	HH	ENZ	35.808560	-117.764880	685.0	-0.6850	SCEDC	CI.TOW2..HH	0.285970	0.831533	0.102681
28	CI	TOW2	NaN	HN	ENZ	35.808560	-117.764880	685.0	-0.6850	SCEDC	CI.TOW2..HN	0.285113	0.843886	-0.006698
29	CI	WBM	NaN	BH	ENZ	35.608390	-117.890490	892.0	-0.8920	SCEDC	CI.WBM..BH	0.136927	0.128744	-0.166279
30	CI	WBM	NaN	HH	ENZ	35.608390	-117.890490	892.0	-0.8920	SCEDC	CI.WBM..HH	0.152299	0.124250	-0.138614
31	CI	WBM	NaN	HN	ENZ	35.608390	-117.890490	892.0	-0.8920	SCEDC	CI.WBM..HN	0.170719	0.179461	-0.101436
32	CI	WBM	2C	HN	ENZ	35.608390	-117.890490	892.0	-0.8920	SCEDC	CI.WBM.2C.HN	0.000000	0.000000	0.000000
33	CI	WCS2	NaN	BH	ENZ	36.025210	-117.765260	1143.0	-1.1430	SCEDC	CI.WCS2..BH	0.030349	-0.126935	0.023642
34	CI	WCS2	NaN	HH	ENZ	36.025210	-117.765260	1143.0	-1.1430	SCEDC	CI.WCS2..HH	0.065321	-0.099447	0.146628
35	CI	WCS2	NaN	HN	ENZ	36.025210	-117.765260	1143.0	-1.1430	SCEDC	CI.WCS2..HN	0.061651	-0.096450	0.037563
36	CI	WMF	NaN	BH	ENZ	36.117580	-117.854860	1537.4	-1.5374	SCEDC	CI.WMF..BH	0.039416	-0.005761	-0.165183
37	CI	WMF	NaN	HH	ENZ	36.117580	-117.854860	1537.4	-1.5374	SCEDC	CI.WMF..HH	0.075658	0.005327	-0.079900
38	CI	WMF	NaN	HN	ENZ	36.117580	-117.854860	1537.4	-1.5374	SCEDC	CI.WMF..HN	0.085427	0.025273	-0.240971
39	CI	WMF	2C	HN	ENZ	36.117580	-117.854860	1537.4	-1.5374	SCEDC	CI.WMF.2C.HN	0.000000	0.000000	0.000000
40	CI	WNM	NaN	EH	Z	35.842200	-117.906160	974.3	-0.9743	SCEDC	CI.WNM..EH	-0.026889	-0.070269	-0.332510
41	CI	WNM	NaN	HN	ENZ	35.842200	-117.906160	974.3	-0.9743	SCEDC	CI.WNM..HN	-0.011659	-0.118223	-0.038719
42	CI	WNM	2C	HN	ENZ	35.842200	-117.906160	974.3	-0.9743	SCEDC	CI.WNM.2C.HN	0.000000	0.000000	0.000000
43	CI	WRC2	NaN	BH	ENZ	35.947900	-117.650380	943.0	-0.9430	SCEDC	CI.WRC2..BH	-0.023860	-0.043523	0.117833
44	CI	WRC2	NaN	HH	ENZ	35.947900	-117.650380	943.0	-0.9430	SCEDC	CI.WRC2..HH	0.010633	-0.029488	0.165234
45	CI	WRC2	NaN	HN	ENZ	35.947900	-117.650380	943.0	-0.9430	SCEDC	CI.WRC2..HN	0.014658	-0.021367	0.103905
46	CI	WRV2	NaN	EH	Z	36.007740	-117.890400	1070.0	-1.0700	SCEDC	CI.WRV2..EH	-0.003461	-0.154572	-0.355199
47	CI	WRV2	NaN	HN	ENZ	36.007740	-117.890400	1070.0	-1.0700	SCEDC	CI.WRV2..HN	0.017317	-0.137916	-0.273270
48	CI	WRV2	2C	HN	ENZ	36.007740	-117.890400	1070.0	-1.0700	SCEDC	CI.WRV2.2C.HN	0.000000	0.000000	0.000000
49	CI	WVP2	NaN	EH	Z	35.949390	-117.817690	1465.0	-1.4650	SCEDC	CI.WVP2..EH	0.020325	0.008989	-0.341606
50	CI	WVP2	NaN	HN	ENZ	35.949390	-117.817690	1465.0	-1.4650	SCEDC	CI.WVP2..HN	0.024742	-0.103024	-0.089014
51	CI	WVP2	2C	HN	ENZ	35.949390	-117.817690	1465.0	-1.4650	SCEDC	CI.WVP2.2C.HN	0.000000	0.000000	0.000000

The following shows a very rudimentary map of the station network. Look into the cartopy package for more sophisticated maps.

In [4]:

Copied!





_station_meta = station_meta.drop_duplicates("station")

fig, ax = plt.subplots(num="station_network", figsize=(10, 10))
ax.scatter(_station_meta["longitude"], _station_meta["latitude"], marker="v", color="k")
for idx, row in _station_meta.iterrows():
    ax.text(row.longitude + 0.01, row.latitude + 0.01, row.station, va="bottom", ha="left")
ax.set_xlabel("Longitude")
ax.set_ylabel("Latitude")
ax.grid()
ax.set_title("Stations used to build the PhaseNet catalog")
_station_meta = station_meta.drop_duplicates("station")

fig, ax = plt.subplots(num="station_network", figsize=(10, 10))
ax.scatter(_station_meta["longitude"], _station_meta["latitude"], marker="v", color="k")
for idx, row in _station_meta.iterrows():
    ax.text(row.longitude + 0.01, row.latitude + 0.01, row.station, va="bottom", ha="left")
ax.set_xlabel("Longitude")
ax.set_ylabel("Latitude")
ax.grid()
ax.set_title("Stations used to build the PhaseNet catalog")

Out[4]:

Text(0.5, 1.0, 'Stations used to build the PhaseNet catalog')

No description has been provided for this image

In [5]:

Copied!

event_meta = pd.read_csv(os.path.join(DIR_CATALOG, EVENT_FILE))
event_meta
event_meta = pd.read_csv(os.path.join(DIR_CATALOG, EVENT_FILE))
event_meta

Out[5]:

	time	adloc_score	adloc_residual_time	num_picks	magnitude	adloc_residual_amplitude	event_index	longitude	latitude	depth_km
0	2019-07-04 00:46:47.342963596	0.866596	0.057355	35	0.595368	0.165480	6720	-117.882570	36.091088	4.643969
1	2019-07-04 00:55:32.648412579	0.781116	0.203567	16	1.142510	0.170509	17122	-117.799226	35.378160	11.078458
2	2019-07-04 00:56:37.232733104	0.908073	0.086183	42	0.912494	0.166681	5411	-117.880902	36.091986	4.854336
3	2019-07-04 02:00:39.149363202	0.814322	0.036164	15	0.209530	0.092534	17868	-117.866468	36.093520	4.981447
4	2019-07-04 03:05:31.018885833	0.799281	0.080708	11	0.104050	0.156833	25037	-117.846320	36.100386	5.943363
...	...	...	...	...	...	...	...	...	...	...
20898	2019-07-09 23:58:14.298499048	0.933013	0.097816	67	1.543257	0.131523	1907	-117.711811	35.926468	6.652020
20899	2019-07-09 23:58:47.701746285	0.882434	0.090185	73	1.087089	0.140159	2051	-117.604263	35.797450	6.617413
20900	2019-07-09 23:59:05.102247662	0.798047	0.435077	26	1.147040	0.170994	12567	-117.509729	35.692722	12.815041
20901	2019-07-09 23:59:40.257837813	0.971081	0.065523	35	1.161323	0.068586	7726	-117.846289	36.061435	5.224666
20902	2019-07-09 23:59:49.650466544	0.800524	0.023674	15	0.936622	0.095959	18566	-117.896100	36.095886	6.761860

20903 rows × 10 columns

In [6]:

Copied!

picks = pd.read_csv(os.path.join(DIR_CATALOG, PICK_FILE))
picks
picks = pd.read_csv(os.path.join(DIR_CATALOG, PICK_FILE))
picks

Out[6]:

	station_id	phase_index	phase_time	phase_score	phase_type	dt_s	phase_polarity	phase_amplitude	event_index	sp_ratio	event_amplitude	adloc_mask	adloc_residual_time	adloc_residual_amplitude
0	CI.WMF..BH	280874	2019-07-04 00:46:48.759	0.594	P	0.01	0.110	-3.213107	6720	0.955091	3.124152e-07	1	0.003919	0.012320
1	CI.WMF..HH	280881	2019-07-04 00:46:48.818	0.973	P	0.01	0.938	-3.077638	6720	0.948896	3.368390e-07	1	0.026491	0.063669
2	CI.WMF..HN	280881	2019-07-04 00:46:48.818	0.973	P	0.01	0.898	-3.128019	6720	1.598008	2.146277e-07	1	0.017580	0.173702
3	CI.WRV2..EH	280945	2019-07-04 00:46:49.450	0.977	P	0.01	-0.855	-3.464453	6720	1.000000	3.298200e-07	1	0.077194	0.244381
4	CI.WRV2..HN	280945	2019-07-04 00:46:49.450	0.941	P	0.01	-0.906	-3.585863	6720	1.147235	3.908305e-07	1	0.056694	0.041691
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
955700	CI.WRV2..HN	8639355	2019-07-09 23:59:53.550	0.688	S	0.01	-0.022	-3.384471	18566	0.644492	1.973834e-06	1	0.019093	0.028409
955701	CI.JRC2..HH	8639499	2019-07-09 23:59:54.998	0.402	S	0.01	-0.023	-3.446238	18566	0.647843	6.197256e-06	1	0.020364	-0.154404
955702	CI.JRC2..HN	8639499	2019-07-09 23:59:54.998	0.344	S	0.01	-0.016	-3.531800	18566	0.731796	4.075353e-06	1	0.012773	-0.164659
955703	CI.WVP2..EH	8639574	2019-07-09 23:59:55.740	0.314	S	0.01	-0.103	-3.488117	18566	0.636005	2.815148e-06	1	-0.092233	0.316712
955704	CI.WVP2..HN	8639576	2019-07-09 23:59:55.760	0.695	S	0.01	0.076	-3.218676	18566	0.825223	4.657352e-06	1	0.039319	0.332722

955705 rows × 14 columns

Pick one event¶

Let's use the PhaseNet catalog to read the waveforms of an event.

In [7]:

Copied!





def fetch_event_waveforms(
    event_picks,
    dir_waveforms=DIR_WAVEFORMS,
    time_before_phase_onset_sec=2.0,
    duration_sec=10.0
    ):
    """
    Fetches the waveforms for a given event based on the picks.

    Parameters
    ----------
    event_picks : pandas.DataFrame
        DataFrame containing the picks for the event.
    dir_waveforms : str, optional
        Directory where the waveform data is stored, by default DIR_WAVEFORMS.
    time_before_phase_onset_sec : float, optional
        Time in seconds to start the waveform before the phase onset, by default 2.0.
    duration_sec : float, optional
        Duration in seconds of the waveform to fetch, by default 10.0.

    Returns
    -------
    obspy.Stream
        Stream object containing the fetched waveforms.
    """
    stream = obs.Stream()
    for _, pick in event_picks.iterrows():
        # check whether we have a miniseed file for this waveform
        if pick.phase_type == "P":
            files = glob.glob(os.path.join(dir_waveforms, pick.station_id + "Z*mseed"))
        elif pick.phase_type == "S":
            files = glob.glob(os.path.join(dir_waveforms, pick.station_id + "[N,E]*mseed"))
        starttime = obs.UTCDateTime(pick.phase_time) - time_before_phase_onset_sec
        endtime = starttime + duration_sec
        for _file in files:
            stream += obs.read(
                _file,
                starttime=starttime,
                endtime=endtime
            )
    return stream
    
def fetch_event_waveforms(
    event_picks,
    dir_waveforms=DIR_WAVEFORMS,
    time_before_phase_onset_sec=2.0,
    duration_sec=10.0
    ):
    """
    Fetches the waveforms for a given event based on the picks.

    Parameters
    ----------
    event_picks : pandas.DataFrame
        DataFrame containing the picks for the event.
    dir_waveforms : str, optional
        Directory where the waveform data is stored, by default DIR_WAVEFORMS.
    time_before_phase_onset_sec : float, optional
        Time in seconds to start the waveform before the phase onset, by default 2.0.
    duration_sec : float, optional
        Duration in seconds of the waveform to fetch, by default 10.0.

    Returns
    -------
    obspy.Stream
        Stream object containing the fetched waveforms.
    """
    stream = obs.Stream()
    for _, pick in event_picks.iterrows():
        # check whether we have a miniseed file for this waveform
        if pick.phase_type == "P":
            files = glob.glob(os.path.join(dir_waveforms, pick.station_id + "Z*mseed"))
        elif pick.phase_type == "S":
            files = glob.glob(os.path.join(dir_waveforms, pick.station_id + "[N,E]*mseed"))
        starttime = obs.UTCDateTime(pick.phase_time) - time_before_phase_onset_sec
        endtime = starttime + duration_sec
        for _file in files:
            stream += obs.read(
                _file,
                starttime=starttime,
                endtime=endtime
            )
    return stream
    

In [8]:

Copied!

# explore event_meta to find a nice intermediate-size earthquake we could plot
event_meta.head(20)
# explore event_meta to find a nice intermediate-size earthquake we could plot
event_meta.head(20)

Out[8]:

	time	adloc_score	adloc_residual_time	num_picks	magnitude	adloc_residual_amplitude	event_index	longitude	latitude	depth_km
0	2019-07-04 00:46:47.342963596	0.866596	0.057355	35	0.595368	0.165480	6720	-117.882570	36.091088	4.643969
1	2019-07-04 00:55:32.648412579	0.781116	0.203567	16	1.142510	0.170509	17122	-117.799226	35.378160	11.078458
2	2019-07-04 00:56:37.232733104	0.908073	0.086183	42	0.912494	0.166681	5411	-117.880902	36.091986	4.854336
3	2019-07-04 02:00:39.149363202	0.814322	0.036164	15	0.209530	0.092534	17868	-117.866468	36.093520	4.981447
4	2019-07-04 03:05:31.018885833	0.799281	0.080708	11	0.104050	0.156833	25037	-117.846320	36.100386	5.943363
5	2019-07-04 03:20:28.674438914	0.755451	0.050436	44	0.869927	0.111885	6929	-117.673684	36.114308	5.894466
6	2019-07-04 04:03:01.619369274	0.897311	0.143631	32	0.386310	0.214381	7607	-117.805077	36.016063	0.396071
7	2019-07-04 05:16:47.223353119	0.835094	0.053522	22	0.575433	0.163675	13098	-117.879256	36.090409	4.268292
8	2019-07-04 06:57:32.758991812	0.922386	0.065846	23	0.256281	0.065212	15698	-117.867587	36.081665	5.939931
9	2019-07-04 11:51:07.805591259	0.692042	0.075023	48	0.818316	0.102087	4380	-117.671909	36.118573	6.085804
10	2019-07-04 15:36:04.228420696	0.652888	0.012137	9	-0.633904	0.167058	27370	-117.785299	36.005578	3.234981
11	2019-07-04 15:42:47.932558745	0.597856	0.072740	28	0.933088	0.151623	9740	-117.501116	35.707382	14.496446
12	2019-07-04 16:07:20.003321194	0.729246	0.093399	33	0.835854	0.133106	9365	-117.491503	35.711241	13.846898
13	2019-07-04 16:11:46.920083440	0.711579	0.516367	18	1.734816	0.117952	7098	-117.877675	35.196445	31.000000
14	2019-07-04 16:13:43.094792540	0.849673	0.070048	83	1.653859	0.111108	2305	-117.493716	35.710090	13.375377
15	2019-07-04 16:16:07.085486307	0.814365	0.040161	10	0.587583	0.099619	26520	-117.540198	35.689022	14.537368
16	2019-07-04 17:02:55.057058245	0.770696	0.079675	90	4.476724	0.143489	1101	-117.495094	35.711607	13.586411
17	2019-07-04 17:04:02.231614981	0.664676	0.064038	41	2.062226	0.186214	5379	-117.488446	35.711139	14.001705
18	2019-07-04 17:05:05.071421677	0.509088	0.089882	29	1.471434	0.157048	12115	-117.491307	35.710881	13.223374
19	2019-07-04 17:08:51.664841725	0.666790	0.046444	15	0.657653	0.175270	23044	-117.504506	35.706123	14.570921

In [9]:

Copied!

# feel free to play with the event index to plot different events
EVENT_IDX = 1101

event_meta.set_index("event_index").loc[EVENT_IDX]
# feel free to play with the event index to plot different events
EVENT_IDX = 1101

event_meta.set_index("event_index").loc[EVENT_IDX]

Out[9]:

time                        2019-07-04 17:02:55.057058245
adloc_score                                      0.770696
adloc_residual_time                              0.079675
num_picks                                              90
magnitude                                        4.476724
adloc_residual_amplitude                         0.143489
longitude                                     -117.495094
latitude                                        35.711607
depth_km                                        13.586411
Name: 1101, dtype: object

In [10]:

Copied!

# fetch the corresponding picks for this event
event_picks = picks[picks["event_index"] == EVENT_IDX]
event_picks
# fetch the corresponding picks for this event
event_picks = picks[picks["event_index"] == EVENT_IDX]
event_picks

Out[10]:

	station_id	phase_index	phase_time	phase_score	phase_type	dt_s	phase_polarity	phase_amplitude	event_index	sp_ratio	event_amplitude	adloc_mask	adloc_residual_time	adloc_residual_amplitude
565	CI.CLC..HN	6137848	2019-07-04 17:02:58.488	0.969	P	0.01	-0.879	-0.511873	1101	1.017569	2.474507e-07	1	-0.010048	-0.054426
566	CI.CLC..BH	6137847	2019-07-04 17:02:58.489	0.633	P	0.01	-0.324	-0.666956	1101	1.014822	2.022085e-07	1	0.048153	-0.127168
567	CI.CLC..HH	6137849	2019-07-04 17:02:58.498	0.965	P	0.01	-0.883	-0.586030	1101	1.086886	2.141392e-07	1	-0.007135	-0.236796
568	CI.SRT..HN	6137990	2019-07-04 17:02:59.908	0.879	P	0.01	0.699	-0.643401	1101	1.377537	4.387864e-07	1	-0.069463	0.065039
569	CI.SRT..HH	6137990	2019-07-04 17:02:59.908	0.891	P	0.01	0.734	-0.601019	1101	0.729745	5.910421e-07	1	-0.078144	0.020251
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
650	CI.WMF..HH	6139185	2019-07-04 17:03:11.858	0.672	S	0.01	0.021	-1.368759	1101	1.096820	3.073536e-07	1	0.140179	-0.329117
651	CI.WMF..HN	6139189	2019-07-04 17:03:11.898	0.633	S	0.01	0.026	-1.371407	1101	0.989939	2.891483e-07	1	0.159858	-0.171350
652	CI.DTP..BH	6139200	2019-07-04 17:03:12.019	0.734	S	0.01	0.035	-1.322484	1101	1.385395	1.624267e-07	1	0.068940	0.175198
653	CI.DTP..HH	6139203	2019-07-04 17:03:12.038	0.707	S	0.01	-0.015	-1.306977	1101	1.122809	3.205840e-07	1	0.052824	0.125587
654	CI.DTP..HN	6139211	2019-07-04 17:03:12.118	0.625	S	0.01	0.043	-1.282829	1101	0.863380	2.702479e-07	1	0.105026	0.211253

90 rows × 14 columns

In [11]:

Copied!

# fetch the waveforms
event_waveforms = fetch_event_waveforms(event_picks, time_before_phase_onset_sec=10., duration_sec=30.)
print(event_waveforms.__str__(extended=True))
# fetch the waveforms
event_waveforms = fetch_event_waveforms(event_picks, time_before_phase_onset_sec=10., duration_sec=30.)
print(event_waveforms.__str__(extended=True))

42 Trace(s) in Stream:
CI.CLC..HHZ  | 2019-07-04T17:02:48.478300Z - 2019-07-04T17:03:18.478300Z | 25.0 Hz, 751 samples
CI.SRT..HHZ  | 2019-07-04T17:02:49.918300Z - 2019-07-04T17:03:19.918300Z | 25.0 Hz, 751 samples
CI.CCC..HHZ  | 2019-07-04T17:02:50.118300Z - 2019-07-04T17:03:20.118300Z | 25.0 Hz, 751 samples
CI.SLA..HHZ  | 2019-07-04T17:02:50.518300Z - 2019-07-04T17:03:20.518300Z | 25.0 Hz, 751 samples
CI.TOW2..HHZ | 2019-07-04T17:02:50.518300Z - 2019-07-04T17:03:20.518300Z | 25.0 Hz, 751 samples
CI.LRL..HHZ  | 2019-07-04T17:02:50.638300Z - 2019-07-04T17:03:20.638300Z | 25.0 Hz, 751 samples
CI.WRC2..HHZ | 2019-07-04T17:02:50.718300Z - 2019-07-04T17:03:20.718300Z | 25.0 Hz, 751 samples
CI.CLC..HHN  | 2019-07-04T17:02:50.958300Z - 2019-07-04T17:03:20.958300Z | 25.0 Hz, 751 samples
CI.CLC..HHE  | 2019-07-04T17:02:50.958300Z - 2019-07-04T17:03:20.958300Z | 25.0 Hz, 751 samples
CI.MPM..HHZ  | 2019-07-04T17:02:52.038300Z - 2019-07-04T17:03:22.038300Z | 25.0 Hz, 751 samples
CI.WBM..HHZ  | 2019-07-04T17:02:52.123100Z - 2019-07-04T17:03:22.123100Z | 25.0 Hz, 751 samples
CI.WVP2..EHZ | 2019-07-04T17:02:52.160000Z - 2019-07-04T17:03:22.160000Z | 25.0 Hz, 751 samples
CI.WNM..EHZ  | 2019-07-04T17:02:52.160000Z - 2019-07-04T17:03:22.160000Z | 25.0 Hz, 751 samples
CI.JRC2..HHZ | 2019-07-04T17:02:52.558300Z - 2019-07-04T17:03:22.558300Z | 25.0 Hz, 751 samples
CI.WCS2..HHZ | 2019-07-04T17:02:52.598300Z - 2019-07-04T17:03:22.598300Z | 25.0 Hz, 751 samples
CI.WRV2..EHZ | 2019-07-04T17:02:53.600000Z - 2019-07-04T17:03:23.600000Z | 25.0 Hz, 751 samples
CI.SRT..HHN  | 2019-07-04T17:02:53.838300Z - 2019-07-04T17:03:23.838300Z | 25.0 Hz, 751 samples
CI.SRT..HHE  | 2019-07-04T17:02:53.838300Z - 2019-07-04T17:03:23.838300Z | 25.0 Hz, 751 samples
CI.CCC..HHN  | 2019-07-04T17:02:54.078300Z - 2019-07-04T17:03:24.078300Z | 25.0 Hz, 751 samples
CI.CCC..HHE  | 2019-07-04T17:02:54.078300Z - 2019-07-04T17:03:24.078300Z | 25.0 Hz, 751 samples
CI.SLA..HHE  | 2019-07-04T17:02:54.638300Z - 2019-07-04T17:03:24.638300Z | 25.0 Hz, 751 samples
CI.SLA..HHN  | 2019-07-04T17:02:54.638300Z - 2019-07-04T17:03:24.638300Z | 25.0 Hz, 751 samples
CI.LRL..HHN  | 2019-07-04T17:02:54.718300Z - 2019-07-04T17:03:24.718300Z | 25.0 Hz, 751 samples
CI.LRL..HHE  | 2019-07-04T17:02:54.718300Z - 2019-07-04T17:03:24.718300Z | 25.0 Hz, 751 samples
CI.WMF..HHZ  | 2019-07-04T17:02:54.798300Z - 2019-07-04T17:03:24.798300Z | 25.0 Hz, 751 samples
CI.DTP..HHZ  | 2019-07-04T17:02:54.958300Z - 2019-07-04T17:03:24.958300Z | 25.0 Hz, 751 samples
CI.TOW2..HHN | 2019-07-04T17:02:54.998300Z - 2019-07-04T17:03:24.998300Z | 25.0 Hz, 751 samples
CI.TOW2..HHE | 2019-07-04T17:02:54.998300Z - 2019-07-04T17:03:24.998300Z | 25.0 Hz, 751 samples
CI.WRC2..HHN | 2019-07-04T17:02:54.998300Z - 2019-07-04T17:03:24.998300Z | 25.0 Hz, 751 samples
CI.WRC2..HHE | 2019-07-04T17:02:54.998300Z - 2019-07-04T17:03:24.998300Z | 25.0 Hz, 751 samples
CI.WBM..HHE  | 2019-07-04T17:02:57.243100Z - 2019-07-04T17:03:27.243100Z | 25.0 Hz, 751 samples
CI.WBM..HHN  | 2019-07-04T17:02:57.243100Z - 2019-07-04T17:03:27.243100Z | 25.0 Hz, 751 samples
CI.MPM..HHE  | 2019-07-04T17:02:57.318300Z - 2019-07-04T17:03:27.318300Z | 25.0 Hz, 751 samples
CI.MPM..HHN  | 2019-07-04T17:02:57.318300Z - 2019-07-04T17:03:27.318300Z | 25.0 Hz, 751 samples
CI.JRC2..HHN | 2019-07-04T17:02:57.958300Z - 2019-07-04T17:03:27.958300Z | 25.0 Hz, 751 samples
CI.JRC2..HHE | 2019-07-04T17:02:57.958300Z - 2019-07-04T17:03:27.958300Z | 25.0 Hz, 751 samples
CI.WCS2..HHE | 2019-07-04T17:02:58.118300Z - 2019-07-04T17:03:28.118300Z | 25.0 Hz, 751 samples
CI.WCS2..HHN | 2019-07-04T17:02:58.118300Z - 2019-07-04T17:03:28.118300Z | 25.0 Hz, 751 samples
CI.WMF..HHN  | 2019-07-04T17:03:01.838300Z - 2019-07-04T17:03:31.838300Z | 25.0 Hz, 751 samples
CI.WMF..HHE  | 2019-07-04T17:03:01.838300Z - 2019-07-04T17:03:31.838300Z | 25.0 Hz, 751 samples
CI.DTP..HHN  | 2019-07-04T17:03:02.038300Z - 2019-07-04T17:03:32.038300Z | 25.0 Hz, 751 samples
CI.DTP..HHE  | 2019-07-04T17:03:02.038300Z - 2019-07-04T17:03:32.038300Z | 25.0 Hz, 751 samples

In [12]:

Copied!

# plot them!
fig = event_waveforms.select(component="Z").plot(equal_scale=False)
# plot them!
fig = event_waveforms.select(component="Z").plot(equal_scale=False)

Run template matching¶

We will now use one of the events from the PhaseNet catalog as a template event to detect events with template matching.

In [13]:

Copied!

selected_event_meta = event_meta.set_index("event_index").loc[EVENT_IDX]
selected_event_meta
selected_event_meta = event_meta.set_index("event_index").loc[EVENT_IDX]
selected_event_meta

Out[13]:

time                        2019-07-04 17:02:55.057058245
adloc_score                                      0.770696
adloc_residual_time                              0.079675
num_picks                                              90
magnitude                                        4.476724
adloc_residual_amplitude                         0.143489
longitude                                     -117.495094
latitude                                        35.711607
depth_km                                        13.586411
Name: 1101, dtype: object

In [14]:

Copied!

selected_event_picks = picks[picks["event_index"] == EVENT_IDX]
selected_event_picks
selected_event_picks = picks[picks["event_index"] == EVENT_IDX]
selected_event_picks

Out[14]:

	station_id	phase_index	phase_time	phase_score	phase_type	dt_s	phase_polarity	phase_amplitude	event_index	sp_ratio	event_amplitude	adloc_mask	adloc_residual_time	adloc_residual_amplitude
565	CI.CLC..HN	6137848	2019-07-04 17:02:58.488	0.969	P	0.01	-0.879	-0.511873	1101	1.017569	2.474507e-07	1	-0.010048	-0.054426
566	CI.CLC..BH	6137847	2019-07-04 17:02:58.489	0.633	P	0.01	-0.324	-0.666956	1101	1.014822	2.022085e-07	1	0.048153	-0.127168
567	CI.CLC..HH	6137849	2019-07-04 17:02:58.498	0.965	P	0.01	-0.883	-0.586030	1101	1.086886	2.141392e-07	1	-0.007135	-0.236796
568	CI.SRT..HN	6137990	2019-07-04 17:02:59.908	0.879	P	0.01	0.699	-0.643401	1101	1.377537	4.387864e-07	1	-0.069463	0.065039
569	CI.SRT..HH	6137990	2019-07-04 17:02:59.908	0.891	P	0.01	0.734	-0.601019	1101	0.729745	5.910421e-07	1	-0.078144	0.020251
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
650	CI.WMF..HH	6139185	2019-07-04 17:03:11.858	0.672	S	0.01	0.021	-1.368759	1101	1.096820	3.073536e-07	1	0.140179	-0.329117
651	CI.WMF..HN	6139189	2019-07-04 17:03:11.898	0.633	S	0.01	0.026	-1.371407	1101	0.989939	2.891483e-07	1	0.159858	-0.171350
652	CI.DTP..BH	6139200	2019-07-04 17:03:12.019	0.734	S	0.01	0.035	-1.322484	1101	1.385395	1.624267e-07	1	0.068940	0.175198
653	CI.DTP..HH	6139203	2019-07-04 17:03:12.038	0.707	S	0.01	-0.015	-1.306977	1101	1.122809	3.205840e-07	1	0.052824	0.125587
654	CI.DTP..HN	6139211	2019-07-04 17:03:12.118	0.625	S	0.01	0.043	-1.282829	1101	0.863380	2.702479e-07	1	0.105026	0.211253

90 rows × 14 columns

Read data from same day¶

In [15]:

Copied!





def fetch_day_waveforms(dir_waveforms):
    """
    Fetches the continuous seismograms for a given day.

    Parameters
    ----------
    dir_waveforms : str
        Directory where the waveform data is stored, by default DIR_WAVEFORMS.

    Returns
    -------
    obspy.Stream
        Stream object containing the fetched continuous seismograms.
    """
    stream = obs.Stream()
    files = glob.glob(os.path.join(dir_waveforms, "*mseed"))
    for _file in files:
        stream += obs.read(_file)
    return stream
def fetch_day_waveforms(dir_waveforms):
    """
    Fetches the continuous seismograms for a given day.

    Parameters
    ----------
    dir_waveforms : str
        Directory where the waveform data is stored, by default DIR_WAVEFORMS.

    Returns
    -------
    obspy.Stream
        Stream object containing the fetched continuous seismograms.
    """
    stream = obs.Stream()
    files = glob.glob(os.path.join(dir_waveforms, "*mseed"))
    for _file in files:
        stream += obs.read(_file)
    return stream

In [16]:

Copied!

# first, read the continuous seismograms into an `obspy.Stream`
continuous_seismograms = fetch_day_waveforms(DIR_WAVEFORMS)
print(continuous_seismograms.__str__(extended=True))
# first, read the continuous seismograms into an `obspy.Stream`
continuous_seismograms = fetch_day_waveforms(DIR_WAVEFORMS)
print(continuous_seismograms.__str__(extended=True))

57 Trace(s) in Stream:
CI.WCS2..HHE | 2019-07-03T23:59:59.998300Z - 2019-07-04T23:59:59.958300Z | 25.0 Hz, 2160000 samples
PB.B916..EHZ | 2019-07-03T23:59:59.998200Z - 2019-07-04T23:59:59.958200Z | 25.0 Hz, 2160000 samples
CI.CLC..HHN  | 2019-07-03T23:59:59.998300Z - 2019-07-04T23:59:59.958300Z | 25.0 Hz, 2160000 samples
CI.WRV2..EHZ | 2019-07-04T00:00:00.000000Z - 2019-07-04T23:59:59.960000Z | 25.0 Hz, 2160000 samples
PB.B916..EH2 | 2019-07-03T23:59:59.998200Z - 2019-07-04T23:59:59.958200Z | 25.0 Hz, 2160000 samples
PB.B917..EH1 | 2019-07-03T23:59:59.998300Z - 2019-07-04T23:59:59.958300Z | 25.0 Hz, 2160000 samples
CI.WMF..HHZ  | 2019-07-03T23:59:59.998300Z - 2019-07-04T23:59:59.958300Z | 25.0 Hz, 2160000 samples
PB.B918..EHZ | 2019-07-03T23:59:59.998300Z - 2019-07-04T23:59:59.958300Z | 25.0 Hz, 2160000 samples
PB.B918..EH2 | 2019-07-03T23:59:59.998300Z - 2019-07-04T23:59:59.958300Z | 25.0 Hz, 2160000 samples
CI.TOW2..HHZ | 2019-07-03T23:59:59.998300Z - 2019-07-04T23:59:59.958300Z | 25.0 Hz, 2160000 samples
CI.SLA..HHE  | 2019-07-03T23:59:59.998300Z - 2019-07-04T23:59:59.958300Z | 25.0 Hz, 2160000 samples
CI.WRC2..HHZ | 2019-07-03T23:59:59.998300Z - 2019-07-04T23:59:59.958300Z | 25.0 Hz, 2160000 samples
CI.JRC2..HHZ | 2019-07-03T23:59:59.998300Z - 2019-07-04T23:59:59.958300Z | 25.0 Hz, 2160000 samples
CI.CCC..HHZ  | 2019-07-03T23:59:59.998300Z - 2019-07-04T23:59:59.958300Z | 25.0 Hz, 2160000 samples
CI.DTP..HHN  | 2019-07-03T23:59:59.998300Z - 2019-07-04T23:59:59.958300Z | 25.0 Hz, 2160000 samples
CI.SRT..HHZ  | 2019-07-03T23:59:59.998300Z - 2019-07-04T23:59:59.958300Z | 25.0 Hz, 2160000 samples
CI.LRL..HHZ  | 2019-07-03T23:59:59.998300Z - 2019-07-04T23:59:59.958300Z | 25.0 Hz, 2160000 samples
CI.SLA..HHN  | 2019-07-03T23:59:59.998300Z - 2019-07-04T23:59:59.958300Z | 25.0 Hz, 2160000 samples
CI.MPM..HHZ  | 2019-07-03T23:59:59.998300Z - 2019-07-04T23:59:59.958300Z | 25.0 Hz, 2160000 samples
CI.DTP..HHE  | 2019-07-03T23:59:59.998300Z - 2019-07-04T23:59:59.958300Z | 25.0 Hz, 2160000 samples
CI.WBM..HHZ  | 2019-07-04T00:00:00.003100Z - 2019-07-04T23:59:59.963100Z | 25.0 Hz, 2160000 samples
CI.WCS2..HHN | 2019-07-03T23:59:59.998300Z - 2019-07-04T23:59:59.958300Z | 25.0 Hz, 2160000 samples
CI.CLC..HHE  | 2019-07-03T23:59:59.998300Z - 2019-07-04T23:59:59.958300Z | 25.0 Hz, 2160000 samples
PB.B921..EH1 | 2019-07-03T23:59:59.998300Z - 2019-07-04T23:59:59.958300Z | 25.0 Hz, 2160000 samples
CI.DAW..HHZ  | 2019-07-04T00:00:00.003100Z - 2019-07-04T23:59:59.963100Z | 25.0 Hz, 2160000 samples
CI.CLC..HHZ  | 2019-07-03T23:59:59.998300Z - 2019-07-04T23:59:59.958300Z | 25.0 Hz, 2160000 samples
CI.WMF..HHN  | 2019-07-03T23:59:59.998300Z - 2019-07-04T23:59:59.958300Z | 25.0 Hz, 2160000 samples
CI.TOW2..HHN | 2019-07-03T23:59:59.998300Z - 2019-07-04T23:59:59.958300Z | 25.0 Hz, 2160000 samples
CI.DAW..HHE  | 2019-07-04T00:00:00.003100Z - 2019-07-04T23:59:59.963100Z | 25.0 Hz, 2160000 samples
PB.B918..EH1 | 2019-07-03T23:59:59.998300Z - 2019-07-04T23:59:59.958300Z | 25.0 Hz, 2160000 samples
CI.WRC2..HHN | 2019-07-03T23:59:59.998300Z - 2019-07-04T23:59:59.958300Z | 25.0 Hz, 2160000 samples
CI.JRC2..HHN | 2019-07-03T23:59:59.998300Z - 2019-07-04T23:59:59.958300Z | 25.0 Hz, 2160000 samples
CI.CCC..HHN  | 2019-07-03T23:59:59.998300Z - 2019-07-04T23:59:59.958300Z | 25.0 Hz, 2160000 samples
PB.B916..EH1 | 2019-07-03T23:59:59.998300Z - 2019-07-04T23:59:59.958300Z | 25.0 Hz, 2160000 samples
CI.MPM..HHE  | 2019-07-03T23:59:59.998300Z - 2019-07-04T23:59:59.958300Z | 25.0 Hz, 2160000 samples
CI.DTP..HHZ  | 2019-07-03T23:59:59.998300Z - 2019-07-04T23:59:59.958300Z | 25.0 Hz, 2160000 samples
CI.WBM..HHE  | 2019-07-04T00:00:00.003100Z - 2019-07-04T23:59:59.963100Z | 25.0 Hz, 2160000 samples
CI.SRT..HHN  | 2019-07-03T23:59:59.998300Z - 2019-07-04T23:59:59.958300Z | 25.0 Hz, 2160000 samples
PB.B917..EHZ | 2019-07-03T23:59:59.998300Z - 2019-07-04T23:59:59.958300Z | 25.0 Hz, 2160000 samples
CI.WVP2..EHZ | 2019-07-04T00:00:00.000000Z - 2019-07-04T23:59:59.960000Z | 25.0 Hz, 2160000 samples
PB.B917..EH2 | 2019-07-03T23:59:59.998300Z - 2019-07-04T23:59:59.958300Z | 25.0 Hz, 2160000 samples
CI.LRL..HHN  | 2019-07-03T23:59:59.998300Z - 2019-07-04T23:59:59.958300Z | 25.0 Hz, 2160000 samples
CI.CCC..HHE  | 2019-07-03T23:59:59.998300Z - 2019-07-04T23:59:59.958300Z | 25.0 Hz, 2160000 samples
CI.SLA..HHZ  | 2019-07-03T23:59:59.998300Z - 2019-07-04T23:59:59.958300Z | 25.0 Hz, 2160000 samples
CI.WNM..EHZ  | 2019-07-04T00:00:00.000000Z - 2019-07-04T23:59:59.960000Z | 25.0 Hz, 2160000 samples
CI.WRC2..HHE | 2019-07-03T23:59:59.998300Z - 2019-07-04T23:59:59.958300Z | 25.0 Hz, 2160000 samples
CI.JRC2..HHE | 2019-07-03T23:59:59.998300Z - 2019-07-04T23:59:59.958300Z | 25.0 Hz, 2160000 samples
CI.SRT..HHE  | 2019-07-03T23:59:59.998300Z - 2019-07-04T23:59:59.958300Z | 25.0 Hz, 2160000 samples
CI.LRL..HHE  | 2019-07-03T23:59:59.998300Z - 2019-07-04T23:59:59.958300Z | 25.0 Hz, 2160000 samples
CI.MPM..HHN  | 2019-07-03T23:59:59.998300Z - 2019-07-04T23:59:59.958300Z | 25.0 Hz, 2160000 samples
PB.B921..EHZ | 2019-07-03T23:59:59.998200Z - 2019-07-04T23:59:59.958200Z | 25.0 Hz, 2160000 samples
CI.WBM..HHN  | 2019-07-04T00:00:00.003100Z - 2019-07-04T23:59:59.963100Z | 25.0 Hz, 2160000 samples
PB.B921..EH2 | 2019-07-03T23:59:59.998200Z - 2019-07-04T23:59:59.958200Z | 25.0 Hz, 2160000 samples
CI.WMF..HHE  | 2019-07-03T23:59:59.998300Z - 2019-07-04T23:59:59.958300Z | 25.0 Hz, 2160000 samples
CI.WCS2..HHZ | 2019-07-03T23:59:59.998300Z - 2019-07-04T23:59:59.958300Z | 25.0 Hz, 2160000 samples
CI.TOW2..HHE | 2019-07-03T23:59:59.998300Z - 2019-07-04T23:59:59.958300Z | 25.0 Hz, 2160000 samples
CI.DAW..HHN  | 2019-07-04T00:00:00.003100Z - 2019-07-04T23:59:59.963100Z | 25.0 Hz, 2160000 samples

Introduce gaps in some stations¶

In [17]:

Copied!





GAP_START_SEC = 3. * 60. * 60. 
GAP_END_SEC = 14. * 60. * 60.
station_list = list(set([st.stats.station for st in continuous_seismograms]))
STATIONS_W_GAP = np.random.choice(
    station_list, size=int(3. / 4. * len(station_list)), replace=False
)
for sta in STATIONS_W_GAP:
    for tr in continuous_seismograms.select(station=sta):
        gap_start_samp = int(GAP_START_SEC * tr.stats.sampling_rate)
        gap_end_samp = int(GAP_END_SEC * tr.stats.sampling_rate)
        tr.data[gap_start_samp:gap_end_samp] = 0.
GAP_START_SEC = 3. * 60. * 60. 
GAP_END_SEC = 14. * 60. * 60.
station_list = list(set([st.stats.station for st in continuous_seismograms]))
STATIONS_W_GAP = np.random.choice(
    station_list, size=int(3. / 4. * len(station_list)), replace=False
)
for sta in STATIONS_W_GAP:
    for tr in continuous_seismograms.select(station=sta):
        gap_start_samp = int(GAP_START_SEC * tr.stats.sampling_rate)
        gap_end_samp = int(GAP_END_SEC * tr.stats.sampling_rate)
        tr.data[gap_start_samp:gap_end_samp] = 0.

In [18]:

Copied!

# plot the continuous seismograms from a single station
fig = continuous_seismograms.select(station="CLC").plot()
# plot the continuous seismograms from a single station
fig = continuous_seismograms.select(station="CLC").plot()

In [19]:

Copied!





# then, cast data into `numpy.ndarray`
station_codes = list(set([st.stats.station for st in continuous_seismograms]))
component_codes = ["N", "E", "Z"]
component_aliases={"E": ["E", "2"], "N": ["N", "1"], "Z": ["Z"]}

num_stations = len(station_codes)
num_channels = len(component_codes)
num_samples = len(continuous_seismograms[0].data)

continuous_seismograms_arr = np.zeros((num_stations, num_channels, num_samples), dtype=np.float32)
for s, sta in enumerate(station_codes):
    for c, cp in enumerate(component_codes):
        for cp_alias in component_aliases[cp]:
            sel_seismogram = continuous_seismograms.select(station=sta, component=cp_alias)
            if len(sel_seismogram) > 0:
                continuous_seismograms_arr[s, c, :] = sel_seismogram[0].data
                break
            
continuous_seismograms_arr
# then, cast data into `numpy.ndarray`
station_codes = list(set([st.stats.station for st in continuous_seismograms]))
component_codes = ["N", "E", "Z"]
component_aliases={"E": ["E", "2"], "N": ["N", "1"], "Z": ["Z"]}

num_stations = len(station_codes)
num_channels = len(component_codes)
num_samples = len(continuous_seismograms[0].data)

continuous_seismograms_arr = np.zeros((num_stations, num_channels, num_samples), dtype=np.float32)
for s, sta in enumerate(station_codes):
    for c, cp in enumerate(component_codes):
        for cp_alias in component_aliases[cp]:
            sel_seismogram = continuous_seismograms.select(station=sta, component=cp_alias)
            if len(sel_seismogram) > 0:
                continuous_seismograms_arr[s, c, :] = sel_seismogram[0].data
                break
            
continuous_seismograms_arr

Out[19]:

array([[[-9.8226795e-11,  5.7554624e-11,  3.4072033e-11, ...,
         -1.8828223e-09,  8.1136459e-10, -1.7122942e-10],
        [-1.7923078e-10,  9.7317022e-11,  3.3442499e-10, ...,
          7.6900075e-10,  2.7582496e-09, -1.1529546e-09],
        [-1.6065722e-10, -2.5988860e-11, -8.8432366e-11, ...,
          6.2743988e-10,  3.0330730e-10, -1.6875973e-10]],

       [[ 0.0000000e+00,  0.0000000e+00,  0.0000000e+00, ...,
          0.0000000e+00,  0.0000000e+00,  0.0000000e+00],
        [ 0.0000000e+00,  0.0000000e+00,  0.0000000e+00, ...,
          0.0000000e+00,  0.0000000e+00,  0.0000000e+00],
        [ 1.8492177e-11,  7.3153331e-11,  1.3191939e-10, ...,
          4.4108817e-10, -3.0806005e-10, -4.0270010e-10]],

       [[-1.6893502e-11,  8.4201196e-12,  2.7799072e-11, ...,
         -1.7474208e-10, -4.5014548e-11, -2.6048433e-10],
        [ 1.4196522e-11,  3.8935088e-12, -2.4043532e-12, ...,
         -4.9603099e-11, -9.2666923e-11,  2.2229388e-10],
        [-7.8133723e-12,  1.1610854e-11,  1.2903780e-11, ...,
         -8.2966682e-11, -9.3903114e-11, -8.1583212e-12]],

       ...,

       [[ 5.4725401e-11,  7.0578925e-11, -6.2566233e-12, ...,
         -3.6589140e-10, -7.8155921e-10, -6.3879835e-10],
        [-1.3925512e-11,  2.0290504e-11,  1.0709553e-11, ...,
          1.0249724e-09,  7.7783147e-10, -4.5876902e-10],
        [ 1.9426023e-10, -3.3486833e-10, -1.7510682e-10, ...,
          9.0171864e-10, -9.0758560e-11, -1.7541314e-10]],

       [[-6.0105490e-11, -1.9033368e-09, -2.2194966e-09, ...,
          9.9305908e-10, -2.5555899e-10, -1.2649487e-09],
        [ 3.3998053e-09, -9.0392399e-10, -4.9158380e-09, ...,
         -5.5282345e-10, -3.0974114e-09, -1.6373071e-09],
        [-6.3921446e-10, -1.2163219e-09, -1.2187444e-09, ...,
         -1.0883094e-09,  2.2945404e-10,  4.6940152e-10]],

       [[ 0.0000000e+00,  0.0000000e+00,  0.0000000e+00, ...,
          0.0000000e+00,  0.0000000e+00,  0.0000000e+00],
        [ 0.0000000e+00,  0.0000000e+00,  0.0000000e+00, ...,
          0.0000000e+00,  0.0000000e+00,  0.0000000e+00],
        [-2.3801865e-11,  4.4190979e-12, -4.3637774e-11, ...,
         -4.7749610e-11, -1.3966529e-10, -1.5340888e-10]]], dtype=float32)

Build template¶

------------------ Background ------------------¶

A template is a collection of waveforms at different channels, $T_{s,c}(t)$, which are clips taken from the continuous seismograms, $u_{s,c}$. These clips are taken at times defined by: $$ u_{s,c}(t)\ |\ t \in \lbrace \tau_{s,c}; \tau_{s,c} + D \rbrace, $$ where $\tau_{s,c}$ is the start time of the template window and $D$ is the template duration.

$\tau_{s,c}$ is given by some prior information on the event: picks or modeled arrival times. The moveouts, $\tilde{\tau}_{s,c}$, are the collection of delay times relative to the earliest $\tau_{s,c}$: $$ \tilde{\tau}_{s,c} = \tau_{s,c} - \underset{s,c}{\min} \lbrace \tau_{s,c} \rbrace .$$

------------------ On the necessity to clip template waveforms out of `numpy.ndarray` instead of `obspy.Stream` ------------------¶

Looking carefully at the output of print(continuous_seismograms.__str__(extended=True)), a few cells before, we see that start times are generally not exactly at midnight. This is a consequence of the discrete nature of the continuous seismograms (here, sampled at 25 samples per second). Thus, in general, the $\tau_{s,c}$ computed from picks or modeled arrival times fall in between two samples of the seismograms.

When running a matched-filter search, we need to make sure the moveouts, $\tilde{\tau}_{s,c}$, ultimately expressed in samples, match exactly the times that were used when clipping the template waveforms out of $u_{s,c}$. One way to ensure this is to first cast the $\tau_{s,c}$ to times in samples and then operate exclusively on the numpy.ndarray: continuous_seismograms_arr:

$$ T_{s,c}[t_n] = u_{s,c}[\tau_{s,c} + n \Delta t],$$ where $\Delta t$ is the sampling time.

Clip out waveforms and moveout and station-weight arrays¶

In [20]:

Copied!





# PHASE_ON_COMP: dictionary defining which moveout we use to extract the waveform.
#                Here, we use windows centered around the S wave for horizontal components
#                and windows starting 1sec before the P wave for the vertical component.
PHASE_ON_COMP = {"N": "S", "E": "S", "Z": "P"}
# OFFSET_PHASE_SEC: dictionary defining the time offset taken before a given phase
#               for example OFFSET_PHASE_SEC["P"] = 1.0 means that we extract the window
#               1 second before the predicted P arrival time
OFFSET_PHASE_SEC = {"P": 1.0, "S": 4.0}
# TEMPLATE_DURATION_SEC
TEMPLATE_DURATION_SEC = 8. 
# SAMPLING_RATE_HZ
SAMPLING_RATE_HZ = 25.
# TEMPLATE_DURATION_SAMP
TEMPLATE_DURATION_SAMP = int(TEMPLATE_DURATION_SEC * SAMPLING_RATE_HZ)
# PHASE_ON_COMP: dictionary defining which moveout we use to extract the waveform.
#                Here, we use windows centered around the S wave for horizontal components
#                and windows starting 1sec before the P wave for the vertical component.
PHASE_ON_COMP = {"N": "S", "E": "S", "Z": "P"}
# OFFSET_PHASE_SEC: dictionary defining the time offset taken before a given phase
#               for example OFFSET_PHASE_SEC["P"] = 1.0 means that we extract the window
#               1 second before the predicted P arrival time
OFFSET_PHASE_SEC = {"P": 1.0, "S": 4.0}
# TEMPLATE_DURATION_SEC
TEMPLATE_DURATION_SEC = 8. 
# SAMPLING_RATE_HZ
SAMPLING_RATE_HZ = 25.
# TEMPLATE_DURATION_SAMP
TEMPLATE_DURATION_SAMP = int(TEMPLATE_DURATION_SEC * SAMPLING_RATE_HZ)

In [21]:

Copied!





# add station_code columns to `selected_event_picks`
selected_event_picks.set_index("station_id", inplace=True)
for staid in selected_event_picks.index:
    station_code = staid.split(".")[1]
    selected_event_picks.loc[staid, "station_code"] = station_code
selected_event_picks
# add station_code columns to `selected_event_picks`
selected_event_picks.set_index("station_id", inplace=True)
for staid in selected_event_picks.index:
    station_code = staid.split(".")[1]
    selected_event_picks.loc[staid, "station_code"] = station_code
selected_event_picks

/tmp/ipykernel_2902838/1881751651.py:5: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  selected_event_picks.loc[staid, "station_code"] = station_code

Out[21]:

	phase_index	phase_time	phase_score	phase_type	dt_s	phase_polarity	phase_amplitude	event_index	sp_ratio	event_amplitude	adloc_mask	adloc_residual_time	adloc_residual_amplitude	station_code
station_id
CI.CLC..HN	6137848	2019-07-04 17:02:58.488	0.969	P	0.01	-0.879	-0.511873	1101	1.017569	2.474507e-07	1	-0.010048	-0.054426	CLC
CI.CLC..BH	6137847	2019-07-04 17:02:58.489	0.633	P	0.01	-0.324	-0.666956	1101	1.014822	2.022085e-07	1	0.048153	-0.127168	CLC
CI.CLC..HH	6137849	2019-07-04 17:02:58.498	0.965	P	0.01	-0.883	-0.586030	1101	1.086886	2.141392e-07	1	-0.007135	-0.236796	CLC
CI.SRT..HN	6137990	2019-07-04 17:02:59.908	0.879	P	0.01	0.699	-0.643401	1101	1.377537	4.387864e-07	1	-0.069463	0.065039	SRT
CI.SRT..HH	6137990	2019-07-04 17:02:59.908	0.891	P	0.01	0.734	-0.601019	1101	0.729745	5.910421e-07	1	-0.078144	0.020251	SRT
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
CI.WMF..HH	6139185	2019-07-04 17:03:11.858	0.672	S	0.01	0.021	-1.368759	1101	1.096820	3.073536e-07	1	0.140179	-0.329117	WMF
CI.WMF..HN	6139189	2019-07-04 17:03:11.898	0.633	S	0.01	0.026	-1.371407	1101	0.989939	2.891483e-07	1	0.159858	-0.171350	WMF
CI.DTP..BH	6139200	2019-07-04 17:03:12.019	0.734	S	0.01	0.035	-1.322484	1101	1.385395	1.624267e-07	1	0.068940	0.175198	DTP
CI.DTP..HH	6139203	2019-07-04 17:03:12.038	0.707	S	0.01	-0.015	-1.306977	1101	1.122809	3.205840e-07	1	0.052824	0.125587	DTP
CI.DTP..HN	6139211	2019-07-04 17:03:12.118	0.625	S	0.01	0.043	-1.282829	1101	0.863380	2.702479e-07	1	0.105026	0.211253	DTP

90 rows × 14 columns

In the following cell, we build the numpy.ndarray of moveouts $\tilde{\tau}_{s,c}$, expressed in units of samples.

In [22]:

Copied!





# first, we extract the set of relative delay times of the beginning of each
# template window on a given station and component
tau_s_c_sec = np.zeros((num_stations, num_channels), dtype=np.float64)
for s, sta in enumerate(station_codes):
    for c, cp in enumerate(component_codes):
        phase_type = PHASE_ON_COMP[cp]
        picks_s_c = selected_event_picks[
            (
                (selected_event_picks["station_code"] == sta)
                & (selected_event_picks["phase_type"] == phase_type)
            )
        ]
        if len(picks_s_c) == 0:
            # no pick for this station/component: set to -999
            tau_s_c_sec[s, c] = -999
        elif len(picks_s_c) == 1:
            # express pick relative to beginning of day (midnight)
            _pick = pd.Timestamp(picks_s_c["phase_time"])
            _relative_pick_sec = (_pick - pd.Timestamp(_pick.strftime("%Y-%m-%d"))).total_seconds()
            tau_s_c_sec[s, c] = _relative_pick_sec - OFFSET_PHASE_SEC[phase_type]
        else:
            # there were several picks from different channels: average them
            _relative_pick_sec = 0.
            for _pick in picks_s_c["phase_time"].values:
                _pick = pd.Timestamp(_pick)
                _relative_pick_sec += (_pick - pd.Timestamp(_pick.strftime("%Y-%m-%d"))).total_seconds()
            _relative_pick_sec /= float(len(picks_s_c["phase_time"]))
            tau_s_c_sec[s, c] = _relative_pick_sec - OFFSET_PHASE_SEC[phase_type]
# now, we convert these relative times into samples 
# and express them relative to the earliest time
# we also store in memory the minimum time offset `tau_min_samp` for the next step
moveouts_samp_arr = (tau_s_c_sec * SAMPLING_RATE_HZ).astype(np.int64)
tau_min_samp = np.min(moveouts_samp_arr[moveouts_samp_arr > 0])
moveouts_samp_arr = moveouts_samp_arr - tau_min_samp
moveouts_samp_arr
# first, we extract the set of relative delay times of the beginning of each
# template window on a given station and component
tau_s_c_sec = np.zeros((num_stations, num_channels), dtype=np.float64)
for s, sta in enumerate(station_codes):
    for c, cp in enumerate(component_codes):
        phase_type = PHASE_ON_COMP[cp]
        picks_s_c = selected_event_picks[
            (
                (selected_event_picks["station_code"] == sta)
                & (selected_event_picks["phase_type"] == phase_type)
            )
        ]
        if len(picks_s_c) == 0:
            # no pick for this station/component: set to -999
            tau_s_c_sec[s, c] = -999
        elif len(picks_s_c) == 1:
            # express pick relative to beginning of day (midnight)
            _pick = pd.Timestamp(picks_s_c["phase_time"])
            _relative_pick_sec = (_pick - pd.Timestamp(_pick.strftime("%Y-%m-%d"))).total_seconds()
            tau_s_c_sec[s, c] = _relative_pick_sec - OFFSET_PHASE_SEC[phase_type]
        else:
            # there were several picks from different channels: average them
            _relative_pick_sec = 0.
            for _pick in picks_s_c["phase_time"].values:
                _pick = pd.Timestamp(_pick)
                _relative_pick_sec += (_pick - pd.Timestamp(_pick.strftime("%Y-%m-%d"))).total_seconds()
            _relative_pick_sec /= float(len(picks_s_c["phase_time"]))
            tau_s_c_sec[s, c] = _relative_pick_sec - OFFSET_PHASE_SEC[phase_type]
# now, we convert these relative times into samples 
# and express them relative to the earliest time
# we also store in memory the minimum time offset `tau_min_samp` for the next step
moveouts_samp_arr = (tau_s_c_sec * SAMPLING_RATE_HZ).astype(np.int64)
tau_min_samp = np.min(moveouts_samp_arr[moveouts_samp_arr > 0])
moveouts_samp_arr = moveouts_samp_arr - tau_min_samp
moveouts_samp_arr

Out[22]:

array([[      94,       94,       66],
       [     160,      160,      105],
       [-1559399, -1559399, -1559399],
       [     159,      159,      102],
       [     272,      272,      171],
       [     179,      179,      116],
       [     101,      101,       68],
       [-1559399, -1559399, -1559399],
       [      78,       78,       53],
       [     277,      277,      174],
       [     225,      225,      140],
       [       0,        0,       13],
       [-1559399, -1559399, -1559399],
       [     102,      102,       64],
       [-1559399, -1559399, -1559399],
       [      72,       72,       48],
       [-1559399, -1559399, -1559399],
       [     175,      175,      115],
       [     156,      156,      104],
       [      91,       91,       62],
       [     165,      165,      105]])

Next, we use the moveouts, in samples, to clip out the relevant template waveforms from the continuous seismograms.

In [23]:

Copied!





template_waveforms_arr = np.zeros((num_stations, num_channels, TEMPLATE_DURATION_SAMP), dtype=np.float32)
weights_arr = np.ones((num_stations, num_channels), dtype=np.float32)

for s, sta in enumerate(station_codes):
    for c, cp in enumerate(component_codes):
        if moveouts_samp_arr[s, c] < 0:
            # no picks were found on this station
            weights_arr[s, c] = 0.
            continue
        starttime = tau_min_samp + moveouts_samp_arr[s, c]
        endtime = starttime + TEMPLATE_DURATION_SAMP
        template_waveforms_arr[s, c, :] = continuous_seismograms_arr[s, c, starttime:endtime]
        if template_waveforms_arr[s, c, :].sum() == 0.:
            # no data was available on this channel
            weights_arr[s, c] = 0.
        
template_waveforms_arr
template_waveforms_arr = np.zeros((num_stations, num_channels, TEMPLATE_DURATION_SAMP), dtype=np.float32)
weights_arr = np.ones((num_stations, num_channels), dtype=np.float32)

for s, sta in enumerate(station_codes):
    for c, cp in enumerate(component_codes):
        if moveouts_samp_arr[s, c] < 0:
            # no picks were found on this station
            weights_arr[s, c] = 0.
            continue
        starttime = tau_min_samp + moveouts_samp_arr[s, c]
        endtime = starttime + TEMPLATE_DURATION_SAMP
        template_waveforms_arr[s, c, :] = continuous_seismograms_arr[s, c, starttime:endtime]
        if template_waveforms_arr[s, c, :].sum() == 0.:
            # no data was available on this channel
            weights_arr[s, c] = 0.
        
template_waveforms_arr

Out[23]:

array([[[-7.95800133e-06,  8.55314720e-05,  2.11978477e-04, ...,
         -3.52683623e-04,  9.79480028e-05,  1.74769040e-04],
        [-1.22021929e-05,  4.51964515e-05,  1.12187183e-04, ...,
         -4.29799955e-04, -1.41114127e-04,  4.94585664e-04],
        [ 2.51332494e-07,  1.18968394e-07, -9.30824982e-08, ...,
         -8.89477014e-05,  8.27156691e-05,  1.22664089e-04]],

       [[ 0.00000000e+00,  0.00000000e+00,  0.00000000e+00, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
        [ 0.00000000e+00,  0.00000000e+00,  0.00000000e+00, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
        [ 1.27826226e-07,  1.54385773e-07,  1.52002073e-07, ...,
          1.35540016e-04, -3.16935242e-04, -1.91862506e-04]],

       [[ 0.00000000e+00,  0.00000000e+00,  0.00000000e+00, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
        [ 0.00000000e+00,  0.00000000e+00,  0.00000000e+00, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
        [ 0.00000000e+00,  0.00000000e+00,  0.00000000e+00, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00]],

       ...,

       [[-1.15102615e-04, -2.08503159e-04, -8.57295672e-05, ...,
         -3.02342505e-05,  1.90505059e-04,  2.32697013e-04],
        [-6.95939161e-06, -1.14591476e-05, -4.32817069e-05, ...,
          1.95618981e-04, -6.08503397e-05, -1.64061450e-04],
        [-3.15401110e-08,  6.19310697e-07,  1.24642884e-06, ...,
         -6.75416959e-05,  9.95227219e-06,  9.39235106e-05]],

       [[ 1.31645243e-06,  1.41594983e-05,  1.18550970e-05, ...,
         -1.61218868e-05,  1.20281387e-04,  7.25206701e-05],
        [ 1.01139749e-05,  3.06667994e-06,  6.11646101e-07, ...,
          5.91952994e-04,  5.78140549e-04,  2.96404498e-04],
        [ 5.31832107e-08,  8.85903688e-08,  1.39000178e-07, ...,
          3.86358661e-05,  4.55262576e-04,  2.83642818e-04]],

       [[ 0.00000000e+00,  0.00000000e+00,  0.00000000e+00, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
        [ 0.00000000e+00,  0.00000000e+00,  0.00000000e+00, ...,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
        [ 6.11326456e-08,  1.84645728e-07,  2.68150814e-07, ...,
         -1.40672455e-05,  9.12318046e-06,  1.10272435e-06]]],
      dtype=float32)

In [24]:

Copied!





# normalize template waveforms for numerical reasons
norm = np.std(template_waveforms_arr, axis=-1, keepdims=True)
norm[norm == 0.] = 1. 
template_waveforms_arr /= norm

# normalize weights so that they sum up to one
weights_arr /= np.sum(weights_arr)

# normalize continuous seismograms for numerical reasons
norm = np.std(continuous_seismograms_arr, axis=-1, keepdims=True)
norm[norm == 0.] = 1. 
continuous_seismograms_arr /= norm
# normalize template waveforms for numerical reasons
norm = np.std(template_waveforms_arr, axis=-1, keepdims=True)
norm[norm == 0.] = 1. 
template_waveforms_arr /= norm

# normalize weights so that they sum up to one
weights_arr /= np.sum(weights_arr)

# normalize continuous seismograms for numerical reasons
norm = np.std(continuous_seismograms_arr, axis=-1, keepdims=True)
norm[norm == 0.] = 1. 
continuous_seismograms_arr /= norm

Run FMF¶

After all this data formatting, we can now run template matching (also called matched-filtering) to detect new events that are similar to our template event.

For that, use the software Fast Matched Filter (FMF): https://github.com/beridel/fast_matched_filter

FMF offers C and CUDA-C routines to efficiently run template matching on CPUs, or even on GPUs if available to you.

In [25]:

Copied!





# FMF_STEP_SAMP: this is the step between two consecutive calculation of the correlation coefficient
FMF_STEP_SAMP = 1
# ARCH: it determines whether you want to use GPUs or CPUs 
#       If you do not have an Nvidia GPU, set ARCH = "cpu"
ARCH = "gpu"
# FMF_STEP_SAMP: this is the step between two consecutive calculation of the correlation coefficient
FMF_STEP_SAMP = 1
# ARCH: it determines whether you want to use GPUs or CPUs 
#       If you do not have an Nvidia GPU, set ARCH = "cpu"
ARCH = "gpu"

The following cell computes the time series of correlation coefficients between the template waveforms, $T_{s,c}$, and the continuous seismograms, $u_{s,c}$: $$ CC(t) = \sum_{s,c} w_{s,c} \sum_{i=1}^N \dfrac{T^*_{s,c}(n \Delta t) u^*_{s,c}(t + \tilde{\tau}_{s,c} + n \Delta t)}{\sqrt{\sum_{i=1}^N {T^*_{s,c}}^2(n \Delta t) \sum_{i=1}^N {u^*_{s,c}}^2(t + \tilde{\tau}_{s,c} + n \Delta t)}}, $$ with:

$T^*_{s,c} = T_{s,c} - \frac{1}{N} \sum_{i=1}^N T_{s,c}(n \Delta t)$,
$u^*_{s,c}(t) = u_{s,c}(t) - \frac{1}{N} \sum_{i=1}^N u_{s,c}(t + n \Delta t)$.

Note that because the seismograms were filtered below periods that are shorter than the template window ($N \Delta t$) we have $T^*_{s,c} \approx T_{s,c}$ and $u^*_{s,c} \approx u_{s,c}$, which spares us the computation of the mean in each sliding window.

In [26]:

Copied!





cc = fmf.matched_filter(
    template_waveforms_arr.astype(np.float32),
    moveouts_samp_arr.astype(np.int32),
    weights_arr.astype(np.float32),
    continuous_seismograms_arr.astype(np.float32),
    FMF_STEP_SAMP,
    arch=ARCH,
)
cc = fmf.matched_filter(
    template_waveforms_arr.astype(np.float32),
    moveouts_samp_arr.astype(np.int32),
    weights_arr.astype(np.float32),
    continuous_seismograms_arr.astype(np.float32),
    FMF_STEP_SAMP,
    arch=ARCH,
)

In [27]:

Copied!

# FMF is programmed to handle multiple templates at once. Here, we only used
# a single template, hence the size of the outermost axis of "1"
cc.shape
# FMF is programmed to handle multiple templates at once. Here, we only used
# a single template, hence the size of the outermost axis of "1"
cc.shape

Out[27]:

(1, 2159801)

In [28]:

Copied!





# let's print the output of our template matching run, which a time series of network-averaged correlation coefficients
# of same duration as the continuous seismograms
_cc = cc[0, :]
time_cc = np.arange(len(_cc)) / SAMPLING_RATE_HZ

fig = plt.figure("network_averaged_cc", figsize=(20, 6))
gs = fig.add_gridspec(ncols=4)

ax1 = fig.add_subplot(gs[:3])
ax1.plot(time_cc, _cc, lw=0.75)
ax1.set_xlabel("Elapsed time (sec)")
ax1.set_ylabel("Network-averaged cc")
ax1.set_xlim(time_cc.min(), time_cc.max())
ax1.set_title("Time series of template/continuous seismograms similarity")

ax2 = fig.add_subplot(gs[3], sharey=ax1)
_ = ax2.hist(_cc, orientation="horizontal", bins=250, density=True)
ax2.set_xlabel("Normalized count")
ax2.set_title("Histogram")

for ax in [ax1, ax2]:
    ax.grid()
# let's print the output of our template matching run, which a time series of network-averaged correlation coefficients
# of same duration as the continuous seismograms
_cc = cc[0, :]
time_cc = np.arange(len(_cc)) / SAMPLING_RATE_HZ

fig = plt.figure("network_averaged_cc", figsize=(20, 6))
gs = fig.add_gridspec(ncols=4)

ax1 = fig.add_subplot(gs[:3])
ax1.plot(time_cc, _cc, lw=0.75)
ax1.set_xlabel("Elapsed time (sec)")
ax1.set_ylabel("Network-averaged cc")
ax1.set_xlim(time_cc.min(), time_cc.max())
ax1.set_title("Time series of template/continuous seismograms similarity")

ax2 = fig.add_subplot(gs[3], sharey=ax1)
_ = ax2.hist(_cc, orientation="horizontal", bins=250, density=True)
ax2.set_xlabel("Normalized count")
ax2.set_title("Histogram")

for ax in [ax1, ax2]:
    ax.grid()

Set detection threshold and find events¶

We will use the time series of correlation coefficients to build an earthquake catalog. For that, we need to set a detection threshold and define all times above that threshold as triggers caused by near-repeats of the template event.

In [29]:

Copied!





def select_cc_indexes(
    cc_t,
    threshold,
    search_win,
):
    """Select the peaks in the CC time series.

    Parameters
    ------------
    cc_t: (n_corr,) numpy.ndarray
        The CC time series for one template.
    threshold: (n_corr,) numpy.ndarray or scalar
        The detection threshold.
    search_win: scalar int
        The minimum inter-event time, in units of correlation step.


    Returns
    --------
    cc_idx: (n_detections,) numpy.ndarray
        The list of all selected CC indexes. They give the timings of the
        detected events.
    """

    cc_detections = cc_t > threshold
    cc_idx = np.where(cc_detections)[0]

    cc_idx = list(cc_idx)
    n_rm = 0
    for i in range(1, len(cc_idx)):
        if (cc_idx[i - n_rm] - cc_idx[i - n_rm - 1]) < search_win:
            if cc_t[cc_idx[i - n_rm]] > cc_t[cc_idx[i - n_rm - 1]]:
                # keep (i-n_rm)-th detection
                cc_idx.remove(cc_idx[i - n_rm - 1])
            else:
                # keep (i-n_rm-1)-th detection
                cc_idx.remove(cc_idx[i - n_rm])
            n_rm += 1
    cc_idx = np.asarray(cc_idx)
    return cc_idx
    
def select_cc_indexes(
    cc_t,
    threshold,
    search_win,
):
    """Select the peaks in the CC time series.

    Parameters
    ------------
    cc_t: (n_corr,) numpy.ndarray
        The CC time series for one template.
    threshold: (n_corr,) numpy.ndarray or scalar
        The detection threshold.
    search_win: scalar int
        The minimum inter-event time, in units of correlation step.


    Returns
    --------
    cc_idx: (n_detections,) numpy.ndarray
        The list of all selected CC indexes. They give the timings of the
        detected events.
    """

    cc_detections = cc_t > threshold
    cc_idx = np.where(cc_detections)[0]

    cc_idx = list(cc_idx)
    n_rm = 0
    for i in range(1, len(cc_idx)):
        if (cc_idx[i - n_rm] - cc_idx[i - n_rm - 1]) < search_win:
            if cc_t[cc_idx[i - n_rm]] > cc_t[cc_idx[i - n_rm - 1]]:
                # keep (i-n_rm)-th detection
                cc_idx.remove(cc_idx[i - n_rm - 1])
            else:
                # keep (i-n_rm-1)-th detection
                cc_idx.remove(cc_idx[i - n_rm])
            n_rm += 1
    cc_idx = np.asarray(cc_idx)
    return cc_idx
    

In [30]:

Copied!





# INTEREVENT_TIME_RESOLUTION_SEC: In some cases, a template might trigger multiple, closely spaced detections because
#                                 of a phenomenon similar to that of "cycle skipping", where the waveform correlates
#                                 well with a time-shifted version of itself. Thus, to avoid redundant detections, we
#                                 set a minimum time separation between triggers (rule of thumb: about half the template duration)
INTEREVENT_TIME_RESOLUTION_SEC = 5.
INTEREVENT_TIME_RESOLUTION_SAMP = int(INTEREVENT_TIME_RESOLUTION_SEC * SAMPLING_RATE_HZ)
_cc = cc[0, :]
time_cc = np.arange(len(_cc)) * FMF_STEP_SAMP / SAMPLING_RATE_HZ
NUM_RMS = 8.
detection_threshold = NUM_RMS * np.std(_cc)
event_cc_indexes = select_cc_indexes(_cc, detection_threshold, INTEREVENT_TIME_RESOLUTION_SAMP)

fig = plt.figure("network_averaged_cc", figsize=(20, 6))
gs = fig.add_gridspec(ncols=4)

ax1 = fig.add_subplot(gs[:3])
ax1.plot(time_cc, _cc, lw=0.75)
ax1.scatter(time_cc[event_cc_indexes], _cc[event_cc_indexes], linewidths=0.25, edgecolor="k", color="r", zorder=2)
ax1.set_xlabel("Elapsed time (sec)")
ax1.set_ylabel("Network-averaged cc")
ax1.set_xlim(time_cc.min(), time_cc.max())
ax1.set_title("Time series of template/continuous seismograms similarity")

ax2 = fig.add_subplot(gs[3], sharey=ax1)
_ = ax2.hist(_cc, orientation="horizontal", bins=250, density=True, zorder=2)
ax2.set_xlabel("Normalized count")
ax2.set_title("Histogram")

label = f"Detection threshold: {NUM_RMS:.0f}"r"$\times \mathrm{RMS}(\mathrm{CC}(t))$"f"\n{len(event_cc_indexes):d} detected events"
for ax in [ax1, ax2]:
    ax.grid()
    ax.axhline(
        detection_threshold, ls="--", color="r",
        label=label
        )
ax1.legend(loc="upper left")
# INTEREVENT_TIME_RESOLUTION_SEC: In some cases, a template might trigger multiple, closely spaced detections because
#                                 of a phenomenon similar to that of "cycle skipping", where the waveform correlates
#                                 well with a time-shifted version of itself. Thus, to avoid redundant detections, we
#                                 set a minimum time separation between triggers (rule of thumb: about half the template duration)
INTEREVENT_TIME_RESOLUTION_SEC = 5.
INTEREVENT_TIME_RESOLUTION_SAMP = int(INTEREVENT_TIME_RESOLUTION_SEC * SAMPLING_RATE_HZ)
_cc = cc[0, :]
time_cc = np.arange(len(_cc)) * FMF_STEP_SAMP / SAMPLING_RATE_HZ
NUM_RMS = 8.
detection_threshold = NUM_RMS * np.std(_cc)
event_cc_indexes = select_cc_indexes(_cc, detection_threshold, INTEREVENT_TIME_RESOLUTION_SAMP)

fig = plt.figure("network_averaged_cc", figsize=(20, 6))
gs = fig.add_gridspec(ncols=4)

ax1 = fig.add_subplot(gs[:3])
ax1.plot(time_cc, _cc, lw=0.75)
ax1.scatter(time_cc[event_cc_indexes], _cc[event_cc_indexes], linewidths=0.25, edgecolor="k", color="r", zorder=2)
ax1.set_xlabel("Elapsed time (sec)")
ax1.set_ylabel("Network-averaged cc")
ax1.set_xlim(time_cc.min(), time_cc.max())
ax1.set_title("Time series of template/continuous seismograms similarity")

ax2 = fig.add_subplot(gs[3], sharey=ax1)
_ = ax2.hist(_cc, orientation="horizontal", bins=250, density=True, zorder=2)
ax2.set_xlabel("Normalized count")
ax2.set_title("Histogram")

label = f"Detection threshold: {NUM_RMS:.0f}"r"$\times \mathrm{RMS}(\mathrm{CC}(t))$"f"\n{len(event_cc_indexes):d} detected events"
for ax in [ax1, ax2]:
    ax.grid()
    ax.axhline(
        detection_threshold, ls="--", color="r",
        label=label
        )
ax1.legend(loc="upper left")

Out[30]:

<matplotlib.legend.Legend at 0x7fae4e49bb20>

The time variation of the standard deviations of $CC(t)$ caused by gaps in some of the stations may lower the detection threshold and may thus trigger many false detections. In general, it's better to use a time-dependent detection threshold to adapt to possible gaps in the data. When using template matching on smaller seismic networks than in this example, a gap in a single station may strongly affect your time series $CC(t)$!

Interested in a bullet-proof time-dependent threshold? Check out the link below: https://ebeauce.github.io/Seismic_BPMF/usage/api/similarity_search.html#BPMF.similarity_search.time_dependent_threshold

Demonstration of a common issue encountered in template matching¶

Data requirements¶

Installing fast_matched_filter¶

Load PhaseNet catalog¶

Pick one event¶

Run template matching¶

Read data from same day¶

Introduce gaps in some stations¶

Build template¶

------------------ Background ------------------¶

------------------ On the necessity to clip template waveforms out of numpy.ndarray instead of obspy.Stream ------------------¶

Clip out waveforms and moveout and station-weight arrays¶

Run FMF¶

Set detection threshold and find events¶

Installing `fast_matched_filter`¶

------------------ On the necessity to clip template waveforms out of `numpy.ndarray` instead of `obspy.Stream` ------------------¶