Batch Prediction
1. Download demo data
cd PhaseNet
wget https://github.com/wayneweiqiang/PhaseNet/releases/download/test_data/test_data.zip
unzip test_data.zip
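Where wget or unzip is not available, the same step can be approximated from Python with the standard library. This is a minimal sketch, not part of PhaseNet: the helper name download_and_extract is hypothetical, and only the URL is taken from the command above.

```python
import urllib.request
import zipfile
from pathlib import Path

# URL copied from the wget command above.
TEST_DATA_URL = "https://github.com/wayneweiqiang/PhaseNet/releases/download/test_data/test_data.zip"


def download_and_extract(url, dest_dir="."):
    """Download a zip archive into dest_dir and unpack it there (hypothetical helper)."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # Name the local archive after the last component of the URL.
    archive = dest / url.rsplit("/", 1)[-1]
    urllib.request.urlretrieve(url, archive)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
        return zf.namelist()
```

Calling download_and_extract(TEST_DATA_URL) from inside the PhaseNet directory should leave the same test_data folder that the shell commands produce.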
2. Run batch prediction
PhaseNet currently supports four data formats: mseed, sac, hdf5, and numpy.
- For mseed format:
python phasenet/predict.py --model=model/190703-214543 --data_list=test_data/mseed.csv --data_dir=test_data/mseed --format=mseed --plot_figure
- For sac format:
python phasenet/predict.py --model=model/190703-214543 --data_list=test_data/sac.csv --data_dir=test_data/sac --format=sac --plot_figure
- For numpy format:
python phasenet/predict.py --model=model/190703-214543 --data_list=test_data/npz.csv --data_dir=test_data/npz --format=numpy --plot_figure
- For hdf5 format:
python phasenet/predict.py --model=model/190703-214543 --hdf5_file=test_data/data.h5 --hdf5_group=data --format=hdf5 --plot_figure
- For a seismic array (used by QuakeFlow):
python phasenet/predict.py --model=model/190703-214543 --data_list=test_data/mseed_array.csv --data_dir=test_data/mseed_array --stations=test_data/stations.json --format=mseed_array --amplitude
python phasenet/predict.py --model=model/190703-214543 --data_list=test_data/mseed2.csv --data_dir=test_data/mseed --stations=test_data/stations.json --format=mseed_array --amplitude
Notes:
- Remove the "--plot_figure" argument for large datasets, because plotting can be very slow.
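When the same prediction has to be launched for several formats or datasets, the command line can be assembled programmatically. The sketch below only reuses the flags shown in the commands above; the function build_predict_command and its parameter names are hypothetical, not part of PhaseNet.

```python
import shlex


def build_predict_command(fmt, model="model/190703-214543", data_list=None,
                          data_dir=None, hdf5_file=None, hdf5_group=None,
                          stations=None, amplitude=False, plot_figure=False):
    """Assemble the predict.py argument list (hypothetical helper)."""
    cmd = ["python", "phasenet/predict.py", f"--model={model}"]
    # Only emit the flags that were actually given a value.
    valued_flags = [
        ("--data_list", data_list),
        ("--data_dir", data_dir),
        ("--hdf5_file", hdf5_file),
        ("--hdf5_group", hdf5_group),
        ("--stations", stations),
    ]
    for flag, value in valued_flags:
        if value is not None:
            cmd.append(f"{flag}={value}")
    cmd.append(f"--format={fmt}")
    if amplitude:
        cmd.append("--amplitude")
    if plot_figure:  # leave off for large datasets, plotting is slow
        cmd.append("--plot_figure")
    return cmd


cmd = build_predict_command("mseed", data_list="test_data/mseed.csv",
                            data_dir="test_data/mseed")
print(shlex.join(cmd))
```

The returned list can be passed to subprocess.run(cmd, check=True) from the PhaseNet directory.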
Optional arguments:
usage: predict.py [-h] [--batch_size BATCH_SIZE] [--model_dir MODEL_DIR]
                  [--data_dir DATA_DIR] [--data_list DATA_LIST]
                  [--hdf5_file HDF5_FILE] [--hdf5_group HDF5_GROUP]
                  [--result_dir RESULT_DIR] [--result_fname RESULT_FNAME]
                  [--min_p_prob MIN_P_PROB] [--min_s_prob MIN_S_PROB]
                  [--mpd MPD] [--amplitude] [--format FORMAT]
                  [--s3_url S3_URL] [--stations STATIONS] [--plot_figure]
                  [--save_prob]

optional arguments:
  -h, --help            show this help message and exit
  --batch_size BATCH_SIZE
                        batch size
  --model_dir MODEL_DIR
                        Checkpoint directory (default: None)
  --data_dir DATA_DIR   Input file directory
  --data_list DATA_LIST
                        Input csv file
  --hdf5_file HDF5_FILE
                        Input hdf5 file
  --hdf5_group HDF5_GROUP
                        data group name in hdf5 file
  --result_dir RESULT_DIR
                        Output directory
  --result_fname RESULT_FNAME
                        Output file
  --min_p_prob MIN_P_PROB
                        Probability threshold for P pick
  --min_s_prob MIN_S_PROB
                        Probability threshold for S pick
  --mpd MPD             Minimum peak distance
  --amplitude           if return amplitude value
  --format FORMAT       input format
  --stations STATIONS   seismic station info
  --plot_figure         If plot figure for test
  --save_prob           If save result for test
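The example commands pass --model, while the usage text lists the option as --model_dir. This presumably works because Python's argparse accepts any unambiguous prefix of a long option by default. The sketch below demonstrates that behaviour with a stand-in parser; it is not PhaseNet's actual parser and declares only two of the options listed above.

```python
import argparse

# Stand-in parser for illustration only.
parser = argparse.ArgumentParser(prog="predict.py")
parser.add_argument("--model_dir", help="Checkpoint directory (default: None)")
parser.add_argument("--format", help="input format")

# "--model" is an unambiguous prefix of "--model_dir", so argparse expands it.
args = parser.parse_args(["--model=model/190703-214543", "--format=mseed"])
print(args.model_dir)
```

Spelling out --model_dir in scripts avoids surprises if another option starting with "--model" is ever added.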
3. Output picks
- The output picks are saved to "results/picks.csv" by default:
file_name | begin_time | station_id | phase_index | phase_time | phase_score | phase_amp | phase_type |
---|---|---|---|---|---|---|---|
2020-10-01T00:00* | 2020-10-01T00:00:00.003 | CI.BOM..HH | 14734 | 2020-10-01T00:02:27.343 | 0.708 | 2.4998866231208325e-14 | P |
2020-10-01T00:00* | 2020-10-01T00:00:00.003 | CI.BOM..HH | 15487 | 2020-10-01T00:02:34.873 | 0.416 | 2.4998866231208325e-14 | S |
2020-10-01T00:00* | 2020-10-01T00:00:00.003 | CI.COA..HH | 319 | 2020-10-01T00:00:03.193 | 0.762 | 3.708662269972206e-14 | P |
Notes:
- phase_index is the sample index of the pick within the original waveform, so phase_time = begin_time + phase_index / sampling_rate. The default sampling rate is 100 Hz.
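The relation in the note can be checked against the rows of the table above. The sketch below is a minimal illustration, assuming the 100 Hz default and the timestamp format shown in the table; the function name phase_time is chosen here for illustration.

```python
from datetime import datetime, timedelta


def phase_time(begin_time, phase_index, sampling_rate=100.0):
    """Return begin_time + phase_index / sampling_rate as an ISO timestamp."""
    start = datetime.fromisoformat(begin_time)
    # Convert the sample offset to whole microseconds to avoid float drift.
    offset = timedelta(microseconds=round(phase_index / sampling_rate * 1_000_000))
    return (start + offset).isoformat(timespec="milliseconds")


# First row of the table: CI.BOM..HH, P pick at sample 14734.
print(phase_time("2020-10-01T00:00:00.003", 14734))  # 2020-10-01T00:02:27.343
```

Data recorded at a different rate needs the matching sampling_rate argument, otherwise the computed times will be scaled incorrectly.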
4. Read P/S picks
PhaseNet currently outputs picks in two formats: CSV and JSON.
import pandas as pd
import json
import os
PROJECT_ROOT = os.path.realpath(os.path.join(os.path.abspath(''), ".."))
picks_csv = pd.read_csv(os.path.join(PROJECT_ROOT, "results/picks.csv"), sep="\t")
picks_csv.loc[:, 'p_idx'] = picks_csv["p_idx"].apply(lambda x: x.strip("[]").split(","))
picks_csv.loc[:, 'p_prob'] = picks_csv["p_prob"].apply(lambda x: x.strip("[]").split(","))
picks_csv.loc[:, 's_idx'] = picks_csv["s_idx"].apply(lambda x: x.strip("[]").split(","))
picks_csv.loc[:, 's_prob'] = picks_csv["s_prob"].apply(lambda x: x.strip("[]").split(","))
print(picks_csv.iloc[1])
print(picks_csv.iloc[0])
fname     NC.MCV..EH.0361339.npz
t0        1970-01-01T00:00:00.000
p_idx     [5999, 9015]
p_prob    [0.987, 0.981]
s_idx     [6181, 9205]
s_prob    [0.553, 0.873]
Name: 1, dtype: object
fname     NN.LHV..EH.0384064.npz
t0        1970-01-01T00:00:00.000
p_idx     []
p_prob    []
s_idx     []
s_prob    []
Name: 0, dtype: object
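Splitting the bracketed strings on commas leaves string fragments rather than numbers, and an empty "[]" becomes a list containing one empty string. If numeric lists are needed, a small parser can be applied instead. This is a sketch that assumes the cells are comma-separated, bracketed lists as in the output above; parse_list_column is a hypothetical helper name.

```python
import json


def parse_list_column(text):
    """Parse a cell such as "[5999, 9015]" into a list of numbers (hypothetical helper)."""
    # A bracketed, comma-separated list of numbers is valid JSON.
    return json.loads(text)


print(parse_list_column("[5999, 9015]"))    # [5999, 9015]
print(parse_list_column("[0.987, 0.981]"))  # [0.987, 0.981]
print(parse_list_column("[]"))              # []

# For comparison, the plain string split keeps text and mishandles "[]":
print("[]".strip("[]").split(","))          # ['']
```

With pandas, the helper can be applied per column, for example picks_csv["p_idx"].apply(parse_list_column).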
with open(os.path.join(PROJECT_ROOT, "results/picks.json")) as fp:
    picks_json = json.load(fp)
print(picks_json[1])
print(picks_json[0])
{'id': 'NC.MCV..EH.0361339.npz', 'timestamp': '1970-01-01T00:01:30.150', 'prob': 0.9811667799949646, 'type': 'p'}
{'id': 'NC.MCV..EH.0361339.npz', 'timestamp': '1970-01-01T00:00:59.990', 'prob': 0.9872905611991882, 'type': 'p'}
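Because the JSON file holds one record per pick, a common next step is to collect the picks of each waveform by phase type. The sketch below assumes every record has the four keys shown in the output above; the two sample records are copied from that output, and group_picks with its min_prob parameter is a hypothetical helper.

```python
# Sample records copied from the printed output above.
sample_picks = [
    {"id": "NC.MCV..EH.0361339.npz", "timestamp": "1970-01-01T00:00:59.990",
     "prob": 0.9872905611991882, "type": "p"},
    {"id": "NC.MCV..EH.0361339.npz", "timestamp": "1970-01-01T00:01:30.150",
     "prob": 0.9811667799949646, "type": "p"},
]


def group_picks(picks, min_prob=0.0):
    """Group pick timestamps by waveform id and phase type (hypothetical helper)."""
    grouped = {}
    for pick in picks:
        if pick["prob"] < min_prob:
            continue  # drop picks below the chosen probability threshold
        by_type = grouped.setdefault(pick["id"], {})
        by_type.setdefault(pick["type"], []).append(pick["timestamp"])
    # ISO timestamps in a fixed format sort chronologically as strings.
    for by_type in grouped.values():
        for timestamps in by_type.values():
            timestamps.sort()
    return grouped


grouped = group_picks(sample_picks)
print(grouped["NC.MCV..EH.0361339.npz"]["p"])
```

Raising min_prob, for example group_picks(sample_picks, min_prob=0.985), keeps only the more confident picks.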