This dataset is built from the MCD19A2, MERRA-2, and ERA5-Land products. After data preprocessing, resampling, and spatio-temporal matching, the aerosol optical depth (AOD), Ångström exponent (AE), meteorological parameters, and other variables are extracted. AERONET ground-station observations serve as the target (ground-truth) values, and a model trained with the LightGBM machine learning algorithm is used to retrieve daily, 1-km-resolution dust aerosol optical depth over land in the Belt and Road desert region from 2000 to 2022. The dataset provides baseline data for dust research in the Belt and Road region.
1. Dataset Name: Dust-Aerosol-Optica-Depth
2. Attribute information
Lon: Longitude of the dataset (°)
Lat: Latitude of the dataset (°)
Dust-AOD: Optical thickness of dust aerosols
| Collection period | 2000/01/01 - 2022/12/31 |
|---|---|
| Collection location | The Belt and Road desert region in Asia |
| Data size | 2.8 TiB |
| Data format | tif |
| Coordinate system | |
1. MCD19A2 data
Downloaded from the official MODIS website (https://modis.gsfc.nasa.gov//).
2. MERRA-2 tavg1_2d_aer_Nx data
Downloaded from NASA's Goddard Earth Sciences Data and Information Services Center (https://disc.gsfc.nasa.gov/). (Global coverage must be downloaded.)
3. ERA5-Land data
Downloaded from the European Centre for Medium-Range Weather Forecasts (https://cds.climate.copernicus.eu/). (Download the range 0° E - 150° E, 81° N - 11° S.)
1. Remote sensing data preprocessing: MERRA-2 (0.5°×0.625°) and ERA5-Land (0.1°×0.1°) were each resampled to 1 km resolution by bilinear interpolation. Before resampling, missing values in ERA5-Land were filled by nearest-neighbor interpolation. The required predictors were then extracted by spatio-temporal matching to the nearest pixel. AERONET ground-station AOD observations in the study area with an Ångström exponent (AE) < 1 were selected as the target values for dust AOD.
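The resampling and target-selection steps above can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' processing code: the function names `bilinear_resample` and `select_dust_aod` are hypothetical, the grids are assumed to be regular with strictly increasing coordinates, and the nearest-neighbor gap filling applied to ERA5-Land is not shown.

```python
import numpy as np


def bilinear_resample(coarse, src_lat, src_lon, dst_lat, dst_lon):
    """Bilinearly interpolate a 2-D field from a coarse grid to a finer grid.

    Hypothetical helper. `coarse` has shape (len(src_lat), len(src_lon)),
    and src_lat / src_lon must be strictly increasing. Bilinear interpolation
    on a rectilinear grid is separable, so it is done one axis at a time.
    Points outside the source grid take the nearest edge value (np.interp).
    """
    coarse = np.asarray(coarse, dtype=float)
    # Step 1: interpolate along longitude for every source latitude row.
    along_lon = np.array([np.interp(dst_lon, src_lon, row) for row in coarse])
    # Step 2: interpolate along latitude for every destination longitude column.
    fine = np.array([np.interp(dst_lat, src_lat, col) for col in along_lon.T]).T
    return fine  # shape: (len(dst_lat), len(dst_lon))


def select_dust_aod(aod, ae, ae_threshold=1.0):
    """Keep only AOD observations whose Angstrom exponent is below the threshold."""
    aod = np.asarray(aod, dtype=float)
    ae = np.asarray(ae, dtype=float)
    return aod[ae < ae_threshold]


# Toy 2x2 field refined to a 3x3 grid.
fine = bilinear_resample(
    [[0.0, 1.0], [2.0, 3.0]],
    src_lat=[0.0, 1.0], src_lon=[0.0, 1.0],
    dst_lat=[0.0, 0.5, 1.0], dst_lon=[0.0, 0.5, 1.0],
)
dust_targets = select_dust_aod([0.3, 0.5, 0.2], [0.4, 1.6, 0.9])
```

In practice a geospatial library would normally handle the resampling; the plain NumPy version is used here only to keep the arithmetic visible.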
2. Inversion method: Two LightGBM models were trained separately: (1) a complete model, trained on samples with no missing predictors (MCD19A2 has many gaps, so this model covers only the pixels where MCD19A2 is available); (2) an incomplete model, for cases where MCD19A2 is missing, trained using only the predictors extracted from MERRA-2 and ERA5-Land. The outputs of the two models are averaged with the downscaled MERRA-2 dust AOD to give the final retrieval.
Reference: Ke G, Meng Q, Finley T, et al. LightGBM: A highly efficient gradient boosting decision tree[J]. Advances in Neural Information Processing Systems, 2017, 30.
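One way to picture how the two model outputs and the MERRA-2 field are merged is the sketch below. It assumes that the complete-model prediction is used wherever MCD19A2 is available and the incomplete-model prediction elsewhere, and that "averaged" means an equal-weight mean; the dataset description does not state the weighting, so both points are assumptions. The function `combine_dust_aod` and its argument names are hypothetical, and the predictions are passed in as plain arrays rather than produced by trained LightGBM models.

```python
import numpy as np


def combine_dust_aod(pred_complete, pred_incomplete, mcd19a2_valid, merra2_dust_aod):
    """Merge per-pixel predictions and average them with downscaled MERRA-2 dust AOD.

    Hypothetical helper. All inputs are arrays of the same shape;
    `mcd19a2_valid` is a boolean mask that is True where MCD19A2 has data.
    """
    pred_complete = np.asarray(pred_complete, dtype=float)
    pred_incomplete = np.asarray(pred_incomplete, dtype=float)
    mcd19a2_valid = np.asarray(mcd19a2_valid, dtype=bool)
    merra2_dust_aod = np.asarray(merra2_dust_aod, dtype=float)

    # Complete model where MCD19A2 exists, incomplete model elsewhere.
    model_output = np.where(mcd19a2_valid, pred_complete, pred_incomplete)
    # Assumption: equal-weight mean with the downscaled MERRA-2 dust AOD.
    return 0.5 * (model_output + merra2_dust_aod)


final = combine_dust_aod(
    pred_complete=[0.2, np.nan],      # NaN where MCD19A2 predictors are missing
    pred_incomplete=[0.4, 0.6],
    mcd19a2_valid=[True, False],
    merra2_dust_aod=[0.4, 0.2],
)
```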
Matched samples from AERONET sites not used in training were held out for validation. The mean absolute error (MAE) was 0.0515, the root mean square error (RMSE) was 0.0805, the mean bias error (MBE) was -0.0252, and the correlation coefficient (R) was 0.8559; 81.74% of retrievals fell within the expected error (EE) envelope.
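For readers who want to reproduce these statistics on their own matched samples, the sketch below shows how they are commonly computed. The function `validation_metrics` is hypothetical. The bias is taken as prediction minus observation, and the EE envelope defaults to ±(0.05 + 0.15 × observed AOD), a form often used for satellite AOD over land; the description does not state which sign convention or envelope was used, so both are assumptions.

```python
import numpy as np


def validation_metrics(pred, obs, ee_abs=0.05, ee_rel=0.15):
    """Compute MAE, RMSE, MBE, Pearson R and the percentage within the EE envelope.

    Hypothetical helper. `ee_abs` and `ee_rel` define the assumed envelope
    +/-(ee_abs + ee_rel * obs); they are not taken from the dataset description.
    """
    pred = np.asarray(pred, dtype=float)
    obs = np.asarray(obs, dtype=float)
    err = pred - obs  # assumed sign convention: negative means underestimation

    envelope = ee_abs + ee_rel * obs
    return {
        "MAE": float(np.mean(np.abs(err))),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "MBE": float(np.mean(err)),
        "R": float(np.corrcoef(pred, obs)[0, 1]),
        "within_EE_percent": float(np.mean(np.abs(err) <= envelope) * 100.0),
    }


# Toy example with four matched samples.
metrics = validation_metrics(
    pred=[0.10, 0.25, 0.30, 0.20],
    obs=[0.10, 0.20, 0.30, 0.40],
)
```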
| # | number | name | type |
|---|---|---|---|
| 1 | 2022YFF0711702-04 | | National Key R&D Program |
This work is licensed under a Creative Commons Attribution 4.0 International License.
| # | title | file size |
|---|---|---|
| 1 | _ncdc_meta_.json | 5.9 KiB |
| 2 | 2000 | |
| 3 | 2001 | |
| 4 | 2002 | |
| 5 | 2003 | |
| 6 | 2004 | |
| 7 | 2005 | |
| 8 | 2006 | |
| 9 | 2007 | |
| 10 | 2008 | |
©Copyright 2005-. Northwest Institute of Eco-Environment and Resources, CAS.
Donggang West Road 320, Lanzhou, Gansu, China (730000)

