As critical soil and water conservation structures in the Loess Plateau region, check dams play a central role in controlling soil erosion and ensuring regional food security. However, their intelligent management has long been constrained by technical bottlenecks such as inefficient data acquisition, insufficient model generalization, and a lack of standardized datasets. This study used 0.75-meter high-resolution Jilin-1 remote sensing imagery to construct an AI-ready standardized semantic segmentation dataset for the Jiuyuangou watershed, a typical watershed in the Loess Plateau.
| Field | Value |
|---|---|
| Collection time | 2021/11/11 - 2021/11/11 |
| Collection place | Jiuyuangou watershed |
| Data size | 522.8 MiB |
| Data format | |
| Coordinate system | WGS84 |
This study uses 0.75-meter high-resolution imagery acquired by the PMS-02 sensor onboard the Jilin-1 Wideband 01A satellite as the core data source (original data access URL: https://www.jl1mall.com/store/). Following multi-temporal retrieval and quality assessment, a cloud-free image (cloud cover of 0%) captured on November 11, 2021 was selected. The image uses the WGS 1984 UTM Zone 6N projection and covers an area of 548.30 km², with a solar elevation angle of 29.7493° and a sensor look angle of -2°.
Dataset preprocessing is divided into six stages:
(1) Grid division stage: The original image of the sample area is regularly cropped into 256×256 pixel grid units, generating 3689 initial samples in total;
(2) Sample screening stage: Invalid samples containing no dam land are eliminated via visual interpretation, retaining 584 valid grid units;
(3) Semantic annotation stage: Vector annotation for all samples is completed using the open-source platform Labelme, constructing a semantic segmentation dataset with "gully" as the unified feature category;
(4) Label conversion stage: The JSON vector annotation files are batch-rasterized into binary images using the GDAL library to ensure strict spatial coordinate matching with the original images;
(5) Data augmentation stage: Geometric-radiometric transformations (e.g., mirror flipping, rotation, and brightness adjustment) are applied to the original samples, expanding the sample size to 2920;
(6) Dataset division stage: The augmented dataset is divided into three independent subsets (training, validation, and test sets) at a ratio of 6:2:2.
This standardized process balances the integrity of sample spatial representation against the need for algorithm generalization, and the resulting multi-scale augmented dataset supports feature learning for mainstream convolutional neural networks.
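The tiling, augmentation, and 6:2:2 split of stages (1), (5), and (6) can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' code. The function names, the four specific transforms (chosen so that each sample yields five images, matching the growth from 584 to 2920), the 1.2 brightness factor, the dropping of partial edge tiles, and the random seed are all assumptions.

```python
import numpy as np

TILE = 256  # tile edge in pixels, as stated in stage (1)


def tile_image(image: np.ndarray, tile: int = TILE) -> list:
    """Crop an (H, W, C) array into non-overlapping tile x tile patches.

    Partial tiles at the right and bottom edges are dropped (an assumption).
    """
    rows, cols = image.shape[0] // tile, image.shape[1] // tile
    return [
        image[r * tile:(r + 1) * tile, c * tile:(c + 1) * tile]
        for r in range(rows)
        for c in range(cols)
    ]


def augment(image: np.ndarray, mask: np.ndarray) -> list:
    """Return the original (image, mask) pair plus four transformed copies."""
    # Assumed brightness factor of 1.2, clipped to the 8-bit range.
    brighter = np.clip(image.astype(np.float32) * 1.2, 0, 255).astype(image.dtype)
    return [
        (image, mask),
        (np.fliplr(image), np.fliplr(mask)),  # horizontal mirror
        (np.flipud(image), np.flipud(mask)),  # vertical mirror
        (np.rot90(image), np.rot90(mask)),    # 90-degree rotation
        (brighter, mask),                     # radiometric change leaves the mask as is
    ]


def split_indices(n: int, ratios=(6, 2, 2), seed: int = 0):
    """Shuffle range(n) and cut it into train/validation/test index arrays."""
    order = np.random.default_rng(seed).permutation(n)
    total = sum(ratios)
    n_train = n * ratios[0] // total
    n_val = n * ratios[1] // total
    return order[:n_train], order[n_train:n_train + n_val], order[n_train + n_val:]
```

With `n = 2920`, `split_indices` returns 1752 training, 584 validation, and 584 test indices.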
The research team corrected spatial deviations in the remote sensing interpretation data through field surveys (using drone imagery and on-site positioning) to ensure the accuracy of the check dam semantic segmentation labels. Systematic controlled-variable experiments were then designed, and tests across multiple model architectures (mean intersection over union, mIoU, exceeding 80% and overall accuracy, OA, surpassing 89%) confirmed the dataset's clear category delineation, high annotation quality, and balanced sample distribution. Comparisons with public datasets demonstrated significant improvements in spatial precision and reliability for this dataset (e.g., extraction results aligned more closely with actual features). The dataset offers high precision, strong consistency, and broad compatibility, providing a reliable data foundation for subsequent research and applications.
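For reference, the two metrics quoted above are commonly computed from a pixel-level confusion matrix. The sketch below assumes binary masks (0 = background, 1 = target class) and uses hypothetical function names; it is not the evaluation code used in the study.

```python
import numpy as np


def confusion_matrix(label: np.ndarray, pred: np.ndarray, n_classes: int = 2) -> np.ndarray:
    """Count pixels: rows index the true class, columns the predicted class."""
    idx = label.astype(np.int64).ravel() * n_classes + pred.astype(np.int64).ravel()
    return np.bincount(idx, minlength=n_classes ** 2).reshape(n_classes, n_classes)


def overall_accuracy(cm: np.ndarray) -> float:
    """Fraction of pixels whose predicted class equals the true class."""
    return float(np.trace(cm) / cm.sum())


def mean_iou(cm: np.ndarray) -> float:
    """Average over classes of TP / (TP + FP + FN); absent classes are skipped."""
    tp = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    present = union > 0
    return float((tp[present] / union[present]).mean())
```

In practice the confusion matrix would be accumulated over every tile in the test set before the two metrics are computed, so that small tiles with few target pixels do not dominate the average.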
| # | Number | Name | Type |
|---|---|---|---|
| 1 | 2022YFF0711704 | National Key R&D Program of China | National Key R&D Plan |
| 2 | E01Z7902 | | Other |
This work is licensed under a Creative Commons Attribution 4.0 International License.
| # | title | file size |
|---|---|---|
| 1 | _ncdc_meta_.json | 5.9 KiB |
| 2 | 面向AI-Ready的黄土高原坝地标准化语义分割数据集.zip | 522.8 MiB |
Keywords: semantic segmentation; Loess Plateau; dam land; soil and water conservation
©Copyright 2005-. Northwest Institute of Eco-Environment and Resources, CAS.
Donggang West Road 320, Lanzhou, Gansu, China (730000)

