Knowledge Resource Center for Ecological Environment in Arid Area
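Correction of Daily Precipitation Data over the Qinghai-Tibetan Plateau Based on Machine Learning Models (基于机器学习模型的青藏高原日降水数据订正研究) | |
Other title | Correction of Daily Precipitation Data over the Qinghai-Tibetan Plateau with Machine Learning Models |
王玉丹 (Wang Yudan) | |
Year of publication | 2016 |
Degree type | Master's |
Supervisors | 王维真 (Wang Weizhen); 南卓铜 (Nan Zhuotong) |
Degree-granting institution | University of Chinese Academy of Sciences |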
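Chinese abstract (English translation) | The unique natural conditions and climatic characteristics of the Qinghai-Tibetan Plateau (QTP) strongly influence the climate and hydrological systems of the surrounding regions. Precipitation is the main driver in simulations of climatic and hydrological processes, and its errors directly affect the results of land surface and hydrological models. Against the background of current global warming, developing precipitation datasets with high spatial and temporal resolution for hydrological and climate models is important for simulating climate change and hydrological processes on the QTP. The QTP spans several climate zones and has complex, variable climatic characteristics, yet meteorological stations on the plateau are few and unevenly distributed. Remote sensing products capture the areal characteristics of plateau precipitation but contain numerical errors. Consequently, most precipitation products for the QTP are generated by interpolating station observations, retrieving and correcting remotely sensed precipitation, data assimilation, or climate model runs. Commonly used remote sensing precipitation datasets such as CMORPH (Climate Prediction Center Morphing Technique) still require further correction and error evaluation. Studies have shown that meteorological factors such as air temperature, wind speed, humidity and air pressure, as well as environmental factors such as topography and vegetation, are correlated with the spatial distribution and amount of daily precipitation on the QTP. Few studies have so far combined meteorological and environmental factors to correct daily precipitation data over the QTP. Because the mechanisms of precipitation occurrence and development on the QTP are not yet fully understood and data are scarce, machine learning models, which combine multiple related factors to mine the latent patterns of variation in the precipitation data themselves, can simulate the occurrence, development and spatiotemporal distribution of precipitation and are therefore suitable for precipitation correction on the QTP. This thesis uses five machine learning models, Multivariate Adaptive Regression Splines (MARS), K-Nearest Neighbor (KNN), Support Vector Machines (SVM), Multinomial Log-Linear Model (MLM) and Artificial Neural Network (ANN), together with several environmental factors (elevation, slope, aspect, vegetation) and meteorological factors (air temperature, relative humidity, wind speed), to correct the remote-sensing-based CMORPH daily precipitation dataset for the QTP. It compares the correction performance of the machine learning approach with that of the Probability Density Function Matching Method (PDF method), compares the corrected CMORPH values with the ITPCAS (Institute of Tibetan Plateau Research, Chinese Academy of Sciences) precipitation data, which merge remote sensing and observational data, and discusses the applicability of machine learning models to correcting daily precipitation data on the QTP. The procedure was as follows. The root mean square error (RMSE) of each machine learning model was computed by five-fold cross-validation, and the model with the lowest RMSE was selected to correct the CMORPH daily precipitation data. The machine-learning-corrected values, PDF-corrected values and observations were compared at 112 standard meteorological stations, the spatiotemporal distribution of the errors was analyzed, and the correction errors were further validated against observations from three meteorological stations not used in model construction: Tanggula, Xidatan and Wudaoliang. Principal Component Analysis (PCA) and a single-factor correction method were used to analyze the contribution of the seven meteorological and environmental factors to the correction. The spatial distribution of CMORPH precipitation over the QTP was evaluated against the multi-year mean precipitation characteristics of eight typical precipitation regions, including the cold-arid core low-precipitation region of the plateau. The main conclusions are as follows. 1. Five-fold cross-validation showed that the KNN model gave the best correction of CMORPH daily precipitation, followed by the slightly less accurate SVM model, while the MLM, MARS and ANN models performed poorly. Considering the characteristics of each model, KNN was selected for the correction. 2. Error analysis at the 112 standard stations showed that, compared with the PDF-corrected CMORPH daily precipitation, the KNN-corrected values had a higher correlation with observations and a smaller bias. Against the known multi-year mean annual precipitation patterns of the eight typical regions, the annual accumulated precipitation of the KNN-corrected CMORPH performed well in six regions, the PDF-corrected values in two, and the original CMORPH in one. The spatial distribution of the KNN-corrected precipitation was more reasonable. 3. Comparison at the Tanggula, Xidatan and Wudaoliang validation stations showed that the KNN-corrected CMORPH daily precipitation correlated better with station observations at both daily and monthly scales than the PDF-corrected values, and KNN correction reduced the point-scale error. 4. The temporal distribution of the KNN correction errors still showed a clear seasonal pattern: in summer the bias and RMSE were large and the relative bias small; in winter the bias and RMSE were small and the relative bias large; spring and autumn were intermediate. The errors of the KNN-corrected annual accumulated CMORPH precipitation showed regional characteristics: the bias was large in the Hengduan Mountains and in the arid/semi-arid areas along the northern margin of the QTP, which is related to the complex terrain and the large spatial heterogeneity of precipitation and other meteorological and environmental factors in these areas. 5. PCA of the contributions of the seven meteorological and environmental factors ranked them, from largest to smallest, as elevation, relative humidity, aspect, vegetation, wind speed, air temperature and slope. The differences among the contributions were small, indicating that the correction results from the combined action of meteorological and environmental factors and that, at 8 km resolution, precipitation features across the QTP as a whole do not depend on any single factor. The single-factor correction method yielded correlation coefficients between 0.61 and 0.88 for the seven factors, ranked from largest to smallest as relative humidity, elevation, aspect, wind speed, air temperature, slope and vegetation, similar to the PCA results. The poor performance of vegetation is related to the low temporal resolution and limited accuracy of the vegetation data, and single-factor corrections had larger errors than the combined-factor correction. 6. Compared with ITPCAS daily precipitation, the KNN-corrected CMORPH had similar, slightly smaller, errors at the Tanggula, Xidatan and Wudaoliang validation stations. The spatial distributions of the two datasets were close in seven typical regions, but both showed errors in the western and northern QTP, where the corrected CMORPH errors were larger; this is related to the large bias of remote sensing data and the scarcity of observations in these areas. |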
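The model-selection step described in the abstract (compute each model's RMSE by five-fold cross-validation and keep the lowest) can be sketched as follows. This is a minimal, standard-library-only illustration under stated assumptions, not the thesis's code: the feature layout (a CMORPH value plus factors such as elevation and relative humidity), the z-score scaling, k = 5, and the hypothetical mean-value baseline standing in for the other four models are all illustrative choices.

```python
import math
import random
from statistics import mean, pstdev


def standardize(train, test):
    """Z-score each feature column using statistics from the training fold only."""
    cols = list(zip(*train))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns

    def scale(rows):
        return [[(v - m) / s for v, m, s in zip(r, mus, sds)] for r in rows]

    return scale(train), scale(test)


def knn_predict(X_train, y_train, X_test, k=5):
    """KNN regression: average the targets of the k nearest training samples (Euclidean)."""
    preds = []
    for x in X_test:
        neighbours = sorted((math.dist(x, xt), yt) for xt, yt in zip(X_train, y_train))
        preds.append(mean(yt for _, yt in neighbours[:k]))
    return preds


def mean_baseline(X_train, y_train, X_test):
    """Hypothetical reference model: predict the training-fold mean everywhere."""
    return [mean(y_train)] * len(X_test)


def rmse(obs, pred):
    return math.sqrt(mean((o - p) ** 2 for o, p in zip(obs, pred)))


def kfold_rmse(model, X, y, n_folds=5, seed=0):
    """Mean RMSE of `model` over n_folds shuffled, non-overlapping folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    scores = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        X_tr, X_te = standardize([X[i] for i in train_idx], [X[i] for i in test_idx])
        preds = model(X_tr, [y[i] for i in train_idx], X_te)
        scores.append(rmse([y[i] for i in test_idx], preds))
    return mean(scores)


def select_model(models, X, y):
    """Return the name of the lowest-RMSE model and all cross-validated scores."""
    scores = {name: kfold_rmse(m, X, y) for name, m in models.items()}
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    rng = random.Random(42)
    X, y = [], []
    for _ in range(200):
        # hypothetical features: CMORPH daily value (mm), elevation (km), relative humidity (0-1)
        cm, elev, rh = rng.uniform(0, 20), rng.uniform(3, 5), rng.uniform(0.2, 1.0)
        X.append([cm, elev, rh])
        # synthetic "station observation" target, for illustration only
        y.append(max(0.0, 0.8 * cm + 5 * rh - 2 * (elev - 4) + rng.gauss(0, 1)))
    best, scores = select_model({"KNN": knn_predict, "mean baseline": mean_baseline}, X, y)
    print(best, scores)
```

Scaling is fitted on each training fold only, so the held-out fold never leaks into the distance metric; any of the other four models could be plugged into `select_model` as a function with the same `(X_train, y_train, X_test)` signature.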
English abstract | The unique geography and climate of the Qinghai-Tibet Plateau (QTP) have a great impact on the climate and hydrology of the surrounding areas. Precipitation is an important forcing in the related model simulations. Under climate change, precipitation data with high temporal and spatial resolution are critical to successfully simulating the climatic and hydrological processes of the QTP. The QTP spans several climatic zones with varied climatic characteristics. Meteorological stations on the QTP are scarce and unevenly distributed, whereas the remote sensing data used to detect the spatial features of precipitation usually contain considerable errors. Thus precipitation products over the QTP have mainly been obtained by interpolation, remote sensing retrieval, and data assimilation of remote sensing data with multi-source observations. Commonly used precipitation datasets such as CMORPH (Climate Prediction Center Morphing Technique) need further correction and evaluation. Studies show that environmental factors such as topography and vegetation, as well as meteorological factors such as air temperature, wind speed, humidity and atmospheric pressure, are closely related to the magnitude and spatial distribution of daily precipitation. At present, little research has been done on correcting precipitation by combining environmental and meteorological factors.
As the mechanisms behind the occurrence and development of precipitation are not fully understood and data are severely insufficient, machine learning models, which can explore the patterns of precipitation variation by combining various related factors, are suitable for correcting precipitation over the QTP. In this paper, five machine learning models, Multivariate Adaptive Regression Splines (MARS), K-Nearest Neighbor (KNN), Support Vector Machines (SVM), Multinomial Log-Linear Model (MLM) and Artificial Neural Network (ANN), were used to correct the CMORPH daily precipitation dataset over the QTP, using environmental factors (elevation, slope, aspect, vegetation) and meteorological factors (air temperature, humidity, wind speed). The performance of the machine learning models was compared with that of the PDF (Probability Density Function Matching Method) correction, and the corrected CMORPH data were compared with the ITPCAS (Institute of Tibetan Plateau Research, Chinese Academy of Sciences) precipitation data. The procedure was as follows: the Root Mean Square Error (RMSE) of each machine learning model was computed by five-fold cross-validation, and the model with the lowest RMSE was selected to correct the CMORPH data. The observed precipitation and the precipitation corrected by the machine learning model and by the PDF method were compared at the standard meteorological stations. The temporal and spatial distributions of the errors at those stations were analyzed, and the datasets were further checked at the Tanggula, Xidatan and Wudaoliang stations, which were not involved in model construction. Principal Component Analysis (PCA) and single-factor correction were used to assess the contribution of the seven meteorological and environmental factors to the precipitation correction.
Eight typical precipitation areas over the QTP were used to evaluate the spatial distribution of CMORPH precipitation. The main conclusions are as follows. Five-fold cross-validation shows that the KNN model performs best in correcting CMORPH precipitation, followed by SVM, while the MLM, MARS and ANN models give less accurate results; the KNN model was therefore selected for further correction. Validation at the Tanggula, Xidatan and Wudaoliang stations shows that the KNN-corrected CMORPH daily precipitation has better daily and monthly correlation with observations than the PDF-corrected values. Error analysis at the standard stations shows that the KNN correction has a higher correlation and a lower bias than the PDF correction. For the eight typical precipitation areas of the QTP, the KNN correction performs well in six areas, the PDF correction in two, and the original CMORPH precipitation in only one. The spatial distribution of the KNN-corrected precipitation is closer to reality. The temporal distribution of the KNN correction errors for CMORPH daily precipitation still shows an evident seasonal pattern: in summer, RMSE and bias are large and relative bias is small; in winter, bias and RMSE are small and relative bias is large. The errors of annual CMORPH precipitation corrected by KNN show regional features: the bias is large in the Hengduan Mountains and in the arid/semi-arid regions of the northern QTP, which is related to the complex topography and the heterogeneity of precipitation and other environmental and meteorological factors. The contributions of the seven environmental and meteorological factors to the precipitation correction, from largest to smallest, are elevation, humidity, aspect, vegetation, wind speed, air temperature and slope. The contributions of the seven factors are similar in magnitude, which shows that the precipitation correction is the combined outcome of all the factors.
Single-factor correction shows that the correlation coefficients of the seven factors range from 0.61 to 0.88, ranked from largest to smallest as humidity, elevation, aspect, wind speed, air temperature, slope and vegetation, which is close to the PCA results. The poor correction performance of vegetation is related to the coarse temporal resolution of the vegetation data. The error of single-factor correction is larger than that of the combined-factor correction. The errors of the KNN-corrected CMORPH daily precipitation are close to those of the ITPCAS precipitation at the Tanggula, Xidatan and Wudaoliang stations, with the KNN correction having slightly lower errors. The spatial distributions of precipitation in the two datasets are very close in seven typical precipitation areas of the QTP, whereas in the north and west of the QTP both have larger errors, those of the KNN-corrected CMORPH being larger. This is related to the large bias of remote sensing data and the scarcity of observations in those areas. |
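The PDF-matching baseline and the error statistics used to compare the corrections can be sketched as below. The abstract does not specify which PDF-matching variant or metric formulas the thesis used, so this sketch assumes empirical CDF (quantile) matching fitted on a hypothetical calibration period, and defines relative bias as bias divided by mean observed precipitation, in percent. Under that assumed definition, a small absolute bias in dry winter months can still give a large relative bias, which is one plausible reading of the seasonal error pattern reported in the abstract. Dry-day (zero-precipitation) handling is deliberately simplified.

```python
import math
from bisect import bisect_right
from statistics import mean


def quantile_map(cal_sat, cal_obs, new_sat):
    """Empirical CDF matching: give each satellite value the observed value at the
    same non-exceedance rank in the calibration samples (nearest-rank quantile)."""
    sat_sorted, obs_sorted = sorted(cal_sat), sorted(cal_obs)
    n_sat, n_obs = len(sat_sorted), len(obs_sorted)
    corrected = []
    for v in new_sat:
        rank = bisect_right(sat_sorted, v)  # number of calibration satellite values <= v
        # integer ceil(rank * n_obs / n_sat) - 1, clamped to a valid index
        j = (rank * n_obs + n_sat - 1) // n_sat - 1
        corrected.append(obs_sorted[min(max(j, 0), n_obs - 1)])
    return corrected


def error_stats(obs, est):
    """Pearson r, bias (mean est - mean obs), relative bias (% of mean obs), and RMSE.
    The relative-bias definition is an assumption, not taken from the thesis."""
    mo, me = mean(obs), mean(est)
    cov = sum((o - mo) * (e - me) for o, e in zip(obs, est))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    se = math.sqrt(sum((e - me) ** 2 for e in est))
    bias = me - mo
    return {
        "r": cov / (so * se) if so and se else float("nan"),
        "bias": bias,
        "rel_bias_pct": 100 * bias / mo if mo else float("nan"),
        "rmse": math.sqrt(mean((e - o) ** 2 for o, e in zip(obs, est))),
    }


if __name__ == "__main__":
    # hypothetical daily values (mm) at one station: calibration and evaluation periods
    sat_cal = [0.0, 0.0, 0.5, 1.2, 3.0, 6.5, 12.0]
    obs_cal = [0.0, 0.0, 0.0, 1.0, 2.0, 4.0, 9.0]
    sat_eval = [0.0, 1.0, 5.0, 10.0]
    obs_eval = [0.0, 1.0, 3.0, 7.0]
    corrected = quantile_map(sat_cal, obs_cal, sat_eval)
    print("raw:", error_stats(obs_eval, sat_eval))
    print("PDF-matched:", error_stats(obs_eval, corrected))
```

Integer arithmetic is used for the rank-to-index step so that floating-point rounding cannot shift a value into the neighbouring quantile.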
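Chinese keywords | Qinghai-Tibetan Plateau; machine learning models; CMORPH precipitation data; ITPCAS meteorological data; precipitation correction |
English keywords | The Qinghai-Tibet Plateau; Machine learning models; CMORPH precipitation; ITPCAS precipitation; precipitation correction |
Language | Chinese |
Country | China |
Discipline | Cartography and Geographic Information Systems |
Source institution | Northwest Institute of Eco-Environment and Resources, Chinese Academy of Sciences |
Resource type | Thesis |
Item identifier | http://119.78.100.177/qdio/handle/2XILL650/287725 |
Recommended citation (GB/T 7714) | 王玉丹. 基于机器学习模型的青藏高原日降水数据订正研究[D]. 中国科学院大学,2016. |
Files in this item | No files are associated with this item. |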
Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.