Petro-Chemical Equipment Technology (石油化工设备技术), 2024, Vol. 45, Issue 6: 48-53. doi: 10.3969/j.issn.1006-8805.2024.06.010

• Equipment Management •

Research on Data Analysis Model for Design Documents Compliance Checking of Old Static Equipment

Yuan Dandan, Chen Jiayi, Yuan Xuyang, Liu Yang, Zhang Zhiyun

  1. SINOPEC Engineering Incorporation, Beijing 100101
  • Received: 2024-08-23 Accepted: 2024-10-31 Online: 2024-11-15 Published: 2024-11-15
  • About the author: Yuan Dandan, female, engineer, graduated from Tiangong University (天津工业大学) in 2019 with a master's degree in Electronics and Communication Engineering; mainly engaged in research on big data and intelligent algorithms.

Abstract: Compliance checking of design documents for aging units helps to precisely control the risks that have accumulated as the refining and chemical industry has developed, and is a necessary requirement for balancing development and safety. This paper applies machine learning algorithms to the text data in design-document compliance checking records for static equipment in aging units and constructs a data analysis model, with the aim of extracting and analyzing key information about aging units whose design documents are non-compliant. The work covers three aspects: first, cleaning the text with data preprocessing methods; second, using the TextRank algorithm to automatically extract keywords and key phrases from a large volume of text; third, training a topic model with the LDA algorithm to automatically generate the key topics in the text. The results can help experts screen and assess safety risk information for aging equipment quickly and accurately.

Key words: compliance checking, machine learning, data cleaning, TextRank algorithm, LDA topic model
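
The three steps named in the abstract (text cleaning, TextRank keyword extraction, LDA topic modeling) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The sample records, the stopword list, and the function names `clean`, `textrank`, and `lda_gibbs` are hypothetical. Tokenization is a simple regular expression over English text, whereas a Chinese corpus would first need a word segmenter. The topic model is fitted by collapsed Gibbs sampling, a fitting method the abstract does not specify.

```python
import random
import re
from collections import defaultdict

# Hypothetical inspection records; the paper's real data is not reproduced here.
RECORDS = [
    "Vessel wall thickness below the design code minimum; corrosion allowance missing.",
    "Design pressure not stated on drawing. Corrosion allowance missing!",
    "Heat exchanger tube material does not match the design code.",
    "Tube material certificate missing; heat exchanger drawing outdated.",
]
STOPWORDS = {"the", "on", "not", "does", "below", "a", "of", "is"}  # assumed list


def clean(text):
    """Step 1: lowercase, drop punctuation and digits, remove stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def textrank(docs, window=3, damping=0.85, iters=50):
    """Step 2: rank words by PageRank on an undirected co-occurrence graph."""
    neighbors = defaultdict(set)
    for tokens in docs:
        for i, word in enumerate(tokens):
            # Link each word to the words that follow it inside the window.
            for other in tokens[i + 1 : i + window]:
                if other != word:
                    neighbors[word].add(other)
                    neighbors[other].add(word)
    scores = {w: 1.0 for w in neighbors}
    for _ in range(iters):
        scores = {
            w: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[w])
            for w in neighbors
        }
    return sorted(scores, key=scores.get, reverse=True)


def lda_gibbs(docs, n_topics=2, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Step 3: fit LDA by collapsed Gibbs sampling; return top words per topic."""
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    doc_topic = [[0] * n_topics for _ in docs]  # tokens per (document, topic)
    topic_word = [defaultdict(int) for _ in range(n_topics)]  # per (topic, word)
    topic_total = [0] * n_topics  # tokens per topic
    assign = []
    for d, tokens in enumerate(docs):  # random initial topic assignment
        zs = [rng.randrange(n_topics) for _ in tokens]
        for w, k in zip(tokens, zs):
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)
    for _ in range(iters):
        for d, tokens in enumerate(docs):
            for i, w in enumerate(tokens):
                k = assign[d][i]  # remove the token's current assignment
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + beta * len(vocab))
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k  # record the resampled topic
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return [
        sorted(vocab, key=lambda w: topic_word[t][w], reverse=True)[:3]
        for t in range(n_topics)
    ]


docs = [clean(r) for r in RECORDS]
keywords = textrank(docs)[:5]
topics = lda_gibbs(docs)
print("keywords:", keywords)
print("topics:", topics)
```

On a corpus this small the rankings and topics are only indicative. In practice the window size, damping factor, number of topics, and the priors `alpha` and `beta` would need tuning against the real inspection records.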