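Our ability to generate and collect data has been growing rapidly. Beyond the increasing computerization of most business, scientific, and government transactions, the widespread use of digital cameras, tools, and bar codes also generates data. On the collection side, scanned text and image platforms, satellite remote sensing systems, and the Internet have surrounded our lives with enormous volumes of data. This explosive growth makes the need for new techniques and automated tools more urgent than ever, so that we can transform this data into useful information and knowledge.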
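The first edition of this book was voted a popular data mining monograph by KDnuggets readers and is a highly readable textbook. It systematically introduces, from a database perspective, the basic concepts, methods, and techniques of data mining, along with research advances in the field, focusing on feasibility, usefulness, effectiveness, and scalability. Since the first edition was published, however, data mining research has made great progress, producing new data mining methods, systems, and applications. The second edition strengthens this coverage, adding several chapters on data mining methods for complex types of data, including stream data, sequence data, graph-structured data, social network data, and multirelational data.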
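This book is suitable as an elective textbook for senior undergraduates in computing and related majors, and is especially suitable as a core textbook for graduate students. It can also serve as an essential reference for those engaged in data mining research and application development.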
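Key features of this book: It gives a practical treatment of the concepts and techniques readers need to know, drawn from real business data. It has been updated to incorporate reader feedback, technical changes in the data mining field, and more material on statistics and machine learning. It contains many algorithms and practical examples, all written in easy-to-understand pseudocode and suitable for real, large-scale data mining projects.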
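Jiawei Han is a professor in the Department of Computer Science at the University of Illinois at Urbana-Champaign. For his productive research in data mining and database systems, he has received numerous honors and awards, including the 2004 ACM SIGKDD Innovation Award. He is also editor-in-chief of the journal ACM Transactions on Knowledge Discovery from Data, and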
Foreword
Preface
Chapter 1 Introduction
1.1 What Motivated Data Mining? Why Is It Important?
1.2 So, What Is Data Mining?
1.3 Data Mining: On What Kind of Data?
1.3.1 Relational Databases
1.3.2 Data Warehouses
1.3.3 Transactional Databases
1.3.4 Advanced Data and Information Systems and Advanced Applications
1.4 Data Mining Functionalities: What Kinds of Patterns Can Be Mined?
1.4.1 Concept/Class Description: Characterization and Discrimination
1.4.2 Mining Frequent Patterns, Associations, and Correlations
1.4.3 Classification and Prediction
1.4.4 Cluster Analysis
1.4.5 Outlier Analysis
1.4.6 Evolution Analysis
1.5 Are All of the Patterns Interesting?
1.6 Classification of Data Mining Systems
1.7 Data Mining Task Primitives
1.8 Integration of a Data Mining System with a Database or Data Warehouse System
1.9 Major Issues in Data Mining
1.10 Summary
Exercises
Bibliographic Notes
Chapter 2 Data Preprocessing
2.1 Why Preprocess the Data?
2.2 Descriptive Data Summarization
2.2.1 Measuring the Central Tendency
2.2.2 Measuring the Dispersion of Data
2.2.3 Graphic Displays of Basic Descriptive Data Summaries
2.3 Data Cleaning
2.3.1 Missing Values
2.3.2 Noisy Data
2.3.3 Data Cleaning as a Process
2.4 Data Integration and Transformation
2.4.1 Data Integration
2.4.2 Data Transformation
2.5 Data Reduction
2.5.1 Data Cube Aggregation
2.5.2 Attribute Subset Selection
2.5.3 Dimensionality Reduction
2.5.4 Numerosity Reduction
2.6 Data Discretization and Concept Hierarchy Generation
2.6.1 Discretization and Concept Hierarchy Generation for Numerical Data
2.6.2 Concept Hierarchy Generation for Categorical Data
2.7 Summary
Exercises
Bibliographic Notes
Chapter 3 Data Warehouse and OLAP Technology: An Overview
3.1 What Is a Data Warehouse?
3.1.1 Differences between Operational Database Systems and Data Warehouses
3.1.2 But, Why Have a Separate Data Warehouse?
3.2 A Multidimensional Data Model
3.2.1 From Tables and Spreadsheets to Data Cubes
3.2.2 Stars, Snowflakes, and Fact Constellations: Schemas for Multidimensional Databases
3.2.3 Examples for Defining Star, Snowflake, and Fact Constellation Schemas
……
Chapter 4 Data Cube Computation and Data Generalization
Chapter 5 Mining Frequent Patterns, Associations, and Correlations
Chapter 6 Classification and Prediction
Chapter 7 Cluster Analysis
Chapter 8 Mining Stream, Time-Series, and Sequence Data
Chapter 9 Graph Mining, Social Network Analysis, and Multirelational
Chapter 10 Mining Object, Spatial, Multimedia, Text, and Web Data
Chapter 11 Applications and Trends in Data Mining
An Introduction to Microsoft's OLE DB for
Bibliography
Index