Data mining is a field of research that emerged in the 1990s and is very popular today, sometimes under different names such as “big data”, “data science” and “artificial intelligence”, which have a similar meaning. To give a brief definition of data mining, it can be defined as a set of techniques for automatically analyzing data to discover interesting knowledge or patterns in the data.
The reason why data mining has become popular is that storing data electronically has become very cheap and that transferring data can now be done very quickly thanks to the fast computer networks that we have today. Many organizations now have huge amounts of data stored in databases that need to be analyzed.
Traditionally, data has been analyzed by hand to discover interesting knowledge. However, this is time-consuming and prone to error, it may miss some important information, and it is simply not realistic on large databases. To address this problem, automatic techniques have been designed to analyze data and extract interesting patterns and trends. This is the goal of data mining.
In general, data mining techniques are designed either to explain or understand the past, or to predict the future (for example, to predict whether there will be an earthquake tomorrow at a given location).
How to process data: To perform data mining, a process consisting of seven steps is typically followed. This process is often called the “Knowledge Discovery in Databases” (KDD) process.
• Data cleaning: This step consists of cleaning the data by removing noise or other inconsistencies that could be a problem for analyzing the data.
• Data integration: This step consists of integrating data from various sources to prepare the data that needs to be analyzed. For instance, if the data is stored in multiple databases or files, it may be necessary to integrate the data into a single file or database to analyze it.
• Data selection: This step consists of choosing the relevant data for the analysis to be performed.
• Data transformation: This step consists of transforming the data into a proper format that can be analyzed using data mining techniques. For example, some data mining techniques require that all numerical values are normalized.
• Data mining: This step consists of applying some data mining techniques (algorithms) to analyze the data and discover interesting patterns or extract interesting knowledge from this data.
• Evaluating the knowledge that has been discovered: This step consists of evaluating the knowledge that has been extracted from the data. This can be done in terms of objective and/or subjective measures.
Types of patterns: The main types of patterns that can be extracted from data are the following (of course, this is not an exhaustive list):
• Clusters: Clustering algorithms are often applied to automatically group similar instances or objects into clusters (groups). For example, clustering techniques such as K-Means can be used to automatically group customers having a similar behavior.
• Classification models: Classification algorithms aim at extracting models that can be used to classify new instances. For example, classification algorithms such as Naive Bayes, neural networks and decision trees can be used to build models that predict whether a customer will pay back a debt or not.
• Patterns and associations: Several techniques have been developed to extract frequent patterns or associations between values in a database.
• Outliers: The goal is to detect things that are abnormal in data. Some applications are detecting hackers attacking a computer system and identifying potential terrorists based on suspicious behavior.
• Trends and regularities: Data mining techniques can also be applied to find trends and regularities in data. One application is studying patterns in the stock market to predict stock prices.
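To make some of this more concrete, here is a minimal sketch in plain Python that chains three of the steps described above: data cleaning, data transformation (min-max normalization) and data mining (a basic K-Means clustering). The customer records, the function names and the choice of two clusters are all made up for illustration, and a real project would more likely rely on an existing library than on hand-written code like this.

```python
import random


def clean(rows):
    """Data cleaning: drop records that contain a missing value (None)."""
    return [r for r in rows if None not in r]


def min_max_normalize(rows):
    """Data transformation: rescale every column to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, iterations=20, seed=0):
    """Data mining: a basic K-Means with a fixed number of iterations."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct records as initial centroids
    labels = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


# Hypothetical customer records: (purchases per month, average basket value).
customers = [
    (2, 15.0), (3, 18.0), (1, 12.0),
    (None, 20.0),  # incomplete record, removed by the cleaning step
    (20, 150.0), (22, 160.0), (19, 140.0),
]

cleaned = clean(customers)
normalized = min_max_normalize(cleaned)
labels, centroids = k_means(normalized, k=2)

for record, label in zip(cleaned, labels):
    print(record, "-> cluster", label)
```

On this toy data, the occasional buyers and the frequent buyers should end up in two separate clusters. Normalizing first matters here because, without it, the basket value (which has a much larger range) would dominate the distance computation.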
Conclusion: In this blog post, I have given an overview of what data mining is. This post was quite general. I actually wrote it because I'm teaching a course on data mining and this will be some of the content of the first lecture.