WHAT IS DATA MINING?
Data mining, as the name suggests, is the process of extracting useful information from data. Often used interchangeably with “Knowledge Discovery in Databases”, it helps to uncover hidden patterns and to predict future trends and behaviors, which in turn facilitates decision making in businesses.
This extraction is done with tools and technologies such as Apache Mahout, IBM Cognos and Oracle Data Mining. Today, the velocity, volume and variety of the data being produced over the internet are mind-boggling. According to a widely quoted statistic, 90% of the data produced to date was generated in the last two years.
HOW IS DATA MINING DONE?
It is a three-step process.
- Data Integration – First, the relevant sources are researched and their data is integrated.
- Data Extraction – Next, the mining itself is done: useful data is extracted from the integrated data to obtain information.
- Data Presentation – Finally, the useful data is presented in a managed and organised way so that it can be used for analysis.
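The three steps above can be sketched in a few lines of Python. This is only a toy illustration, not how any particular tool works: the sample records and the function names `integrate`, `extract` and `present` are made up for this example, and the "pattern" being mined is simply how often each item is bought.

```python
from collections import Counter

# Hypothetical sample data from two separate sources (assumed for illustration).
store_sales = [
    {"customer": "a", "item": "bread"},
    {"customer": "b", "item": "milk"},
    {"customer": "c", "item": "bread"},
]
online_sales = [
    {"customer": "d", "item": "milk"},
    {"customer": "e", "item": "bread"},
    {"customer": "f", "item": "eggs"},
]


def integrate(*sources):
    """Step 1, Data Integration: combine records from several sources."""
    combined = []
    for source in sources:
        combined.extend(source)
    return combined


def extract(records):
    """Step 2, Data Extraction: count how often each item was bought."""
    return Counter(record["item"] for record in records)


def present(counts, top_n=3):
    """Step 3, Data Presentation: list the most frequent items as text."""
    return [f"{item}: {count}" for item, count in counts.most_common(top_n)]


records = integrate(store_sales, online_sales)
item_counts = extract(records)
report = present(item_counts)

for line in report:
    print(line)
```

Real projects replace each function with something far heavier (database joins, mining algorithms, dashboards), but the flow of data from one step to the next stays the same.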
THE INFORMATION ERA
The Information Era has begun. The accessibility of the internet and cheaper internet tariff plans have together allowed businesses and people all over the world to share ideas and information, which has produced more data than ever before. No one imagined that money and thoughts would one day travel across the world at similar speeds. It may be hard to absorb, but according to one report we generate over 2.5 quintillion bytes of data every day.
WHERE IS THIS DATA COMING FROM?
Search queries over the internet, information on various social media sites, communication, digital photos and the Internet of Things are all contributing to this huge rise in data generation.
WHAT ARE THE CHALLENGES IN DATA MINING?
- Issues related to mining methodologies: With a large number of user interactions, the responsibility of mining data that covers a broad range of user queries has increased. The data cleaning done during data processing needs to be accurate.
- Issues related to performance: Various algorithms have been developed to handle data accurately, but the performance of these algorithms remains a huge challenge.
- Issues related to diverse data types: The data being generated is heterogeneous, and handling such different types of data remains a challenge.
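To make the data cleaning and data type challenges a little more concrete, here is a minimal sketch of one small cleaning task: turning values that arrive in mixed forms into plain numbers. The helper `to_number` and the sample values are hypothetical, and real data cleaning involves many more cases than this.

```python
def to_number(value):
    """Return value as a float, or None if it cannot be read as a number."""
    # Booleans are technically integers in Python, so reject them explicitly.
    if value is None or isinstance(value, bool):
        return None
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, str):
        try:
            return float(value.strip())
        except ValueError:
            return None
    return None


# Hypothetical readings collected from different sources in different forms.
raw_values = ["42", 42.0, " 17 ", None, "n/a", 3]

cleaned = [to_number(v) for v in raw_values]
usable = [v for v in cleaned if v is not None]

print(cleaned)
print(usable)
```

Values that cannot be interpreted are marked as `None` rather than guessed at, so that a later step can decide whether to drop them or fill them in.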