What is Data Mining?

By James Gordon, MS
Faculty, College of Science, Engineering and Technology


Back in 2003 I took a course in data mining as a graduate student at the University of Washington. At the time, data mining seemed like a nice marketing term – just another hot technology buzz phrase to describe the same old database design and searching techniques. Fast forward to the year 2018 and data mining has evolved to the point where it could actually be offered as a degree program. During the fall 2017 semester I taught a course at Grand Canyon University in our growing computer science program called Search Engines and Data Mining. Over the last 20 years the internet has evolved into an economy of information where data mining is a core skill that our computer science students are expected to enter the workforce with.

Data Mining Today

Not only do online stores and large retailer websites keep track of our shopping habits, but even the smallest transactions and in-store purchases are now saved in data warehouses to be retrieved at a later time. Transactions such as purchasing gas for your car, buying a loaf of bread or swiping your gym membership card can all be compiled into a profile. Companies can use this information to learn more about their customers and develop specific marketing techniques to reach them and potentially improve profits.

How Does Data Mining Work?

Data mining and analysis can be broken down into three stages. Suppose we have all of Amazon’s sales data for the state of Arizona for 2017: each customer’s name, address, the credit card to be billed and the items purchased during that 12-month period. The first step is to “clean” the data, which means removing any bad or invalid records, such as those where a customer has returned the items, used an invalid address or has an expired credit card on file. This data cleaning, or “data wrangling” as it is called, can be quite intensive and time consuming. The majority of the data analyst’s time will be spent in this phase, before any math or algorithms can be applied in the data mining phase.
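To make the cleaning step a little more concrete, here is a minimal sketch in Python. It is only an illustration: the Order record, its field names and the sample customers are all made up for this example, and a real data set would need far more careful validation (for instance, checking addresses against a postal database rather than just checking for blanks).

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Order:
    # Hypothetical record layout, invented for this illustration.
    customer: str
    address: str       # a blank string stands in for an invalid address
    card_expiry: date  # last day the card on file is valid
    item: str
    returned: bool


def clean(orders, as_of):
    """Keep only orders that pass the three checks described above."""
    return [
        o for o in orders
        if not o.returned              # drop returned items
        and o.address.strip()          # drop blank (invalid) addresses
        and o.card_expiry >= as_of     # drop expired credit cards
    ]


# Made-up sample data.
orders = [
    Order("Ana", "12 Main St, Phoenix, AZ", date(2019, 5, 31), "blender", False),
    Order("Ben", "", date(2019, 1, 31), "headphones", False),
    Order("Cal", "9 Oak Ave, Tucson, AZ", date(2016, 8, 31), "kettle", False),
    Order("Dee", "4 Elm Rd, Mesa, AZ", date(2020, 2, 29), "lamp", True),
]

cleaned = clean(orders, as_of=date(2017, 12, 31))
print([o.customer for o in cleaned])
```

In this toy data set only one of the four orders survives all three checks, which gives a feel for how much raw data can fall away before any analysis begins.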

With the data now ready for analysis, the second phase is to apply the algorithms and statistical principles that will return some useful trends to Amazon about its customers. We all know about Black Friday and Cyber Monday, but there might be a trend in the time of day that certain items are purchased, or certain area codes whose customers purchase items more frequently than others. Finding these trends takes a bit of creative thinking on the part of the data analyst, but could reap big rewards for a site like Amazon if it can recognize trends in sales and use them to attract more customers.
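The time-of-day and area-code questions above can be sketched with nothing more than simple counting. The example below is a rough illustration, not a real analysis: the sales records, their layout and the helper function names are all invented here, and a real analyst would work with far more data and proper statistical tests.

```python
from collections import Counter
from datetime import datetime

# Hypothetical cleaned records: (purchase time, customer area code, item).
sales = [
    (datetime(2017, 11, 24, 9, 15), "602", "headphones"),
    (datetime(2017, 11, 24, 21, 40), "480", "headphones"),
    (datetime(2017, 11, 27, 21, 5), "602", "headphones"),
    (datetime(2017, 12, 2, 21, 30), "602", "blender"),
    (datetime(2017, 12, 9, 8, 50), "520", "blender"),
]


def purchases_by_hour(records, item):
    """Count how many times an item was bought in each hour of the day."""
    return Counter(ts.hour for ts, _, name in records if name == item)


def purchases_by_area_code(records):
    """Count purchases per area code, across all items."""
    return Counter(code for _, code, _ in records)


by_hour = purchases_by_hour(sales, "headphones")
by_area = purchases_by_area_code(sales)

# most_common(1) returns a list holding the single most frequent entry.
peak_hour, peak_count = by_hour.most_common(1)[0]
top_area, top_count = by_area.most_common(1)[0]

print(f"Headphones sell most often around {peak_hour}:00")
print(f"Area code {top_area} placed the most orders ({top_count})")
```

Counting like this is only a starting point, but it shows the basic idea: group the cleaned records by some attribute and look for groups that stand out.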

The third part of data mining and analysis is optimizing the algorithms used on the data set. The math and statistical principles have been around for 50 or 60 years, but being able to tweak or optimize an algorithm can really save time and yield better results from the data. In my next blog post I will explain some of these algorithms in depth, as well as some optimization techniques and how neural networks fit into the data mining picture.

Learn more about GCU’s STEM degrees by visiting our website or clicking the Request More Information button on this page.

The views and opinions expressed in this article are those of the author and do not necessarily reflect the official policy or position of Grand Canyon University. Any sources cited were accurate as of the publish date.