1. Preface

1.1. About this tutorial

This document is an enhanced extension of my Data Mining Methods & Application (STAT 577) course at the University of Tennessee at Knoxville. You may download and distribute it. Please be aware, however, that these notes contain typos as well as inaccurate or incorrect descriptions. Please give the original authors appropriate credit, either by a thank-you email or by citation. If you find that your work wasn't cited in these notes, please feel free to let me know.

Although I am by no means a data mining programming expert, I decided that it would be useful to share what I learned about data mining programming in the form of easy tutorials with detailed examples. I hope these tutorials will be a valuable tool for your studies.

The tutorials assume that the reader has a preliminary knowledge of programming and Unix. This document is generated automatically using Sphinx.

1.1.1. About the authors

  • Wenqiang Feng

    • Sr. Data Scientist and PhD in Mathematics
    • University of Tennessee at Knoxville
    • Email: von198@gmail.com
  • Ming Chen

    • Data Scientist and PhD in Genome Science and Technology
    • University of Tennessee at Knoxville
    • Email: ming.chen0919@gmail.com
  • Weiyu Wang

    • MBA and Master in Information Science
    • Missouri University of Science and Technology
    • Email: wwpmc@mst.com
  • Biography

    Wenqiang Feng is a Data Scientist within DST’s Applied Analytics Group. Dr. Feng’s responsibilities include providing DST clients with access to cutting-edge skills and technologies, including Big Data analytic solutions, advanced analytic and data enhancement techniques, and modeling.

    Dr. Feng has deep analytic expertise in data mining, analytic systems, machine learning algorithms, business intelligence, and applying Big Data tools to strategically solve industry problems in a cross-functional business. Before joining DST, Dr. Feng was an IMA Data Science Fellow at The Institute for Mathematics and its Applications (IMA) at the University of Minnesota. While there, he helped startup companies make marketing decisions based on deep predictive analytics.

    Dr. Feng graduated from the University of Tennessee, Knoxville, with a Ph.D. in Computational Mathematics and a Master’s degree in Statistics. He also holds a Master’s degree in Computational Mathematics from Missouri University of Science and Technology (MST) and a Master’s degree in Applied Mathematics from the University of Science and Technology of China (USTC).

  • Declaration

    The work of Wenqiang Feng was supported by the IMA while he was working there. However, any opinions, findings, conclusions, or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the IMA, UTK, or DST.

1.2. Motivation for this tutorial

Data mining is a relatively new field, although the underlying technology is not. Here are the main motivations for this tutorial:

  1. It is no exaggeration to say that data mining has had a tremendous impact on our daily lives. I have great interest in data mining and am eager to learn these technologies.
  2. Fortunately, I had the chance to register for Dr. Haileab Hilafu’s Data Mining Methods & Application class. Dr. Haileab Hilafu and his class inspired me to do a better job.
  3. However, I still found learning data mining programming to be a difficult process. I had to Google each topic and work out which of the answers I found were correct. It was hard to find detailed examples from which I could easily learn the full process in one file.
  4. Good resources are expensive for a graduate student.

1.4. Acknowledgement

Here, I would like to thank Dr. Haileab Hilafu for providing some of his R code and homework solutions. I would also like to thank Bo Gao, Le Yin, Chen Wen, Jian Sun and Huan Chen for the valuable discussions, and the generous anonymous authors who provided detailed solutions and source code on the Internet. Without their help, these tutorials would not have been possible. In these tutorials, I use detailed demo code to show how to use each function in R and Python for data mining.

1.5. Feedback and suggestions

Your comments and suggestions are highly appreciated. I am more than happy to receive corrections, suggestions, or feedback for improvements by email (Wenqiang Feng: von198@gmail.com).