1. Preface
1.1. About this tutorial
This document is an enhanced extension of my Data Mining Methods & Application (STAT 577) course at the University of Tennessee, Knoxville. You may download and distribute it. Please be aware, however, that the notes contain typos as well as inaccurate or incorrect descriptions. Please give the original authors appropriate credit, either through a thank-you email or a citation. If you find that your work was not cited in these notes, please feel free to let me know.
Although I am by no means a data mining programming expert, I decided that it would be useful to share what I have learned about data mining programming in the form of easy tutorials with detailed examples. I hope these tutorials will be a valuable tool for your studies.
The tutorials assume that the reader has preliminary knowledge of programming and Unix. This document is generated automatically using Sphinx.
1.1.1. About the authors
Wenqiang Feng
- Sr. Data Scientist and PhD in Mathematics
- University of Tennessee at Knoxville
- Email: von198@gmail.com
Ming Chen
- Data Scientist and PhD in Genome Science and Technology
- University of Tennessee at Knoxville
- Email: ming.chen0919@gmail.com
Weiyu Wang
- MBA and Master in Information Science
- Missouri University of Science and Technology
- Email: wwpmc@mst.com
Biography
Wenqiang Feng is a Data Scientist in DST’s Applied Analytics Group. Dr. Feng’s responsibilities include providing DST clients with access to cutting-edge skills and technologies, including Big Data analytic solutions, advanced analytics and data enhancement techniques, and modeling.
Dr. Feng has deep analytic expertise in data mining, analytic systems, machine learning algorithms, business intelligence, and applying Big Data tools to strategically solve industry problems in a cross-functional business. Before joining DST, Dr. Feng was an IMA Data Science Fellow at The Institute for Mathematics and its Applications (IMA) at the University of Minnesota. While there, he helped startup companies make marketing decisions based on deep predictive analytics.
Dr. Feng graduated from the University of Tennessee, Knoxville, with a Ph.D. in Computational Mathematics and a Master’s degree in Statistics. He also holds a Master’s degree in Computational Mathematics from Missouri University of Science and Technology (MST) and a Master’s degree in Applied Mathematics from the University of Science and Technology of China (USTC).
Declaration
The work of Wenqiang Feng was supported by the IMA while he was working there. However, any opinions, findings, conclusions, or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the IMA, UTK, or DST.
1.2. Motivation for this tutorial
Data mining is a relatively new field, although the underlying technology is not. Here are the main motivations for this tutorial:
- It is no exaggeration to say that data mining has had a dramatic impact on our everyday lives. I have a great interest in data mining and am eager to learn these technologies.
- Fortunately, I had the chance to register for Dr. Haileab Hilafu’s Data Mining Methods & Application class. Dr. Haileab Hilafu and his class inspired me to do a better job.
- However, I still found learning data mining programming to be a difficult process. I had to search Google and work out which of the answers was correct. It was hard to find detailed examples from which I could easily learn the full process in one file.
- Good resources are expensive for a graduate student.
1.3. Copyright notice and license info
This Data Mining With Python and R PDF file is intended to be a free and living document, which is why its source is available online at Data Mining With Python and R on GitHub. Note that this document is licensed under both the MIT License and the Creative Commons Attribution-NonCommercial 2.0 Generic (CC BY-NC 2.0) License.
If you plan to use, copy, modify, merge, publish, distribute, or sublicense this document, please see the terms of those licenses for details and give appropriate credit to the author.
1.4. Acknowledgement
I would like to thank Dr. Haileab Hilafu for providing some of his R code and homework solutions. I would also like to thank Bo Gao, Le Yin, Chen Wen, Jian Sun and Huan Chen for valuable discussions, and the generous anonymous authors who provided detailed solutions and source code on the Internet. Without their help, these tutorials would not have been possible. In these tutorials, I try to use detailed demo code to show how to use each function in R and Python for data mining.
1.5. Feedback and suggestions
Your comments and suggestions are highly appreciated. I am more than happy to receive corrections, suggestions, or feedback for improvements by email (Wenqiang Feng: von198@gmail.com).