Welcome to my Learning Apache Spark with Python notes! In these notes, you will learn a wide range of PySpark concepts for Data Mining, Text Mining, Machine Learning, and Deep Learning. The PDF version can be downloaded from HERE.
Contents
- 1. Preface
- 2. Why Spark with Python?
- 3. Configure Running Platform
- 4. An Introduction to Apache Spark
- 5. Programming with RDDs
- 6. Statistics and Linear Algebra Preliminaries
- 7. Data Exploration
- 8. Data Manipulation: Features
- 9. Regression
- 10. Regularization
- 11. Classification
- 12. Clustering
- 13. RFM Analysis
- 14. Text Mining
- 15. Social Network Analysis
- 16. ALS: Stock Portfolio Recommendations
- 17. Monte Carlo Simulation
- 18. Markov Chain Monte Carlo
- 19. Neural Network
- 20. Automation for Cloudera Distribution Hadoop
- 21. Wrap PySpark Package
- 22. PySpark Data Audit Library
- 23. Zeppelin to Jupyter Notebook
- 24. My Cheat Sheet
- 25. JDBC Connection
- 26. Databricks Tips
- 27. PySpark API
- 28. Main Reference