2. Python Environment

Note

This chapter is for beginners. If you have some Python programming experience, you may skip it. Beginners can take the hard-core route: installing and setting up Python on their own computer. Alternatively, there is an easy route: use a free online data science environment that requires zero setup.

2.1. Use Python on a local computer

Whatever your operating system, I strongly recommend installing Anaconda, which contains Python, Jupyter, Spyder, NumPy, SciPy, Numba, pandas, Dask, Bokeh, HoloViews, Datashader, matplotlib, scikit-learn, H2O.ai, TensorFlow, conda, and more.

Download link: https://www.anaconda.com/distribution/

_images/anaconda.png
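
Once the installation finishes, it helps to confirm that Python can actually find the main packages. Below is a minimal sketch of such a check. The function name check_packages and the short package list are my own choices for illustration (note that scikit-learn is imported as sklearn), so adjust them to your needs.

```python
import importlib.util
import sys

# Import names of a few packages mentioned above (illustrative list only).
PACKAGES = ["numpy", "scipy", "pandas", "matplotlib", "sklearn"]


def check_packages(names):
    """Return a dict mapping each top-level import name to True if it is installed."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    print("Python version:", sys.version.split()[0])
    for name, found in check_packages(PACKAGES).items():
        print(f"{name:12s} {'OK' if found else 'missing'}")
```

You can run this in a Jupyter notebook cell or save it as a script. If a package shows up as missing, it can usually be installed with conda or pip.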

2.2. Use Python in the cloud (zero setup)

Learning a new programming language is not easy. Luckily, there are now many web-based data science environments that let you learn Python without downloading or installing anything on your own laptop. You can do almost anything online, including getting free access to GPUs!

2.2.1. Google Colab (FREE)

According to Google’s official introduction, Colaboratory, or “Colab” for short, is a free Jupyter notebook environment that requires no setup. You write, run, and share code entirely in the browser, and your notebooks are stored in Google Drive.

  • Zero configuration/setup required on your own machine!

  • Free access to GPUs and TPUs: code executes on Google’s cloud servers

  • Search and use built-in code snippets

  • Easy sharing (like a Google Doc)

_images/google_clob.png
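
If you want to check whether your notebook session actually has a GPU attached, here is a minimal sketch. It assumes an NVIDIA GPU and relies on the nvidia-smi command-line tool that ships with the NVIDIA driver. The helper name gpu_summary is hypothetical, and the check does not cover TPUs.

```python
import shutil
import subprocess


def gpu_summary():
    """Return the GPU list reported by nvidia-smi, or None if no NVIDIA GPU is visible."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # The NVIDIA driver tools are not installed, so assume there is no GPU.
        return None
    # "nvidia-smi -L" prints one line per GPU it can see.
    result = subprocess.run([exe, "-L"], capture_output=True, text=True)
    if result.returncode != 0:
        return None
    return result.stdout.strip() or None


summary = gpu_summary()
print(summary if summary else "No NVIDIA GPU detected")
```

The same snippet runs on a local computer as well, where it simply reports that no GPU was detected if the tool is absent.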

2.2.2. Kaggle Kernels (FREE)

Kaggle is best known as a platform for data science competitions. It also provides a free service called Kernels that can be used independently of the competitions.

_images/kaggle_cloud.png

There are a few more options for running Jupyter notebooks in the cloud. Feel free to check out this article: https://www.dataschool.io/cloud-services-for-jupyter-notebook/ It was posted in March 2019, so the content may be a little out of date, but it offers a great in-depth comparison of the different platforms.

2.2.3. Databricks Community Edition (FREE)

The Databricks Community Edition is the free version of Databricks’ cloud-based big data platform. Users can access a micro-cluster as well as a cluster manager and a notebook environment. All users can share their notebooks and host them free of charge with Databricks.

The Databricks Community Edition also comes with a rich portfolio of award-winning training resources that will be expanded over time, making it ideal for developers, data scientists, data engineers and other IT professionals to learn Apache Spark. More details can be found at: https://community.cloud.databricks.com

_images/community_cloud.png
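
If you move the same notebook between your own computer and the cloud platforms above, it can be handy to know where the code is currently running. The sketch below is a best-effort guess rather than an official API. It assumes that Colab preloads the google.colab package, that Kaggle sets the KAGGLE_KERNEL_RUN_TYPE environment variable, and that Databricks sets DATABRICKS_RUNTIME_VERSION. The function name detect_environment is hypothetical.

```python
import os
import sys


def detect_environment(modules=None, environ=None):
    """Guess the runtime: 'colab', 'kaggle', 'databricks', or 'local'."""
    modules = sys.modules if modules is None else modules
    environ = os.environ if environ is None else environ

    # Assumption: Colab preloads the google.colab package into the kernel.
    if "google.colab" in modules:
        return "colab"
    # Assumption: Kaggle notebooks set this environment variable.
    if "KAGGLE_KERNEL_RUN_TYPE" in environ:
        return "kaggle"
    # Assumption: Databricks clusters set this environment variable.
    if "DATABRICKS_RUNTIME_VERSION" in environ:
        return "databricks"
    # None of the markers were found, so fall back to a local machine.
    return "local"


print("Running in:", detect_environment())
```

The two optional arguments exist only so the function can be tried with fake inputs. In normal use, call detect_environment() with no arguments.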