2. Python Environment
Note
This chapter is for beginners. If you have some Python
programming experience, you may skip it. Beginners
can take the hard-core route and install and set up Python on their
own computer. Alternatively, there is an easy route: use a free online
data science environment that requires zero setup.
2.1. Use Python on a local computer
Whatever your operating system is, I strongly recommend installing Anaconda, which bundles or makes easily installable Python, Jupyter, Spyder, NumPy, SciPy, Numba, pandas, Dask, Bokeh, HoloViews, Datashader, Matplotlib, scikit-learn, H2O.ai, TensorFlow, conda, and more.
Download link: https://www.anaconda.com/distribution/
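Once Anaconda is installed, a quick way to confirm that the environment works is to print the Python version and check that a few of the libraries listed above can be found. The snippet below is a minimal sketch, not something that ships with Anaconda: the helper name `check_modules` and the choice of modules are illustrative assumptions. Note that the import name can differ from the package name (scikit-learn is imported as `sklearn`).

```python
import importlib.util
import platform

# Import names to look for (assumed selection; edit to taste).
MODULES = ["numpy", "scipy", "pandas", "matplotlib", "sklearn"]


def check_modules(names):
    """Map each top-level module name to True if Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


print("Python version:", platform.python_version())
for name, found in check_modules(MODULES).items():
    # Report each module without actually importing it.
    print(f"{name:12s} {'OK' if found else 'missing'}")
```

Run it from a Jupyter notebook cell or save it to a file and run it with `python`. Any module reported as missing can usually be installed with `conda install` followed by the package name.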
2.2. Use Python on the cloud (zero setup)
Learning a new programming language is not easy. Luckily, there are now many web-based data science environments that let you learn Python without downloading or installing anything on your own laptop. You can do almost everything online, and some services even offer free access to GPUs!
2.2.1. Google Colab (FREE)
According to Google's official introduction, Colaboratory, or “Colab” for short, is a free Jupyter notebook environment that requires no setup and runs entirely in the cloud, with notebooks written, run, and shared through Google Drive.
Zero configuration/setup required on your own machine!
Free access to GPUs and TPUs: code executes on Google’s cloud servers
Search and use built-in code snippets
Easy sharing (like Google Docs)
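To see which environment a notebook is actually running in, a commonly used idiom is to check whether the `google.colab` module has been loaded, since Colab loads it when the runtime starts. The sketch below relies on that behaviour, plus the assumption that a GPU runtime exposes the `nvidia-smi` command-line tool. Both helper names are hypothetical, and the checks are heuristics rather than an official API.

```python
import shutil
import sys


def running_in_colab():
    """Heuristic: Colab runtimes load the google.colab module at start-up."""
    return "google.colab" in sys.modules


def nvidia_tool_available():
    """Heuristic: machines with an NVIDIA GPU usually have nvidia-smi on the PATH."""
    return shutil.which("nvidia-smi") is not None


print("Running in Colab:", running_in_colab())
print("nvidia-smi found:", nvidia_tool_available())
```

In Colab, a GPU is not attached by default; it is selected under Runtime > Change runtime type.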
2.2.2. Kaggle Kernels (FREE)
Kaggle is best known as a platform for data science competitions. It also provides a free service called Kernels (since renamed Notebooks) that can be used independently of the competitions.
There are a few more options for running Jupyter Notebook in the cloud. Feel free to check out this article: https://www.dataschool.io/cloud-services-for-jupyter-notebook/ The post dates from March 2019, so some of its content may be a little out of date, but it offers a great in-depth comparison of the different platforms.
2.2.3. Databricks Community Edition (FREE)
The Databricks Community Edition is the free version of Databricks' cloud-based big data platform. Users get access to a micro-cluster as well as a cluster manager and a notebook environment. All users can share their notebooks and host them free of charge with Databricks.
The Databricks Community Edition also comes with a rich portfolio of award-winning training resources that will be expanded over time, making it ideal for developers, data scientists, data engineers, and other IT professionals who want to learn Apache Spark. More details can be found at: https://community.cloud.databricks.com