2. Python or R for data analysis?

Note

Sharpening the knife longer can make it easier to hack the firewood – old Chinese proverb

There is an old Chinese proverb that says ‘sharpening the knife longer can make it easier to hack the firewood’. In other words, take extra time to get the preparation right and the work will be easier. So it is worth taking several minutes to think about which programming language is better for you.

When you google it, you will get many useful results. Here is some valuable information from Quora:

2.1. Ponder over questions

  • Six questions to ponder over from Vipin Tyagi at Quora
    1. Is your problem purely data analysis, or a mixed one that also involves mathematics, machine learning, or artificial intelligence?
    2. What are the commonly used tools in your field?
    3. What is the programming expertise of your human resources?
    4. What level of visualization do you require in your presentations?
    5. Are you an academic, a research-oriented, or a commercial professional?
    6. Do you have access to a number of data analytics software packages for doing your assignment?

2.2. Comparison List

advantages
  R
    • great for prototyping
    • great for statistical analysis
    • nice IDE
  Python
    • great for scripting and automating your different data mining pipelines
    • integrates easily into a production workflow
    • can be used across different parts of your software engineering team
    • the scikit-learn library is awesome for machine-learning tasks
    • IPython is also a powerful tool for exploratory analysis and presentations
disadvantages
  R
    • syntax can be obscure
    • library documentation isn’t always user friendly
    • harder to integrate into a production workflow
  Python
    • it isn’t as thorough for statistical analysis as R
    • the learning curve is steeper than R’s, since you can do much more with Python

2.3. My Opinions

In my opinion, if you want to be a decent data analyst or data scientist, you should learn both R and Python. Both are open-source software (open source is always good in my eyes) and free to download. If you are a beginner without any programming experience and only want to do some data analysis, I would definitely suggest using R. Otherwise, I would suggest using both.