Python has recently become the language receiving the most attention in the data science industry. I have collected some of the most popular Python libraries for data science. All of the libraries listed are open source. For each one I give a short description along with its GitHub link, star rating, and commit and contributor counts.
1. NumPy
GitHub Link, Star: 6874, Commits: 17664, Contributors: 629
NumPy (short for Numerical Python) is the most fundamental package, around which the scientific computation stack is built. Among other things, it contains a powerful N-dimensional array object, sophisticated functions, tools for integrating C/C++ and Fortran code, and useful linear algebra, Fourier transform, and random number capabilities. The library provides vectorization of mathematical operations on the NumPy array type, which improves performance and speeds up execution.
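To make the idea of vectorization concrete, here is a minimal sketch. The sample data and variable names are invented for illustration; the point is only that one array expression replaces an explicit Python loop.

```python
import numpy as np

# Hypothetical sample data: 100,000 evenly spaced values between 0 and 1.
values = np.linspace(0.0, 1.0, 100_000)

# Vectorized: the arithmetic is applied to the whole array at once,
# running in NumPy's compiled loops rather than in Python bytecode.
vectorized = values * 2.0 + 1.0

# The same computation as an explicit Python loop, on the first five
# elements only, for comparison. On large inputs this style is
# typically much slower.
looped = np.array([v * 2.0 + 1.0 for v in values[:5]])

print(vectorized[:5])
print(looped)
```

Both prints should show the same five numbers; the vectorized form simply gets there without a Python-level loop.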
2. SciPy
GitHub Source, Star: 4303, Commits: 18955, Contributors: 592
The SciPy library is one of the core packages that make up the SciPy stack. It contains modules for linear algebra, optimization, integration, and statistics. Its main functionality is built upon NumPy, so it makes substantial use of NumPy arrays. It provides many user-friendly and efficient numerical routines, such as routines for numerical integration and optimization.
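As a rough illustration of the integration and optimization routines, the sketch below uses `scipy.integrate.quad` and `scipy.optimize.minimize_scalar`. The functions being integrated and minimized are toy examples chosen here because their answers are known in advance.

```python
import numpy as np
from scipy import integrate, optimize

# Numerically integrate sin(x) from 0 to pi; the exact answer is 2.
# quad returns the estimate together with an estimate of the absolute error.
area, abs_error = integrate.quad(np.sin, 0.0, np.pi)

# Minimize a simple quadratic whose minimum sits at x = 3.
result = optimize.minimize_scalar(lambda x: (x - 3.0) ** 2)

print("integral of sin on [0, pi]:", area)
print("minimizer of (x - 3)^2:", result.x)
```

Note that both routines accept plain Python callables and that `np.sin` works directly, which reflects how closely SciPy builds on NumPy.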
3. Pandas
GitHub Source, Star: 13717, Commits: 16984, Contributors: 1135
Pandas is a flexible and powerful data analysis and manipulation library for Python. It provides labeled data structures similar to R data.frame objects, statistical functions, and much more. Pandas is designed to make working with “labeled” and “relational” data simple and intuitive. It is an excellent tool for data wrangling.
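A small sketch of what "labeled" and "relational" data look like in practice. The two tables and all their values are made up for this example; only `DataFrame`, `groupby`, and `merge` come from pandas itself.

```python
import pandas as pd

# Hypothetical sales records, invented for illustration.
sales = pd.DataFrame({
    "region": ["north", "south", "north", "south"],
    "units": [10, 7, 5, 12],
})

# A second, relational-style lookup table keyed on the same column.
managers = pd.DataFrame({
    "region": ["north", "south"],
    "manager": ["Ana", "Ben"],
})

# Labeled data: aggregate by column name instead of by position.
totals = sales.groupby("region", as_index=False)["units"].sum()

# Relational data: join the two tables on their shared key.
report = totals.merge(managers, on="region")
print(report)
```

The grouping and the join are both expressed in terms of column labels, which is what makes this kind of wrangling read close to the way you would describe it in words.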
4. Matplotlib
GitHub Source, Star: 7041, Commits: 25317, Contributors: 712
Matplotlib is a visualization library that produces publication-quality figures in a variety of hardcopy formats and interactive environments across platforms. With some effort you can make just about any visualization, such as line plots, bar charts, histograms, and pie charts. It has facilities for creating labels, grids, legends, and many other formatting elements, so you can customize a figure to suit your needs.
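The following sketch shows a line plot with the labels, grid, and legend mentioned above. The data points are arbitrary, and the figure is rendered into an in-memory buffer with the non-interactive `Agg` backend so the script runs without a display; both choices are assumptions made for this example.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no window is opened
import matplotlib.pyplot as plt

# Arbitrary sample data for illustration.
x = [0, 1, 2, 3, 4]
y = [0, 1, 4, 9, 16]

fig, ax = plt.subplots()
ax.plot(x, y, label="y = x^2")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Line plot with labels, grid, and legend")
ax.grid(True)
ax.legend()

# Render to PNG in memory; pass a filename instead to write to disk.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```

In an interactive session you would usually skip the backend line and call `plt.show()` instead of saving.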
5. Seaborn
GitHub Source, Star: 4739, Commits: 2034, Contributors: 83
Seaborn is a visualization library based on Matplotlib. It provides a high-level interface for drawing attractive statistical graphics. It focuses mainly on visualizing statistical models and data; heat maps are one example.
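Here is a minimal sketch of a heat map, drawn from a correlation matrix. The random data, column names, and colour settings are assumptions picked for the example; `sns.heatmap` is the Seaborn call doing the work.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no window is opened
import numpy as np
import pandas as pd
import seaborn as sns

# Randomly generated sample data with hypothetical column names.
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(50, 3)), columns=["a", "b", "c"])

# Pairwise correlations between the columns, as a 3 x 3 table.
corr = data.corr()

# Draw the table as a heat map, writing each value into its cell.
ax = sns.heatmap(corr, annot=True, vmin=-1.0, vmax=1.0, cmap="coolwarm")
```

The call returns a regular Matplotlib axes object, so the customization options described for Matplotlib still apply.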
6. scikit-learn
GitHub Source, Star: 27106, Commits: 22684, Contributors: 1046
scikit-learn is a Python module for machine learning built on top of SciPy. It provides tools for tasks such as classification, regression, clustering, dimensionality reduction, and model selection.
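A typical workflow can be sketched as follows, using the iris dataset bundled with scikit-learn. The choice of logistic regression, the 25% test split, and the fixed random seed are assumptions made for this example, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load the bundled iris dataset as feature and label arrays.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the samples for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit a simple classifier and measure accuracy on the held-out data.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)

print(f"test accuracy: {accuracy:.2f}")
```

Most scikit-learn estimators share this `fit` and `score` pattern, so swapping in a different model usually means changing only the line that constructs it.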