Pandas

Pandas For Data Analysis

Ultimate Guide for Python Engineers

  • Intro - What is Pandas?

    In this walkthrough, you will learn how to analyze and visualize data using Pandas. You will also pick up various tips and tricks for using Pandas in data analysis and data science projects.

    According to Wikipedia, Pandas is a software library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series. It is free software released under the three-clause BSD license.

  • History of Pandas and Its Creator

    Pandas was originally created by Wes McKinney. He started building what would become Pandas at AQR Capital Management, where he was a researcher from 2007 to 2010.

    Wes is an American software developer. He is the creator and "Benevolent Dictator for Life" (BDFL) of the open-source Pandas package for data analysis in Python, and the author of the reference book Python for Data Analysis, which has gone through several editions. As a businessman, he was the founder and CEO of the technology startup Datapad.

    In 2007, Wes McKinney graduated from MIT with a B.S., after which he started working on Pandas. In 2010, he began a Ph.D. program in Statistics at Duke University, but went on leave in 2011.

    You can find out more on his website: https://wesmckinney.com/

    • In 2008, pandas development began at AQR Capital Management. By the end of 2009 it had been open sourced, and is actively supported today by a community of like-minded individuals around the world who contribute their valuable time and energy to help make open source pandas possible.
    • Pandas 1.0 was released in January 2020. It was a major release of the library and is not completely backward-compatible with earlier versions.

    Mission of Pandas

    Pandas aims to be the fundamental high-level building block for doing practical, real-world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis and manipulation tool available in any language.

  • Benefits & Usefulness of Pandas

    Before Pandas was created, developers and data scientists used other tools such as Excel, Pascal and R data frames to perform analysis. The creation of Pandas has brought many benefits to engineers, including the following.


    Usefulness
    1. Data cleansing
    2. Data fill
    3. Data normalization
    4. Merges and joins
    5. Data visualization
    6. Statistical analysis
    7. Data inspection
    8. Loading and saving data
    9. Vast Libraries and Package Ecosystem
    10. Open Source
    11. Great Documentation and Numerous Tutorials to learn from
    12. Large Community
    13. It is mature (almost 12 years old)
    Disadvantages
    1. Not ideal for 3D matrices
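A few of the items listed under Usefulness (data fill, merges and joins, statistical analysis) can be sketched in a handful of lines. This is only a minimal illustration: the `sales` and `regions` tables and their values are made up for the example and do not come from any real dataset.

```python
import pandas as pd

# Hypothetical sample data, invented purely for illustration
sales = pd.DataFrame({
    "store": ["A", "B", "C"],
    "revenue": [100.0, None, 300.0],
})
regions = pd.DataFrame({
    "store": ["A", "B", "C"],
    "region": ["North", "South", "North"],
})

# Data fill: replace the missing revenue with the mean of the known values
sales["revenue"] = sales["revenue"].fillna(sales["revenue"].mean())

# Merges and joins: attach the region of each store
merged = sales.merge(regions, on="store", how="left")

# Statistical analysis: total revenue per region
totals = merged.groupby("region")["revenue"].sum()
print(totals)
```

Filling with the mean is just one possible strategy; which fill value is appropriate depends on the data.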

  • Installing - Pandas

    To install Pandas, you may need to download the most recent stable version. This is the one with the highest number that isn't marked as an alpha or beta release.


    Installation Guide For Packages

    You can install Pandas on your system from PyPI with pip, or with conda.

    PyPI
    Using Pip

    To install Pandas, you can use pip (or pip3) as below. With conda, the equivalent command is `conda install pandas`.

    
    pip install pandas

  • Pandas and Jupyter Notebook

    Like any Python package, Pandas can be used inside Jupyter Notebooks as well as in any IDE or REPL, such as those listed below.


    Python IDE - Integrated Development Environment
    1. VS Code
    2. Sublime Text
    3. PyCharm
    4. Atom & Brackets
    5. Notepad++
    Python REPL & Notebooks
    1. IPython
    2. bpython
    3. Jupyter Notebooks
    4. JupyterLab
    5. etc

  • Getting Started with Pandas

    Let us start with how to use Pandas to perform data analysis from end to end. By the end of this, you will have an in-depth understanding of Pandas in relation to data analysis.

    To work with Pandas, you will need to import it. The common convention among data science practitioners is to import it under the alias `pd`, as below

    
    import pandas as pd
              

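    You can check the installed version via the `__version__` attribute

    
    import pandas as pd
    # In a notebook or REPL, the bare expression pd.__version__ displays the same value
    print(pd.__version__)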
              

  • Reading Various Data Format

    One of the features that makes Pandas stand out is its ability to read various file formats, ranging from CSV to Parquet. Pandas provides a simple API to read each of these file formats.

    The functions follow the pattern `pd.read_*`, where * is the file format type, such as csv, excel, html, parquet, etc.
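One way to see the `pd.read_*` naming pattern in practice is to list the matching functions exposed by your installed version. This is a small exploratory sketch; the exact list depends on the Pandas version you have.

```python
import pandas as pd

# Collect the top-level reader functions that follow the pd.read_* naming pattern
readers = sorted(name for name in dir(pd) if name.startswith("read_"))
print(readers)
```

Each name in the output, such as `read_csv` or `read_excel`, is a function that returns a DataFrame for that file format.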

    In summary

    [Figure: Pd Read. Source: Pandas Official Website]

    File Formats

    Certain file formats require optional dependencies (for example, openpyxl for Excel files or pyarrow for Parquet). These are not always installed along with Pandas itself, so you may need to install them separately.

  • Reading CSV Files

    CSV stands for Comma Separated Values. It is a plain text file with a specific format that allows data to be saved in a table structure. It uses a comma `,` to separate the data. Each line of the file is a data record, and each record consists of one or more fields separated by commas.

    Like JSON, it is one of the most common and beginner-friendly file formats to work with.
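As a minimal sketch of the record-per-line structure described above, the example below reads a small CSV held in a string. The names and values are made up for illustration, and `io.StringIO` is used only so that no file has to exist on disk; with a real file you would pass its path to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Hypothetical CSV content: a header line, then one record per line
csv_text = "name,age,city\nAda,36,London\nGrace,45,New York\n"

# read_csv accepts a file path or a file-like object such as StringIO
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```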

    Because CSV is a delimited format, the separator (delimiter) can be changed from a comma to another character. If it is changed to whitespace or a tab, the file becomes a WSV (whitespace-separated values) or TSV (tab-separated values) file.

    Pandas allows you to read these delimited formats, or variants of CSV files, by specifying the separator or delimiter as a parameter.

    In Summary

    [Figure: Pd Read]

    The Pandas `read_*` functions take several optional parameters for different use cases. To read a tab-, whitespace-, or semicolon-separated file with `read_csv`, set `sep` (which defaults to `','`) or its alias `delimiter` as needed.
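The effect of the separator parameter can be sketched as follows. The three in-memory strings are hypothetical stand-ins for tab-, semicolon-, and whitespace-separated files holding the same data.

```python
import io

import pandas as pd

# The same hypothetical table written with three different delimiters
tsv_text = "name\tage\nAda\t36\nGrace\t45\n"
semi_text = "name;age\nAda;36\nGrace;45\n"
ws_text = "name age\nAda 36\nGrace 45\n"

df_tab = pd.read_csv(io.StringIO(tsv_text), sep="\t")
df_semi = pd.read_csv(io.StringIO(semi_text), sep=";")
# A regular expression such as \s+ matches one or more whitespace characters
df_ws = pd.read_csv(io.StringIO(ws_text), sep=r"\s+")

# All three calls should produce the same DataFrame
print(df_tab.equals(df_semi) and df_tab.equals(df_ws))
```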

  • Pandas Basics

    With Pandas, you can preview the data using the `head` and `tail` methods, just as you would use the head and tail commands in a Linux terminal.

    1. df.head(): view the first 5 rows/datapoints (the default)
    2. df.head(10): view the first 10 rows/datapoints
    3. df.tail(): view the last 5 rows/datapoints (the default)
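The preview methods listed above can be tried on a small frame. The 20-row DataFrame here is a made-up example used only to show how many rows each call returns.

```python
import pandas as pd

# Hypothetical 20-row frame with values 1 through 20
df = pd.DataFrame({"value": range(1, 21)})

first_five = df.head()    # first 5 rows by default
first_ten = df.head(10)   # first 10 rows
last_five = df.tail()     # last 5 rows by default

print(first_five)
print(last_five)
```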

  • Writing to Various Data Format

    With Pandas, you can write or save your dataframe to various file formats, ranging from CSV to Parquet. Pandas provides a simple API to write to each of these file formats.

    The methods follow the pattern `df.to_*`, where `df` is your dataframe and * is the file format type, such as csv, excel, html, parquet, etc.

    In summary

    [Figure: File Formats. Source: Pandas Official Website]

    Certain file formats require optional dependencies (for example, openpyxl for Excel files or pyarrow for Parquet). These are not always installed along with Pandas itself, so you may need to install them separately.

    The `.to_*()` methods also take several arguments and optional parameters to suit your needs.
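A minimal sketch of the `to_*` methods is shown below, using a made-up two-row dataframe. To keep the example self-contained, no file path is passed, in which case `to_csv` and `to_json` return the output as a string; passing a path such as a hypothetical `"people.csv"` would write a file instead.

```python
import io
import json

import pandas as pd

# Hypothetical dataframe used only for illustration
df = pd.DataFrame({"name": ["Ada", "Grace"], "age": [36, 45]})

# With no path given, to_csv returns the CSV text; index=False omits the row labels
csv_text = df.to_csv(index=False)
print(csv_text)

# to_json works the same way; orient="records" yields one object per row
json_text = df.to_json(orient="records")
records = json.loads(json_text)
print(records)

# Round trip: reading the CSV text back gives an equal dataframe
restored = pd.read_csv(io.StringIO(csv_text))
print(restored.equals(df))
```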

  • Selecting Rows and Columns


  • Reshaping Data with Pandas


  • Statistics with Pandas


Tasks

Practical Task on Pandas

  • Coming Soon

