Python CheatSheets

A Quick Guide for Python Engineers

  • What is Python?

    Python is an interpreted, object-oriented, high-level programming language with dynamic semantics. Its high-level built-in data structures, combined with dynamic typing and dynamic binding, make it very attractive for Rapid Application Development, as well as for use as a scripting or glue language to connect existing components together.

  • Cheatsheets
  • Data Science Python Tools
  • Awesome Python Resources


Numpy Cheatsheets

A Quick Learning Guide on Numpy

  • What is Numpy?

    NumPy is the fundamental package for scientific computing in Python. It is a Python library that provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation and much more.

    At the core of the NumPy package, is the ndarray object. This encapsulates n-dimensional arrays of homogeneous data types, with many operations being performed in compiled code for performance.

    Source: https://numpy.org/doc/stable/user/whatisnumpy.html

  • Numpy - Basics

    Basics

    One of the most commonly used features of NumPy is the NumPy array. The essential difference between lists and NumPy arrays is functionality and speed: lists give you basic operations, while NumPy adds FFTs, convolutions, fast searching, basic statistics, linear algebra, histograms, etc.
    The most important difference for data science is the ability to do element-wise calculations with NumPy arrays.

    axis 0 always refers to rows
    axis 1 always refers to columns
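
    A quick illustration of both points:

    import numpy as np

    a = np.array([(1, 2, 3), (4, 5, 6)])
    print(a * 2)          # element-wise: [[ 2  4  6]  [ 8 10 12]]
    print(a.sum(axis=0))  # sum down the rows: [5 7 9]
    print(a.sum(axis=1))  # sum across the columns: [ 6 15]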

    Operator                      Description
    np.array([1,2,3])             1d array
    np.array([(1,2,3),(4,5,6)])   2d array
    np.arange(start,stop,step)    Range array

    Placeholders

    Operator                  Description
    np.linspace(0,2,9)        Array of 9 evenly spaced values from 0 to 2
    np.zeros((1,2))           Creates an array filled with zeros
    np.ones((1,2))            Creates an array filled with ones
    np.random.random((5,5))   Creates an array of random values
    np.empty((2,2))           Creates an empty (uninitialized) array

    Examples

    import numpy as np

    # 1 dimensional
    x = np.array([1, 2, 3])
    # 2 dimensional
    y = np.array([(1, 2, 3), (4, 5, 6)])

    x = np.arange(3)
    # array([0, 1, 2])

    y = np.arange(3.0)
    # array([ 0.,  1.,  2.])

    x = np.arange(3, 7)
    # array([3, 4, 5, 6])

    y = np.arange(3, 7, 2)
    # array([3, 5])
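
    A quick illustration of the placeholder routines above (np.empty returns uninitialized memory, so its values are arbitrary):

    np.linspace(0, 2, 9)
    # array([0.  , 0.25, 0.5 , 0.75, 1.  , 1.25, 1.5 , 1.75, 2.  ])

    np.zeros((1, 2))
    # array([[0., 0.]])

    np.ones((1, 2))
    # array([[1., 1.]])

    np.random.random((2, 2))
    # e.g. array([[0.41919451, 0.6852195 ],
    #             [0.20445225, 0.87811744]])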

  • Numpy - Arrays

    Array

    Array Properties

    Syntax               Description
    array.shape          Dimensions (rows, columns)
    len(array)           Length of array
    array.ndim           Number of array dimensions
    array.size           Number of array elements
    array.dtype          Data type
    array.astype(type)   Converts to data type
    type(array)          Type of array

    Copying/Sorting

    Operator               Description
    np.copy(array)         Creates a copy of the array
    other = array.copy()   Creates a deep copy of the array
    array.sort()           Sorts the array
    array.sort(axis=0)     Sorts along the given axis of the array

    Examples

    import numpy as np

    # sort() sorts in ascending order
    y = np.array([10, 9, 8, 7, 6, 5, 4, 3, 2, 1])
    y.sort()
    print(y)
    # [ 1  2  3  4  5  6  7  8  9 10]

    Array Manipulation Routines

    Adding or Removing Elements

    Operator                       Description
    np.append(a,b)                 Appends items to the array
    np.insert(array, 1, 2, axis)   Inserts 2 into the array before index 1, along the given axis
    np.resize(array, (2,4))        Resizes the array to shape (2,4)
    np.delete(array, 1, axis)      Deletes the item at index 1 along the given axis

    Example

    import numpy as np

    # Append items to array (note the result is flattened)
    a = np.array([(1, 2, 3), (4, 5, 6)])
    b = np.append(a, [(7, 8, 9)])
    print(b)
    # [1 2 3 4 5 6 7 8 9]

    # Remove the item at index 2 from the previous array
    print(np.delete(b, 2))
    # [1 2 4 5 6 7 8 9]

    Combining Arrays

    Operator                       Description
    np.concatenate((a,b),axis=0)   Concatenates two arrays along an axis, adding to the end
    np.vstack((a,b))               Stacks arrays row-wise (vertically)
    np.hstack((a,b))               Stacks arrays column-wise (horizontally)

    Example

    import numpy as np

    a = np.array([1, 3, 5])
    b = np.array([2, 4, 6])

    # Stack two arrays row-wise
    print(np.vstack((a,b)))
    # [[1 3 5]
    #  [2 4 6]]

    # Stack two arrays column-wise
    print(np.hstack((a,b)))
    # [1 3 5 2 4 6]

    Splitting Arrays

    Operator                   Description
    np.split(array, 3)         Splits the array into 3 equally sized sub-arrays (raises an error if it does not divide evenly)
    np.array_split(array, 3)   Splits the array into 3 sub-arrays of (nearly) identical size
    np.hsplit(array, 3)        Splits the array horizontally (column-wise) into 3 sub-arrays

    Example

    # Split array into 3 sub-arrays of (nearly) equal size
    a = np.array([1, 2, 3, 4, 5, 6, 7, 8])
    print(np.array_split(a, 3))
    # [array([1, 2, 3]), array([4, 5, 6]), array([7, 8])]
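
    And a small sketch of np.hsplit on a 2d array:

    a = np.array([(1, 2, 3, 4), (5, 6, 7, 8)])
    print(np.hsplit(a, 2))
    # [array([[1, 2],
    #         [5, 6]]), array([[3, 4],
    #         [7, 8]])]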

    Shaping Arrays

    Operator                      Description
    other = ndarray.flatten()     Flattens a 2d array to 1d
    np.flip(array)                Reverses the order of elements along an axis
    array[::-1]                   Same as above for a 1d array
    array.reshape(shape)          Reshapes the array to the given shape without changing its data
    np.squeeze(array)             Removes axes of length one
    np.expand_dims(array, axis)   Inserts a new axis at the given position
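
    A minimal sketch of these shaping routines:

    import numpy as np

    a = np.array([(1, 2, 3), (4, 5, 6)])
    print(a.flatten())                   # [1 2 3 4 5 6]
    print(a.reshape(3, 2))               # [[1 2] [3 4] [5 6]]
    print(np.expand_dims(a, 0).shape)    # (1, 2, 3)
    print(np.squeeze(np.expand_dims(a, 0)).shape)  # (2, 3)
    print(np.flip(np.array([1, 2, 3])))  # [3 2 1]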
  • Numpy - Miscellaneous on Arrays

    Misc

    Operator                          Description
    other = ndarray.flatten()         Flattens a 2d array to 1d
    np.transpose(array), array.T      Transposes the array
    inverse = np.linalg.inv(matrix)   Inverse of a given matrix

    Example

    # Find the inverse of a given matrix
    >>> np.linalg.inv([[3,1],[2,4]])
    array([[ 0.4, -0.1],
           [-0.2,  0.3]])
  • Numpy - Maths Operations

    Mathematics

    Operations

    Operator                  Description
    np.add(x,y), x + y        Addition
    np.subtract(x,y), x - y   Subtraction
    np.divide(x,y), x / y     Division
    np.multiply(x,y), x * y   Element-wise multiplication
    np.sqrt(x)                Square root
    np.sin(x)                 Element-wise sine
    np.cos(x)                 Element-wise cosine
    np.log(x)                 Element-wise natural log
    np.dot(x,y)               Dot product
    np.roots([1,0,-4])        Roots of the polynomial with the given coefficients

    Remember: NumPy array operations work element-wise.

    Example

    # If a 1d array is added to a 2d array (or vice versa), NumPy
    # broadcasts the smaller array across the larger one
    a = np.array([1, 2, 3])
    b = np.array([(1, 2, 3), (4, 5, 6)])
    print(np.add(a, b))
    # [[2 4 6]
    #  [5 7 9]]

    # Example of np.roots
    # Consider the polynomial (x-1)^2 = x^2 - 2*x + 1,
    # whose roots are 1, 1
    >>> np.roots([1,-2,1])
    array([1., 1.])
    # Similarly, x^2 - 4 = 0 has roots x = ±2
    >>> np.roots([1,0,-4])
    array([-2.,  2.])
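
    Since np.multiply and np.dot are easy to confuse, a quick contrast:

    x = np.array([1, 2])
    y = np.array([3, 4])
    print(np.multiply(x, y))  # element-wise: [3 8]
    print(np.dot(x, y))       # dot product: 11
    print(x @ y)              # same as np.dot for 1d/2d arrays: 11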

    Comparison

    Operator              Description
    ==                    Equal
    !=                    Not equal
    <                     Smaller than
    >                     Greater than
    <=                    Smaller than or equal
    >=                    Greater than or equal
    np.array_equal(x,y)   Array-wise comparison (same shape and same elements)

    Example

    # Comparison operators create boolean NumPy arrays
    z = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
    c = z < 6
    print(c)
    # [ True  True  True  True  True False False False False False]
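
    np.array_equal, by contrast, compares whole arrays and returns a single boolean:

    x = np.array([1, 2, 3])
    print(np.array_equal(x, np.array([1, 2, 3])))  # True
    print(np.array_equal(x, np.array([1, 2, 4])))  # False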
  • Numpy - Basic Statistics

    Basic Statistics

    Operator             Description
    np.mean(array)       Mean
    np.median(array)     Median
    np.corrcoef(array)   Correlation coefficient
    np.std(array)        Standard deviation

    Example

    # Statistics of an array
    a = np.array([1, 1, 2, 5, 8, 10, 11, 12])

    # Standard deviation
    print(np.std(a))
    # 4.2938910093294167

    # Median
    print(np.median(a))
    # 6.5

    More

    Operator               Description
    array.sum()            Array-wise sum
    array.min()            Array-wise minimum value
    array.max(axis=0)      Maximum value along the specified axis
    array.cumsum(axis=0)   Cumulative sum along the specified axis
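
    A quick illustration of these aggregations:

    a = np.array([(1, 2, 3), (4, 5, 6)])
    print(a.sum())            # 21
    print(a.min())            # 1
    print(a.max(axis=0))      # [4 5 6]
    print(a.cumsum(axis=0))
    # [[1 2 3]
    #  [5 7 9]]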

  • Numpy - Slicing and Sorting

    Slicing and Subsetting

    Operator        Description
    array[i]        1d array at index i
    array[i,j]      2d array at index [i][j]
    array[i<4]      Boolean indexing, see Tricks
    array[0:3]      Selects items at index 0, 1 and 2
    array[0:2,1]    Selects items of rows 0 and 1 at column 1
    array[:1]       Selects items of row 0 (equals array[0:1, :])
    array[1:2, :]   Selects items of row 1
    array[1,...]    Equals array[1,:,:] (for a 3d array)
    array[::-1]     Reverses the array

    Examples

    b = np.array([(1, 2, 3), (4, 5, 6)])

    # The index *before* the comma refers to *rows*,
    # the index *after* the comma refers to *columns*
    print(b[0:1, 2])
    # [3]

    print(b[:len(b), 2])
    # [3 6]

    print(b[0, :])
    # [1 2 3]

    print(b[0, 2:])
    # [3]

    print(b[:, 0])
    # [1 4]

    c = np.array([(1, 2, 3), (4, 5, 6)])
    d = c[1:2, 0:2]
    print(d)
    # [[4 5]]

  • Numpy - Misc

    Tricks

    # Index trick when working with two np-arrays
    a = np.array([1, 2, 3, 6, 1, 4, 1])
    b = np.array([5, 6, 7, 8, 3, 1, 2])

    # Keeps only the elements of a at the indexes where b == 1
    other_a = a[b == 1]
    # Keeps only the elements of a at the indexes where b != 1
    other_other_a = a[b != 1]

    import numpy as np

    x = np.array([4, 6, 8, 1, 2, 6, 9])
    y = x > 5
    print(x[y])
    # [6 8 6 9]

    # Even shorter
    x = np.array([1, 2, 3, 4, 4, 35, 212, 5, 5, 6])
    print(x[x < 5])
    # [1 2 3 4 4]

  • Credits

Pandas CheatSheets

A Quick Guide on Pandas

  • What is Pandas?

    Pandas is the de facto data analysis and munging tool used by data scientists and other data workers for small to medium sized datasets

  • Pandas - Loading Data - CSV

    With Pandas you can load or read data contained in a CSV (comma-separated values) file

    df = pd.read_csv('file.csv')  # often works
    df = pd.read_csv('file.csv', sep=':', na_values=['na', '-', '.', ''])  # specify the separator and the strings to treat as NA

    Get data from inline CSV text to a DataFrame

    from io import StringIO
    data = """, Pet, Cuteness, Desirable
    row-1, dog, 8.7, True
    row-2, cat, 9.5, True
    row-3, bat, 2.6, False"""
    df = pd.read_csv(StringIO(data), header=0, index_col=0, skipinitialspace=True)
    
  • Pandas - Loading Data - Excel

    Load DataFrames from a Microsoft Excel file

      
    # Each Excel sheet in a Python dictionary
    workbook = pd.ExcelFile('file.xlsx')
    d = {} # start with an empty dictionary
    for sheet_name in workbook.sheet_names:
        df = workbook.parse(sheet_name)
        d[sheet_name] = df
    
  • Pandas - Loading Data - Database

    In order to read data from a database with pandas, you may need a driver library for that specific database as well as sqlalchemy

    Reading From MySQL
        
    import pymysql
    from sqlalchemy import create_engine
    
    engine = create_engine('mysql+pymysql://'+'USER:PASSWORD@HOST/DATABASE')
    df = pd.read_sql_table('table', engine)
        
    

    Reading From SQLite
        
    import sqlite3
    import pandas as pd
    # Create your connection.
    conn = sqlite3.connect('file.db')
    
    df = pd.read_sql_query("SELECT * FROM table_name", conn)
        
    
  • Pandas - Creating DataFrames

    How to create data in Series and then combine them into a DataFrame

    import pandas as pd
    from pandas import Series, DataFrame

    # Example 1 ...
    s1 = Series(range(6))
    s2 = s1 * s1
    s2.index = s2.index + 2  # misalign indexes
    df = pd.concat([s1, s2], axis=1)

    # Example 2 ...
    s3 = Series({'Tom':1, 'Dan':4, 'Har':9})
    s4 = Series({'Tom':3, 'Dan':2, 'Mar':5})
    df = pd.concat({'A':s3, 'B':s4}, axis=1)
    How to create a DataFrame from a Python dictionary

    # default --- assume data is in columns
    df = DataFrame({
        'col0' : [1.0, 2.0, 3.0, 4.0],
        'col1' : [100, 200, 300, 400]
    })

    How to create a DataFrame from data organised by rows

    # --- use helper method for data in rows
    df = DataFrame.from_dict({  # data by row
        # rows as python dictionaries
        'row0' : {'col0':0, 'col1':'A'},
        'row1' : {'col0':1, 'col1':'B'}
    }, orient='index')

    df = DataFrame.from_dict({  # data by row
        # rows as python lists
        'row0' : [1, 1+1j, 'A'],
        'row1' : [2, 2+2j, 'B']
    }, orient='index')
    
    How to create fake data (useful for testing)

    import numpy as np
    df = DataFrame(np.random.rand(50, 5))
    
  • Pandas - Saving DataFrames

    Saving a DataFrame to a CSV file

    df.to_csv('name.csv', encoding='utf-8')

    Saving DataFrames to an Excel Workbook

    from pandas import ExcelWriter
    writer = ExcelWriter('filename.xlsx')
    df1.to_excel(writer, 'Sheet1')
    df2.to_excel(writer, 'Sheet2')
    writer.save()

    Saving a DataFrame to MySQL

    import pymysql
    from sqlalchemy import create_engine
    e = create_engine('mysql+pymysql://' +
                      'USER:PASSWORD@HOST/DATABASE')
    df.to_sql('TABLE', e, if_exists='replace')

    Saving to Python objects

    d = df.to_dict()    # to dictionary
    s = df.to_string()  # to string
    m = df.to_numpy()   # to numpy array (df.as_matrix() was removed in recent pandas)

    Saving to JSON

    df.to_json('filename.json')                    # to JSON
    df.to_json('filename.json', orient='records')  # to records

    Saving to Parquet

    df.to_parquet('filename.parquet.gzip', compression='gzip')  # to parquet
  • Pandas - Basics Preview of Dataset

    With Pandas you can get a brief overview of the dataset


    >>> df.info()      # index & data types
    # Preview the first N rows
    >>> df.head(n)
    # Preview the last N rows
    >>> df.tail(n)
    # Get a descriptive summary
    >>> df.describe()
    # Get data types
    >>> df.dtypes
    # Get column names
    >>> df.columns
    # Get dimensions/shape of the DataFrame
    >>> df.shape
  • Pandas - Selecting Columns

    With Pandas you can select columns


    Select Columns Using Column Names/Labels

    # Select a single column with a specific name
    >>> df['col_name']  # returns Series
    >>> df.col_name     # returns Series

    # Select multiple columns using specific names
    >>> feature_cols = ['TV','Radio','Newspaper']
    >>> x = df[feature_cols]
    # Alternate method
    >>> df[['TV','Radio','Newspaper']]

    # Differences
    s = df['col_name']      # returns Series
    df.col_name             # returns Series
    df = df[['col_name']]   # returns DataFrame
    df = df[['L1', 'L2']]   # select with list
    df = df[index]          # select with index
    df = df[s]              # select with Series

    # Select columns whose names match a pattern
    >>> df.filter(regex='your_pattern')

    Note the difference in return type in the first two examples above based on argument type (scalar vs list).

    Select Columns Using iloc (Index Location) & Conditions

    df.iloc[:,:2]        # Select the first 2 columns
    # by column labels
    df.loc[:,['A','B']]  # syntax is: df.loc[rows_index, cols_index]
    # conditional
    df.filter(like='data')  # Select columns whose name contains 'data'
    df['preTestScore'].where(df['postTestScore'] > 50)  # Keep preTestScore values where postTestScore > 50, else NaN
  • Pandas - Selecting Rows

    With Pandas you can select rows of interest


    Select Rows Using iloc (Index Location) & Conditions

    df.iloc[0]    # Select the first row of the DataFrame
    df.iloc[-1]   # Select the last row of the DataFrame
    df.iloc[1:5]  # Select rows 2 to 5 (positions 1 through 4) of the DataFrame
    # by column labels
    df.loc[:,['A','B']]  # syntax is: df.loc[rows_index, cols_index]
    # conditional
    df.filter(like='data')  # Select columns whose name contains 'data'
    df['preTestScore'].where(df['postTestScore'] > 50)  # Keep preTestScore values where postTestScore > 50, else NaN
  • Pandas - Selecting Rows & Columns Summary

    Use df.loc and df.iloc to select only rows, only columns or both

    [row, column]: the first index selects rows and the second selects columns


    Select Rows & Columns Using iloc

    df.iloc[0]          # Select the first row of the DataFrame
    df.iloc[10:30]      # Select rows 10 to 30 of the DataFrame
    df.iloc[:,[2,5,7]]  # Select all rows of columns 2, 5 and 7

    Select Rows & Columns Using loc

    df.loc[:,'colA']         # Select all rows of column colA
    df.loc[:,['A','B']]      # syntax is: df.loc[rows_index, cols_index]
    df.loc[:,'colA':'colD']  # Select all columns between colA and colD


    # conditional
    df.filter(like='data')
    df['preTestScore'].where(df['postTestScore'] > 50)  # Keep preTestScore values where postTestScore > 50, else NaN
  • Pandas - Conditionals & Filtering

    With Pandas you can select data via conditions


    # Select rows meeting a condition
    df.loc[df['colA'] > 20]
    df.loc[(df['colA'] > 20) & (df['colB'].str.startswith("a"))]
    # Using query
    df.query('colA > 20')
             
  • Pandas - Apply & ApplyMap

    With Pandas you can apply user-defined functions to the dataset


    # Apply a function: method 1
    df['col'].apply(your_fxn)
    # Apply a function: method 2
    df['col'].apply(lambda x: fxn(x))
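
    ApplyMap applies a function element-wise to every cell of a whole DataFrame rather than a single column (your_fxn is a placeholder as above):

    # Apply a function to every element of the DataFrame
    df.applymap(your_fxn)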
                 
             
  • Pandas - Subset & Subselection

    With Pandas you can take subsets and random samples of the dataset


    Sampling a DataFrame

    # Select a random 50% fraction of the dataset
    df.sample(frac=0.5)
    # Select N random rows of the dataset
    df.sample(n=20)
                 
             
    Sampling Largest & Smallest Values

    # Select the n largest values of a column and order them
    df.nlargest(n, 'column_name')
    df.nlargest(12, 'ColA')
    # Select the n smallest values of a column and order them
    df.nsmallest(n, 'column_name')
    df.nsmallest(12, 'ColA')
                 
             
  • Pandas - Working with Missing Values

    Missing values are usually represented as np.nan or None (displayed as NaN); strings like 'null' or '' may also need to be treated as missing


    Checking For Missing Values

    # Check for missing values
    df.isnull()
    df.isna()
    # Check the total number of missing values
    df.isnull().sum()
    df.isna().sum()
    # Check for missing values in a column
    df['colA'].isnull()
    df['colA'].isna()

    Handling Missing Values

    # Drop all rows containing missing values
    df.dropna()
    # Fill missing values with your 'value'
    df.fillna('value')
                 
             
ScikitLearn CheatSheets

A Quick Guide on Scikit-Learn

  • What is Scikit-Learn?

    Scikit-Learn (sklearn for short) is a Python machine learning framework for performing classification, regression and unsupervised ML projects

    It is a simple and efficient tool for predictive data analysis built on NumPy, SciPy, and matplotlib

    Official site: https://scikit-learn.org/stable/

  • Sklearn - Basics

    Sklearn follows a common API

    It comprises 3 main kinds of API for performing every form of ML activity. These include

    1. Estimators (data-to-model stage): objects that take in data and produce a predictive model; all the ML algorithms are estimators
    2. Transformers (data-to-data stage): objects that take in data and transform it to produce more useful data (see the sketch after this list)
        Types of Transformers
      • Scalers: StandardScaler, MinMaxScaler, Normalizer, etc.
      • Vectorizers: CountVectorizer, TfidfVectorizer, etc.
      • Tokenizers
      • Encoders: LabelEncoder, OneHotEncoder, etc.
    3. Others
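
    A minimal sketch of the transformer API (fit, then transform), using StandardScaler on a small illustrative matrix X:

    from sklearn.preprocessing import StandardScaler

    X = [[0., 1.], [2., 3.], [4., 5.]]

    scaler = StandardScaler()
    scaler.fit(X)                   # learn column means and variances
    X_scaled = scaler.transform(X)  # standardize to zero mean, unit variance
    # or in one step: scaler.fit_transform(X)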

    Basic Overview of A Task

    # Import Pkg
    from sklearn.dummy import DummyClassifier

    # Fit the model on features X and labels y (assumed already loaded)
    dummy_clf = DummyClassifier(strategy="most_frequent", random_state=0)
    dummy_clf.fit(X, y)

    # Check Accuracy
    dummy_clf.score(X, y)

    # Make Prediction
    dummy_clf.predict(X)
            
  • Sklearn - Classification Problems


    Sample Classifier

    # Load Data
    df = pd.read_csv("data.csv")

    # Prepare Data
    Xfeatures = df['features']
    ylabels = df['labels']

    # Split Dataset
    from sklearn import model_selection
    X_train, X_test, y_train, y_test = model_selection.train_test_split(Xfeatures, ylabels, random_state=42)

    # Import Pkg (EstimatorClassifier stands for any sklearn classifier)
    from sklearn.submodule import EstimatorClassifier

    # Fit the model on the training set
    clf = EstimatorClassifier(random_state=0)
    clf.fit(X_train, y_train)

    # Check Accuracy on the test set
    clf.score(X_test, y_test)

    # Make Prediction
    clf.predict(X_test)
            
  • Sklearn - Classification Metrics
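
    A minimal sketch of commonly used classification metrics, assuming X_test, y_test and a fitted clf from the previous section:

    from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

    y_pred = clf.predict(X_test)

    # Fraction of correct predictions
    accuracy_score(y_test, y_pred)
    # Counts of correct and incorrect predictions per class
    confusion_matrix(y_test, y_pred)
    # Precision, recall and F1-score per class
    print(classification_report(y_test, y_pred))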


  • Sklearn - Regression Problems


    Sample Regressor

    # Load Data
    df = pd.read_csv("data.csv")

    # Prepare Data
    Xfeatures = df['features']
    ylabels = df['continuous_values']

    # Split Dataset
    from sklearn import model_selection
    X_train, X_test, y_train, y_test = model_selection.train_test_split(Xfeatures, ylabels, random_state=42)

    # Import Pkg (Ridge is one example of an sklearn regressor)
    from sklearn.linear_model import Ridge

    # Fit the model on the training set
    reg = Ridge(alpha=0.5)
    reg.fit(X_train, y_train)

    # Learned parameters
    reg.coef_
    reg.intercept_

    # Make Prediction
    reg.predict(X_test)
                
            
  • Sklearn - Regression Metrics
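
    A minimal sketch of commonly used regression metrics, assuming X_test, y_test and the fitted reg from the previous section:

    from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

    y_pred = reg.predict(X_test)

    mean_absolute_error(y_test, y_pred)  # average absolute error
    mean_squared_error(y_test, y_pred)   # average squared error
    r2_score(y_test, y_pred)             # coefficient of determination (1.0 is perfect)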


  • Sklearn - Unsupervised ML - KMeans

    The KMeans algorithm clusters data by trying to separate samples in n groups of equal variance, minimizing a criterion known as the inertia or within-cluster sum-of-squares (see below). This algorithm requires the number of clusters to be specified. It scales well to a large number of samples and has been used across a large range of application areas in many different fields.


    Sample of KMeans
    
    >>> from sklearn.cluster import KMeans
    >>> import numpy as np
    >>> X = np.array([[1, 2], [1, 4], [1, 0],
    ...               [10, 2], [10, 4], [10, 0]])
    >>> kmeans = KMeans(n_clusters=2, random_state=0).fit(X)
    >>> kmeans.labels_
    array([1, 1, 1, 0, 0, 0], dtype=int32)
    >>> kmeans.predict([[0, 0], [12, 3]])
    array([1, 0], dtype=int32)
    >>> kmeans.cluster_centers_
    array([[10.,  2.],
           [ 1.,  2.]])
                  
              
  • Sklearn - Unsupervised ML - Hierarchical Clustering

    Hierarchical clustering is a general family of clustering algorithms that build nested clusters by merging or splitting them successively. This hierarchy of clusters is represented as a tree (or dendrogram). The root of the tree is the unique cluster that gathers all the samples, the leaves being the clusters with only one sample.


    Sample of AgglomerativeClustering

    The AgglomerativeClustering object performs a hierarchical clustering using a bottom up approach: each observation starts in its own cluster, and clusters are successively merged together.

    
    >>> from sklearn.cluster import AgglomerativeClustering
    >>> import numpy as np
    >>> X = np.array([[1, 2], [1, 4], [1, 0],
    ...               [4, 2], [4, 4], [4, 0]])
    >>> clustering = AgglomerativeClustering().fit(X)
    >>> clustering
    AgglomerativeClustering()
    >>> clustering.labels_
    array([1, 1, 1, 0, 0, 0])
                  
              
  • Sklearn - Neural Networks (Supervised ML)

    You can use Sklearn for deep learning tasks via the neural_network submodule


    >>> from sklearn.neural_network import MLPClassifier
    >>> X = [[0., 0.], [1., 1.]]
    >>> y = [0, 1]
    # Init Estimator
    >>> mlp_clf = MLPClassifier(solver='lbfgs', alpha=1e-5,
    ...                     hidden_layer_sizes=(5, 2), random_state=1)
    ...
    # Fit Data
    >>> mlp_clf.fit(X, y)
    MLPClassifier(alpha=1e-05, hidden_layer_sizes=(5, 2), random_state=1,
                  solver='lbfgs')
    # Make Prediction
    >>> mlp_clf.predict([[2., 2.], [-1., -2.]])

  • Awesome Data Science Resources


Tensorflow CheatSheets

A Quick Guide on Tensorflow

  • Tensorflow

    TensorFlow is an open-source software library for high-performance numerical computation. Its flexible architecture enables computation to be deployed easily across a variety of platforms (CPUs, GPUs, and TPUs), as well as mobile and edge devices, desktops, and clusters of servers. TensorFlow comes with strong support for machine learning and deep learning.

  • Tensorflow - Basics

    TensorFlow 2.x Cheat Sheet

    Table of Contents

    • Layers
    • Models
    • Activation Functions
    • Optimizers
    • Loss Functions
    • Hyperparameters
    • Preprocessing
    • Metrics
    • Visualizations
    • Callbacks
    • Transfer Learning
    • Overfitting
    • TensorFlow Data Services
    • Examples
  • TF - Layers

    Layers

    Layer Code Usage
    Dense tf.keras.layers.Dense(units, activation, input_shape) The regular densely connected neural network layer; the most common and frequently used layer.
    Flatten tf.keras.layers.Flatten() Flattens the input.
    Conv2D tf.keras.layers.Conv2D(filters, kernel_size, activation, input_shape) Convolution layer for two-dimensional data such as images.
    MaxPooling2D tf.keras.layers.MaxPool2D(pool_size) Max pooling for two-dimensional data.
    Dropout tf.keras.layers.Dropout(rate) Randomly sets input units to 0 with a frequency of rate at each step during training, which helps prevent overfitting.
    Embedding tf.keras.layers.Embedding(input_dim, output_dim, input_length) Initialized with random weights; learns an embedding for all of the words in the dataset.
    GlobalAveragePooling1D tf.keras.layers.GlobalAveragePooling1D() Global average pooling operation for temporal data.
    Bidirectional LSTM tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(units, return_sequences)) Bidirectional Long Short-Term Memory layer.
    Conv1D tf.keras.layers.Conv1D(filters, kernel_size, activation, input_shape) Convolution layer for one-dimensional data such as word embeddings.
    Bidirectional GRU tf.keras.layers.Bidirectional(tf.keras.layers.GRU(units)) Bidirectional Gated Recurrent Unit layer.
    Simple RNN tf.keras.layers.SimpleRNN(units, activation, return_sequences, input_shape) Fully-connected RNN where the output is fed back into the input.
    Lambda tf.keras.layers.Lambda(function) Wraps arbitrary expressions as a Layer object.
  • TF - Models

    Models

    Code Usage
    model = tf.keras.Sequential(layers) Sequential groups a linear stack of layers into a tf.keras.Model.
    model.compile(optimizer, loss, metrics) Configures the model for training.
    history = model.fit(x, y, epochs) Trains the model for a fixed number of epochs (iterations on a dataset).
    history = model.fit_generator(train_generator, steps_per_epoch, epochs, validation_data, validation_steps) Fits the model on data yielded batch-by-batch by a Python generator.
    model.evaluate(x, y) Returns the loss value & metrics values for the model in test mode.
    model.predict(x) Generates output predictions for the input samples.
    model.summary() Prints a string summary of the network.
    model.save(path) Saves a model as a TensorFlow SavedModel or HDF5 file.
    model.stop_training Set to True (e.g. inside a callback) to stop training.
    model.save('path/my_model.h5') Saves a model in HDF5 format.
    new_model = tf.keras.models.load_model('path/my_model.h5') Reloads a fresh Keras model from the saved file.
  • TF - Example of Neural Network Model

    A Simple Text Classification NN Model


    from tensorflow.keras.models import Sequential
    from tensorflow.keras import layers

    # Building the Model (vocab_size and maxlen are assumed to be defined)
    model = Sequential()
    model.add(layers.Embedding(input_dim=vocab_size, output_dim=50, input_length=maxlen))
    model.add(layers.Flatten())
    model.add(layers.Dense(10, activation='relu'))
    model.add(layers.Dense(5, activation='softmax'))
    # Last layer is the output layer of 5 classes

    # Compile the model
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

    # Check Summary of Model
    model.summary()
              
  • TF - Activation Functions

    Activation Functions

    Name Usage
    relu The default activation for hidden layers.
    sigmoid Binary classification.
    tanh Faster convergence than sigmoid.
    softmax Multiclass classification.
  • TF - Optimizers

    Optimizers

    Name Usage
    Adam Adam combines the good properties of AdaDelta and RMSprop and hence tends to do better for most problems.
    SGD Stochastic gradient descent; very basic and works well for shallow networks.
    AdaGrad AdaGrad can be useful for sparse data such as tf-idf.
    AdaDelta Extension of AdaGrad that removes its decaying learning rate problem.
    RMSprop Very similar to AdaDelta.
  • TF - Loss Functions

    Loss Functions

    Name Usage
    MeanSquaredError Default loss function for regression problems.
    MeanSquaredLogarithmicError For regression problems with a large spread of values.
    MeanAbsoluteError More robust to outliers.
    BinaryCrossEntropy Default loss function for binary classification problems.
    Hinge Intended for binary classification where the target values are in the set {-1, 1}.
    SquaredHinge If a hinge loss gives good performance on a binary classification problem, a squared hinge loss may be appropriate.
    CategoricalCrossEntropy Default loss function for multi-class classification problems.
    SparseCategoricalCrossEntropy Performs the same cross-entropy calculation of error without requiring the target variable to be one-hot encoded before training.
    KLD KL divergence loss is more commonly used in models that learn to approximate a more complex function than simple multi-class classification, such as an autoencoder that must reconstruct its original input from a dense feature representation.
    Huber Less sensitive to outliers.
  • TF - Hyperparameters

    Hyperparameters

    Parameter Tips
    Hidden Neurons Between the size of the input layer and the size of the output layer; a common rule of thumb is 2/3 the size of the input layer plus the size of the output layer.
    Learning Rate [0.1, 0.01, 0.001, 0.0001]
    Momentum [0.5, 0.9, 0.99]
    Batch Size Small values give a learning process that converges quickly at the cost of noise in the training process. Large values converge slowly with accurate estimates of the error gradient. Typical sizes are [32, 64, 128, 256, 512].
    Conv2D Filters Earlier convolutional layers, closer to the input, learn fewer filters, while later layers, closer to the output, learn more. The number of filters should depend on the complexity of your dataset and the depth of your network. A common setting to start with is [32, 64, 128] for three layers; with more layers, increase to [256, 512, 1024], etc.
    Kernel Size (3, 3)
    Pool Size (2, 2)
    Steps per Epoch sample_size // batch_size
    Epochs Use callbacks to stop training rather than a fixed count
    Embedding Dimensions vocab_size ** 0.25
    Truncating post
    OOV Token <OOV>
  • TF - Working with Textual Data

    Tokenizer, Text-to-sequence & Padding

    import tensorflow as tf
    from tensorflow import keras

    from tensorflow.keras.preprocessing.text import Tokenizer
    from tensorflow.keras.preprocessing.sequence import pad_sequences

    sentences = [
        'I love my dog',
        'I love my cat',
        'You love my dog!',
        'Do you think my dog is amazing?'
    ]

    tokenizer = Tokenizer(num_words=100, oov_token="<OOV>")

    # Key value pair (word: token)
    tokenizer.fit_on_texts(sentences)
    word_index = tokenizer.word_index

    # Lists of tokenized sentences
    sequences = tokenizer.texts_to_sequences(sentences)

    # Padded tokenized sentences
    padded = pad_sequences(sequences, maxlen=5)

    print("\nWord Index = ", word_index)
    print("\nSequences = ", sequences)
    print("\nPadded Sequences:")
    print(padded)

    One-hot Encoding

    ys = tf.keras.utils.to_categorical(labels, num_classes=3)
  • TF - Working with Image Data

    Preprocessing

    ImageDataGenerator

    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    # Image augmentation
    train_datagen = ImageDataGenerator(
          rescale=1./255,
          rotation_range=40,
          width_shift_range=0.2,
          height_shift_range=0.2,
          shear_range=0.2,
          zoom_range=0.2,
          horizontal_flip=True,
          fill_mode='nearest')
    validation_datagen = ImageDataGenerator(rescale=1/255)

    # Flow training images in batches of 128 using the train_datagen generator
    train_generator = train_datagen.flow_from_directory(
            '/tmp/horse-or-human/',  # This is the source directory for training images
            target_size=(300, 300),  # All images will be resized to 300x300
            batch_size=128,
            # Since we use binary_crossentropy loss, we need binary labels
            class_mode='binary')

    # Flow validation images in batches of 32 using the validation_datagen generator
    validation_generator = validation_datagen.flow_from_directory(
            '/tmp/validation-horse-or-human/',  # This is the source directory for validation images
            target_size=(300, 300),  # All images will be resized to 300x300
            batch_size=32,
            # Since we use binary_crossentropy loss, we need binary labels
            class_mode='binary')
    
  • TF - Visualizing Training Loss

    Get the Structure of a Model

    # Plot Model
    from tensorflow.keras.utils import plot_model
    plot_model(model, show_shapes=True)

    Plot the Training History

    import matplotlib.pyplot as plt

    history = model.fit(x_train_seq, y_train, epochs=5, verbose=False, validation_data=(x_test_seq, y_test), batch_size=10)
    history.history

    # Function to Plot
    def plot_history(history):
        acc = history.history['accuracy']
        val_acc = history.history['val_accuracy']
        loss = history.history['loss']
        val_loss = history.history['val_loss']
        x = range(1, len(acc) + 1)

        plt.figure(figsize=(12, 5))
        plt.subplot(1, 2, 1)
        plt.plot(x, acc, 'b', label='Training acc')
        plt.plot(x, val_acc, 'r', label='Validation acc')
        plt.title('Training and validation accuracy')
        plt.legend()
        plt.subplot(1, 2, 2)
        plt.plot(x, loss, 'b', label='Training loss')
        plt.plot(x, val_loss, 'r', label='Validation loss')
        plt.title('Training and validation loss')
        plt.legend()
    
                  
              
  • TF - CallBacks

    Learning Rate Scheduler

    model = tf.keras.models.Sequential([
        tf.keras.layers.Dense(10, input_shape=[window_size], activation="relu"),
        tf.keras.layers.Dense(10, activation="relu"),
        tf.keras.layers.Dense(1)
    ])

    lr_schedule = tf.keras.callbacks.LearningRateScheduler(
        lambda epoch: 1e-8 * 10**(epoch / 20))
    optimizer = tf.keras.optimizers.SGD(lr=1e-8, momentum=0.9)
    model.compile(loss="mse", optimizer=optimizer)
    history = model.fit(dataset, epochs=100, callbacks=[lr_schedule], verbose=0)

    End of Training Cycles

    import tensorflow as tf

    class myCallback(tf.keras.callbacks.Callback):
      def on_epoch_end(self, epoch, logs={}):
        if logs.get('accuracy') > 0.6:
          print("\nReached 60% accuracy so cancelling training!")
          self.model.stop_training = True

    mnist = tf.keras.datasets.fashion_mnist

    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0

    callbacks = myCallback()

    model = tf.keras.models.Sequential([
      tf.keras.layers.Flatten(input_shape=(28, 28)),
      tf.keras.layers.Dense(512, activation=tf.nn.relu),
      tf.keras.layers.Dense(10, activation=tf.nn.softmax)
    ])
    model.compile(optimizer=tf.optimizers.Adam(),
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

    model.fit(x_train, y_train, epochs=10, callbacks=[callbacks])
             

  • TF - Transfer Learning


        
    import os
    
    from tensorflow.keras import layers
    from tensorflow.keras import Model
    from tensorflow.keras.applications.inception_v3 import InceptionV3
    
    local_weights_file = '/tmp/inception_v3_weights_tf_dim_ordering_tf_kernels_notop.h5'
    
    pre_trained_model = InceptionV3(input_shape = (150, 150, 3), 
                                    include_top = False, 
                                    weights = None)
    
    pre_trained_model.load_weights(local_weights_file)
    
    for layer in pre_trained_model.layers:
      layer.trainable = False
      
    # pre_trained_model.summary()
    
    last_layer = pre_trained_model.get_layer('mixed7')
    print('last layer output shape: ', last_layer.output_shape)
    last_output = last_layer.output
    
    from tensorflow.keras.optimizers import RMSprop
    
    # Flatten the output layer to 1 dimension
    x = layers.Flatten()(last_output)
    # Add a fully connected layer with 1,024 hidden units and ReLU activation
    x = layers.Dense(1024, activation='relu')(x)
    # Add a dropout rate of 0.2
    x = layers.Dropout(0.2)(x)                  
    # Add a final sigmoid layer for classification
    x = layers.Dense(1, activation='sigmoid')(x)
    
    model = Model( pre_trained_model.input, x) 
    
    model.compile(optimizer = RMSprop(lr=0.0001), 
                  loss = 'binary_crossentropy', 
                  metrics = ['accuracy'])
        
    
  • TF - How to Deal with Overfitting
    • Augmentation

    • Reduce Model Complexity

      • Reduce overfitting by training the network on more examples.
      • Reduce overfitting by changing the complexity of the network (network structure and network parameters).
    • Regularization

    • Dropout Layer (see the sketch below)
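
    A minimal sketch of the last two ideas, combining an L2 weight penalty and a Dropout layer in a small Keras model (layer sizes are illustrative):

    import tensorflow as tf

    model = tf.keras.models.Sequential([
        # L2 regularization penalizes large weights
        tf.keras.layers.Dense(64, activation='relu',
                              kernel_regularizer=tf.keras.regularizers.l2(0.01)),
        # Dropout randomly zeroes 50% of the units during training
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(10, activation='softmax')
    ])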


  • TF - Datasets

    TensorFlow Data Services

    TensorFlow Datasets is a collection of datasets ready to use with TensorFlow or other Python ML frameworks, such as JAX. All datasets are exposed as tf.data.Datasets, enabling easy-to-use and high-performance input pipelines. To get started, see the guide and the list of datasets.

  • Credits

    Credits to Randy @ GitHub and the TensorFlow official website


  • Awesome Data Science Resources


Matplotlib CheatSheets

A Quick Guide on Matplotlib

  • What is Matplotlib?

    Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python

    Matplotlib is a plotting library for the Python programming language and its numerical mathematics extension NumPy. It provides an object-oriented API for embedding plots into applications using general-purpose GUI toolkits like Tkinter, wxPython, Qt, or GTK

    Official site: https://matplotlib.org/

    If you have any issues with installation, there are other options: check out the official installation guide.

  • Matplotlib - Plots Creation

    Plots

    Creating plots

    Figure

    Operator Description
    fig = plt.figure() A container that contains all plot elements

    Axes

    Operator Description
    fig.add_axes() Adds an axes to the figure
    a = fig.add_subplot(222) Initializes a subplot; a subplot is an axes on a grid system (the argument is row-col-num, see examples)
    fig, b = plt.subplots(nrows=3, ncols=2) Adds a grid of subplots
    fig, ax = plt.subplots(2, 2) Creates a figure and a 2x2 grid of subplots

    Axes are very useful for subplots. See the subplotting examples below

    After configuring your plot, you must call plt.show() to make it visible

    Plotting

    1D Data

    Operator Description
    lines = plt.plot(x,y) Plots data connected by lines
    plt.scatter(x,y) Creates a scatterplot of unconnected data points
    plt.bar(xvalue, data, width, color...) Simple vertical bar chart
    plt.barh(yvalue, data, width, color...) Simple horizontal bar chart
    plt.hist(x, bins) Plots a histogram
    plt.boxplot(x) Box and whisker plot
    plt.violinplot(x) Creates a violin plot
    ax.fill(x, y, color='lightblue'), ax.fill_between(x, y, color='yellow') Fill area under/between plots

    For more advanced box plots, see the boxplot documentation

    2D Data

    Operator Description
    fig, ax = plt.subplots(); im = ax.imshow(img, cmap, vmin...) Colormapped or RGB arrays


    Saving plots

    Operator Description
    plt.savefig('pic.png') Saves plot/figure to an image file
    plt.savefig('transparentback.png', transparent=True) Saves plot/figure with a transparent background
  • Matplotlib - Examples

    Examples

    Basics

    import matplotlib.pyplot as plt
    
    x = [1, 2.1, 0.4, 8.9, 7.1, 0.1, 3, 5.1, 6.1, 3.4, 2.9, 9]
    y = [1, 3.4, 0.7, 1.3, 9, 0.4, 4, 1.9, 9, 0.3, 4.0, 2.9]
    plt.scatter(x,y, color='red')
    
    w = [0.1, 0.2, 0.4, 0.8, 1.6, 2.1, 2.5, 4, 6.5, 8, 10]
    z = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    plt.plot(z, w, color='lightblue', linewidth=2)
    
    c = [0,1,2,3,4, 5, 6, 7, 8, 9, 10]
    plt.plot(c)
    
    plt.ylabel('some numbers')
    plt.xlabel('some more numbers')
    plt.show()


    import matplotlib.pyplot as plt
    import numpy as np
    
    x = np.random.rand(10)
    y = np.random.rand(10)
    
    plt.plot(x,y,'--', x**2, y**2,'-.')
    plt.savefig('lines.png')
    plt.show()


    import matplotlib.pyplot as plt
    
    
    x = [1, 2, 3, 4]
    y = [1, 4, 9, 6]
    labels = ['Frogs', 'Hogs', 'Bogs', 'Slogs']
    
    plt.plot(x, y, 'ro')
    # You can specify a rotation for the tick labels in degrees or with keywords.
    plt.xticks(x, labels, rotation='vertical')
    # Pad margins so that markers don't get clipped by the axes
    plt.margins(0.2)
    plt.savefig('ticks.png')
    plt.show()


  • Matplotlib - Customizing Plots

    Customization

    Color

    Operator Description
    plt.plot(x, y, color='lightblue') Colors the plotted line
    plt.plot(x, y, alpha=0.4) Sets the transparency of the plot
    plt.colorbar(mappable, orientation='horizontal') mappable: the Image, ContourSet, etc. to which the colorbar applies

    Markers (see examples)

    Operator Description
    plt.plot(x, y, marker='*') Adds * for every data point
    plt.scatter(x, y, marker='.') Adds . for every data point

    Lines

    Operator Description
    plt.plot(x, y, linewidth=2) Sets line width
    plt.plot(x, y, ls='solid') Sets linestyle; ls can be omitted, see the two rows below
    plt.plot(x, y, ls='--') Sets linestyle; ls can be omitted, see below
    plt.plot(x, y, '--', x**2, y**2, '-.') Line styles are '--' and '-.', see example
    plt.setp(lines, color='red', linewidth=2) Sets properties of plot lines

    Text

    Operator Description
    plt.text(1, 1, 'Example Text', style='italic') Places text at coordinates (1, 1)
    ax.annotate('some annotation', xy=(10, 10)) Annotates the point at coordinates xy with the given text
    plt.title(r'$\delta_i=20$', fontsize=10) Mathtext (LaTeX-style) in titles and labels

    Limits, Legends/Labels, Layout

    Limits

    Operator Description
    plt.xlim(0, 7) Sets x-axis to display 0 - 7
    plt.ylim(-0.5, 9) Sets y-axis to display -0.5 - 9
    ax.set(xlim=[0, 7], ylim=[-0.5, 9]) Sets both limits at once
    ax.set_xlim(0, 7) Sets the x-axis limits
    plt.margins(x=1.0, y=1.0) Sets margins: adds padding to a plot, values 0 - 1
    plt.axis('equal') Sets the aspect ratio of the plot to 1

    Legends/Labels

    Operator Description
    plt.title('just a title') Sets title of plot
    plt.xlabel('x-axis') Sets label next to x-axis
    plt.ylabel('y-axis') Sets label next to y-axis
    ax.set(title='axis', ylabel='Y-Axis', xlabel='X-Axis') Sets title and axis labels
    ax.legend(loc='best') Places the legend so that it overlaps plot elements as little as possible

    Ticks

    Operator Description
    plt.xticks(x, labels, rotation='vertical') Sets ticks, see example
    ax.xaxis.set(ticks=range(1,5), ticklabels=[3,100,-12,"foo"]) Sets x-ticks
    ax.tick_params(axis='y', direction='inout', length=10) Makes y-ticks longer and point both in and out
  • Matplotlib - SubPlotting

    Subplotting Examples

    import matplotlib.pyplot as plt
    
    x = [0.5, 0.6, 0.8, 1.2, 2.0, 3.0]
    y = [10, 15, 20, 25, 30, 35]
    z = [1, 2, 3, 4]
    w = [10, 20, 30, 40]
    
    fig = plt.figure()
    ax =  fig.add_subplot(111)
    ax.plot(x, y, color='lightblue', linewidth=3)
    ax.scatter([2,3.4,4, 5.5],
                   [5,10,12, 15],
                   color='black',
                   marker='^')
    ax.set_xlim(0, 6.5)
    
    ax2 =  fig.add_subplot(222)
    ax2.plot(z, w, color='lightgreen', linewidth=3)
    ax2.scatter([3,5,7],
                   [5,15,25],
                   color='red',
                   marker='*')
    ax2.set_xlim(1, 7.5)
    
    plt.savefig('mediumplot.png')
    plt.show()


    Thanks to this guy for this good example

    import numpy as np
    import matplotlib.pyplot as plt
    
    # First way #
    
    x = np.random.rand(10)
    y = np.random.rand(10)
    
    figure1 = plt.plot(x,y)
    
    # Second way #
    
    x1 = np.random.rand(10)
    x2 = np.random.rand(10)
    x3 = np.random.rand(10)
    x4 = np.random.rand(10)
    y1 = np.random.rand(10)
    y2 = np.random.rand(10)
    y3 = np.random.rand(10)
    y4 = np.random.rand(10)
    
    figure2, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2)
    ax1.plot(x1,y1)
    ax2.plot(x2,y2)
    ax3.plot(x3,y3)
    ax4.plot(x4,y4)
    
    plt.show()

    If you haven't used NumPy before, check out my cheat sheet


    import numpy as np
    import matplotlib.pyplot as plt
    
    x = np.linspace(0, 1, 500)
    y = np.sin(4 * np.pi * x) * np.exp(-5 * x)
    
    fig, ax = plt.subplots()
    
    ax.fill(x, y, color='lightblue')
    plt.show()


    source

  • Matplotlib - Advanced

    Advanced

    Taken from the official docs

    import matplotlib.pyplot as plt
    import numpy as np


    np.random.seed(0)

    x, y = np.random.randn(2, 100)
    fig = plt.figure()
    ax1 = fig.add_subplot(211)
    ax1.xcorr(x, y, usevlines=True, maxlags=50, normed=True, lw=2)
    ax1.grid(True)
    ax1.axhline(0, color='black', lw=2)

    ax2 = fig.add_subplot(212, sharex=ax1)
    ax2.acorr(x, usevlines=True, normed=True, maxlags=50, lw=2)
    ax2.grid(True)
    ax2.axhline(0, color='black', lw=2)

    plt.show()


    Sources: Datacamp, Official Docs and Quandl

  • Awesome Data Science Resources

