Python CheatSheets

A Quick Guide for Python Engineers

  • What is Python?

    Python is an interpreted, object-oriented, high-level programming language with dynamic semantics. Its high-level built-in data structures, combined with dynamic typing and dynamic binding, make it very attractive for Rapid Application Development, as well as for use as a scripting or glue language to connect existing components together.

  • Cheatsheets
  • Data Science Python Tools
  • Awesome Python Resources


Numpy Cheatsheets

A Quick Learning Guide on Numpy

  • What is Numpy?

    NumPy is the fundamental package for scientific computing in Python. It is a Python library that provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation and much more.

    At the core of the NumPy package, is the ndarray object. This encapsulates n-dimensional arrays of homogeneous data types, with many operations being performed in compiled code for performance.

    Source: https://numpy.org/doc/stable/user/whatisnumpy.html

  • Numpy - Basics

    Basics

    One of the most commonly used features of NumPy is the NumPy array. The essential difference between lists and NumPy arrays is functionality and speed: lists give you basic operations, while NumPy adds FFTs, convolutions, fast searching, basic statistics, linear algebra, histograms, etc.
    The most important difference for data science is the ability to do element-wise calculations with NumPy arrays.

    axis 0 always refers to rows
    axis 1 always refers to columns
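
    A quick illustration of both points:

    import numpy as np

    a = np.array([(1, 2, 3), (4, 5, 6)])
    print(a * 2)          # element-wise: [[ 2  4  6]  [ 8 10 12]]
    print(a.sum(axis=0))  # sum down the rows: [5 7 9]
    print(a.sum(axis=1))  # sum across the columns: [ 6 15]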

    Operator                      Description
    np.array([1,2,3])             1d array
    np.array([(1,2,3),(4,5,6)])   2d array
    np.arange(start,stop,step)    Range array

    Placeholders

    Operator                  Description
    np.linspace(0,2,9)        Array of 9 evenly spaced values from 0 to 2
    np.zeros((1,2))           Creates an array filled with zeros
    np.ones((1,2))            Creates an array filled with ones
    np.random.random((5,5))   Creates an array of random values
    np.empty((2,2))           Creates an empty (uninitialized) array

    Examples

    import numpy as np

    # 1 dimensional
    x = np.array([1, 2, 3])
    # 2 dimensional
    y = np.array([(1, 2, 3), (4, 5, 6)])

    x = np.arange(3)
    # array([0, 1, 2])

    y = np.arange(3.0)
    # array([ 0.,  1.,  2.])

    x = np.arange(3, 7)
    # array([3, 4, 5, 6])

    y = np.arange(3, 7, 2)
    # array([3, 5])
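
    A quick illustration of the placeholder routines above (np.empty returns uninitialized memory, so its values are arbitrary):

    np.linspace(0, 2, 9)
    # array([0.  , 0.25, 0.5 , 0.75, 1.  , 1.25, 1.5 , 1.75, 2.  ])

    np.zeros((1, 2))
    # array([[0., 0.]])

    np.ones((1, 2))
    # array([[1., 1.]])

    np.random.random((2, 2))
    # e.g. array([[0.41919451, 0.6852195 ],
    #             [0.20445225, 0.87811744]])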

  • Numpy - Arrays

    Array

    Array Properties

    Syntax               Description
    array.shape          Dimensions (rows, columns)
    len(array)           Length of array
    array.ndim           Number of array dimensions
    array.size           Number of array elements
    array.dtype          Data type
    array.astype(type)   Converts to data type
    type(array)          Type of array

    Copying/Sorting

    Operator               Description
    np.copy(array)         Creates a copy of the array
    other = array.copy()   Creates a deep copy of the array
    array.sort()           Sorts the array
    array.sort(axis=0)     Sorts along the given axis of the array

    Examples

    import numpy as np

    # sort() sorts in ascending order
    y = np.array([10, 9, 8, 7, 6, 5, 4, 3, 2, 1])
    y.sort()
    print(y)
    # [ 1  2  3  4  5  6  7  8  9 10]

    Array Manipulation Routines

    Adding or Removing Elements

    Operator                       Description
    np.append(a,b)                 Appends items to the array
    np.insert(array, 1, 2, axis)   Inserts 2 into the array before index 1, along the given axis
    np.resize(array, (2,4))        Resizes the array to shape (2,4)
    np.delete(array, 1, axis)      Deletes the item at index 1 along the given axis

    Example

    import numpy as np

    # Append items to array (note the result is flattened)
    a = np.array([(1, 2, 3), (4, 5, 6)])
    b = np.append(a, [(7, 8, 9)])
    print(b)
    # [1 2 3 4 5 6 7 8 9]

    # Remove the item at index 2 from the previous array
    print(np.delete(b, 2))
    # [1 2 4 5 6 7 8 9]

    Combining Arrays

    Operator                       Description
    np.concatenate((a,b),axis=0)   Concatenates two arrays along an axis, adding to the end
    np.vstack((a,b))               Stacks arrays row-wise (vertically)
    np.hstack((a,b))               Stacks arrays column-wise (horizontally)

    Example

    import numpy as np

    a = np.array([1, 3, 5])
    b = np.array([2, 4, 6])

    # Stack two arrays row-wise
    print(np.vstack((a,b)))
    # [[1 3 5]
    #  [2 4 6]]

    # Stack two arrays column-wise
    print(np.hstack((a,b)))
    # [1 3 5 2 4 6]

    Splitting Arrays

    Operator                   Description
    np.split(array, 3)         Splits the array into 3 equally sized sub-arrays (raises an error if it does not divide evenly)
    np.array_split(array, 3)   Splits the array into 3 sub-arrays of (nearly) identical size
    np.hsplit(array, 3)        Splits the array horizontally (column-wise) into 3 sub-arrays

    Example

    # Split array into 3 sub-arrays of (nearly) equal size
    a = np.array([1, 2, 3, 4, 5, 6, 7, 8])
    print(np.array_split(a, 3))
    # [array([1, 2, 3]), array([4, 5, 6]), array([7, 8])]
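
    And a small sketch of np.hsplit on a 2d array:

    a = np.array([(1, 2, 3, 4), (5, 6, 7, 8)])
    print(np.hsplit(a, 2))
    # [array([[1, 2],
    #         [5, 6]]), array([[3, 4],
    #         [7, 8]])]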

    Shaping Arrays

    Operator                      Description
    other = ndarray.flatten()     Flattens a 2d array to 1d
    np.flip(array)                Reverses the order of elements along an axis
    array[::-1]                   Same as above for a 1d array
    array.reshape(shape)          Reshapes the array to the given shape without changing its data
    np.squeeze(array)             Removes axes of length one
    np.expand_dims(array, axis)   Inserts a new axis at the given position
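
    A minimal sketch of these shaping routines:

    import numpy as np

    a = np.array([(1, 2, 3), (4, 5, 6)])
    print(a.flatten())                   # [1 2 3 4 5 6]
    print(a.reshape(3, 2))               # [[1 2] [3 4] [5 6]]
    print(np.expand_dims(a, 0).shape)    # (1, 2, 3)
    print(np.squeeze(np.expand_dims(a, 0)).shape)  # (2, 3)
    print(np.flip(np.array([1, 2, 3])))  # [3 2 1]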
  • Numpy - Miscellaneous on Arrays

    Misc

    Operator                          Description
    other = ndarray.flatten()         Flattens a 2d array to 1d
    np.transpose(array), array.T      Transposes the array
    inverse = np.linalg.inv(matrix)   Inverse of a given matrix

    Example

    # Find the inverse of a given matrix
    >>> np.linalg.inv([[3,1],[2,4]])
    array([[ 0.4, -0.1],
           [-0.2,  0.3]])
  • Numpy - Maths Operations

    Mathematics

    Operations

    Operator                  Description
    np.add(x,y), x + y        Addition
    np.subtract(x,y), x - y   Subtraction
    np.divide(x,y), x / y     Division
    np.multiply(x,y), x * y   Element-wise multiplication
    np.sqrt(x)                Square root
    np.sin(x)                 Element-wise sine
    np.cos(x)                 Element-wise cosine
    np.log(x)                 Element-wise natural log
    np.dot(x,y)               Dot product
    np.roots([1,0,-4])        Roots of the polynomial with the given coefficients

    Remember: NumPy array operations work element-wise.

    Example

    # If a 1d array is added to a 2d array (or vice versa), NumPy
    # broadcasts the smaller array across the larger one
    a = np.array([1, 2, 3])
    b = np.array([(1, 2, 3), (4, 5, 6)])
    print(np.add(a, b))
    # [[2 4 6]
    #  [5 7 9]]

    # Example of np.roots
    # Consider the polynomial (x-1)^2 = x^2 - 2*x + 1,
    # whose roots are 1, 1
    >>> np.roots([1,-2,1])
    array([1., 1.])
    # Similarly, x^2 - 4 = 0 has roots x = ±2
    >>> np.roots([1,0,-4])
    array([-2.,  2.])
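
    Since np.multiply and np.dot are easy to confuse, a quick contrast:

    x = np.array([1, 2])
    y = np.array([3, 4])
    print(np.multiply(x, y))  # element-wise: [3 8]
    print(np.dot(x, y))       # dot product: 11
    print(x @ y)              # same as np.dot for 1d/2d arrays: 11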

    Comparison

    Operator              Description
    ==                    Equal
    !=                    Not equal
    <                     Smaller than
    >                     Greater than
    <=                    Smaller than or equal
    >=                    Greater than or equal
    np.array_equal(x,y)   Array-wise comparison (same shape and same elements)

    Example

    # Comparison operators create boolean NumPy arrays
    z = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
    c = z < 6
    print(c)
    # [ True  True  True  True  True False False False False False]
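
    np.array_equal, by contrast, compares whole arrays and returns a single boolean:

    x = np.array([1, 2, 3])
    print(np.array_equal(x, np.array([1, 2, 3])))  # True
    print(np.array_equal(x, np.array([1, 2, 4])))  # False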
  • Numpy - Basic Statistics

    Basic Statistics

    Operator             Description
    np.mean(array)       Mean
    np.median(array)     Median
    np.corrcoef(array)   Correlation coefficient
    np.std(array)        Standard deviation

    Example

    # Statistics of an array
    a = np.array([1, 1, 2, 5, 8, 10, 11, 12])

    # Standard deviation
    print(np.std(a))
    # 4.2938910093294167

    # Median
    print(np.median(a))
    # 6.5

    More

    Operator               Description
    array.sum()            Array-wise sum
    array.min()            Array-wise minimum value
    array.max(axis=0)      Maximum value along the specified axis
    array.cumsum(axis=0)   Cumulative sum along the specified axis
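
    A quick illustration of these aggregations:

    a = np.array([(1, 2, 3), (4, 5, 6)])
    print(a.sum())            # 21
    print(a.min())            # 1
    print(a.max(axis=0))      # [4 5 6]
    print(a.cumsum(axis=0))
    # [[1 2 3]
    #  [5 7 9]]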

  • Numpy - Slicing and Sorting

    Slicing and Subsetting

    Operator        Description
    array[i]        1d array at index i
    array[i,j]      2d array at index [i][j]
    array[i<4]      Boolean indexing, see Tricks
    array[0:3]      Selects items at index 0, 1 and 2
    array[0:2,1]    Selects items of rows 0 and 1 at column 1
    array[:1]       Selects items of row 0 (equals array[0:1, :])
    array[1:2, :]   Selects items of row 1
    array[1,...]    Equals array[1,:,:] (for a 3d array)
    array[::-1]     Reverses the array

    Examples

    b = np.array([(1, 2, 3), (4, 5, 6)])

    # The index *before* the comma refers to *rows*,
    # the index *after* the comma refers to *columns*
    print(b[0:1, 2])
    # [3]

    print(b[:len(b), 2])
    # [3 6]

    print(b[0, :])
    # [1 2 3]

    print(b[0, 2:])
    # [3]

    print(b[:, 0])
    # [1 4]

    c = np.array([(1, 2, 3), (4, 5, 6)])
    d = c[1:2, 0:2]
    print(d)
    # [[4 5]]

  • Numpy - Misc

    Tricks

    # Index trick when working with two np-arrays
    a = np.array([1, 2, 3, 6, 1, 4, 1])
    b = np.array([5, 6, 7, 8, 3, 1, 2])

    # Keeps only the elements of a at the indexes where b == 1
    other_a = a[b == 1]
    # Keeps only the elements of a at the indexes where b != 1
    other_other_a = a[b != 1]

    import numpy as np

    x = np.array([4, 6, 8, 1, 2, 6, 9])
    y = x > 5
    print(x[y])
    # [6 8 6 9]

    # Even shorter
    x = np.array([1, 2, 3, 4, 4, 35, 212, 5, 5, 6])
    print(x[x < 5])
    # [1 2 3 4 4]

  • Credits

Pandas CheatSheets

A Quick Guide on Pandas

  • What is Pandas?

    Pandas is the de facto data analysis and munging tool used by data scientists and other data workers for small to medium sized datasets

  • Pandas - Loading Data - CSV

    With Pandas you can load or read data contained in a CSV (comma-separated values) file

    df = pd.read_csv('file.csv')  # often works
    df = pd.read_csv('file.csv', sep=':', na_values=['na', '-', '.', ''])  # specify the separator and the strings to treat as NA

    Get data from inline CSV text to a DataFrame

    from io import StringIO
    data = """, Pet, Cuteness, Desirable
    row-1, dog, 8.7, True
    row-2, cat, 9.5, True
    row-3, bat, 2.6, False"""
    df = pd.read_csv(StringIO(data), header=0, index_col=0, skipinitialspace=True)
    
  • Pandas - Loading Data - Excel

    Load DataFrames from a Microsoft Excel file

      
    # Each Excel sheet in a Python dictionary
    workbook = pd.ExcelFile('file.xlsx')
    d = {} # start with an empty dictionary
    for sheet_name in workbook.sheet_names:
        df = workbook.parse(sheet_name)
        d[sheet_name] = df
    
  • Pandas - Loading Data - Database

    In order to read data from a database with pandas, you may need a driver library for that specific database as well as sqlalchemy

    Reading From MySQL
        
    import pymysql
    from sqlalchemy import create_engine
    
    engine = create_engine('mysql+pymysql://'+'USER:PASSWORD@HOST/DATABASE')
    df = pd.read_sql_table('table', engine)
        
    

    Reading From SQLite
        
    import sqlite3
    import pandas as pd
    # Create your connection.
    conn = sqlite3.connect('file.db')
    
    df = pd.read_sql_query("SELECT * FROM table_name", conn)
        
    
  • Pandas - Creating DataFrames

    How to create data in Series and then combine them into a DataFrame

    import pandas as pd
    from pandas import Series, DataFrame

    # Example 1 ...
    s1 = Series(range(6))
    s2 = s1 * s1
    s2.index = s2.index + 2  # misalign indexes
    df = pd.concat([s1, s2], axis=1)

    # Example 2 ...
    s3 = Series({'Tom':1, 'Dan':4, 'Har':9})
    s4 = Series({'Tom':3, 'Dan':2, 'Mar':5})
    df = pd.concat({'A':s3, 'B':s4}, axis=1)
    How to create a DataFrame from a Python dictionary

    # default --- assume data is in columns
    df = DataFrame({
        'col0' : [1.0, 2.0, 3.0, 4.0],
        'col1' : [100, 200, 300, 400]
    })

    How to create a DataFrame from data organised by rows

    # --- use helper method for data in rows
    df = DataFrame.from_dict({  # data by row
        # rows as python dictionaries
        'row0' : {'col0':0, 'col1':'A'},
        'row1' : {'col0':1, 'col1':'B'}
    }, orient='index')

    df = DataFrame.from_dict({  # data by row
        # rows as python lists
        'row0' : [1, 1+1j, 'A'],
        'row1' : [2, 2+2j, 'B']
    }, orient='index')
    
    How to create fake data (useful for testing)

    import numpy as np
    df = DataFrame(np.random.rand(50, 5))
    
  • Pandas - Saving DataFrames

    Saving a DataFrame to a CSV file

    df.to_csv('name.csv', encoding='utf-8')

    Saving DataFrames to an Excel Workbook

    from pandas import ExcelWriter
    writer = ExcelWriter('filename.xlsx')
    df1.to_excel(writer, 'Sheet1')
    df2.to_excel(writer, 'Sheet2')
    writer.save()

    Saving a DataFrame to MySQL

    import pymysql
    from sqlalchemy import create_engine
    e = create_engine('mysql+pymysql://' +
                      'USER:PASSWORD@HOST/DATABASE')
    df.to_sql('TABLE', e, if_exists='replace')

    Saving to Python objects

    d = df.to_dict()    # to dictionary
    s = df.to_string()  # to string
    m = df.to_numpy()   # to numpy array (df.as_matrix() was removed in recent pandas)

    Saving to JSON

    df.to_json('filename.json')                    # to JSON
    df.to_json('filename.json', orient='records')  # to records

    Saving to Parquet

    df.to_parquet('filename.parquet.gzip', compression='gzip')  # to parquet
  • Pandas - Basics Preview of Dataset

    With Pandas you can get a brief overview of the dataset


    >>> df.info()      # index & data types
    # Preview the first N rows
    >>> df.head(n)
    # Preview the last N rows
    >>> df.tail(n)
    # Get a descriptive summary
    >>> df.describe()
    # Get data types
    >>> df.dtypes
    # Get column names
    >>> df.columns
    # Get dimensions/shape of the DataFrame
    >>> df.shape
  • Pandas - Selecting Columns

    With Pandas you can select columns


    Select Columns Using Column Names/Labels

    # Select a single column with a specific name
    >>> df['col_name']  # returns Series
    >>> df.col_name     # returns Series

    # Select multiple columns using specific names
    >>> feature_cols = ['TV','Radio','Newspaper']
    >>> x = df[feature_cols]
    # Alternate method
    >>> df[['TV','Radio','Newspaper']]

    # Differences
    s = df['col_name']      # returns Series
    df.col_name             # returns Series
    df = df[['col_name']]   # returns DataFrame
    df = df[['L1', 'L2']]   # select with list
    df = df[index]          # select with index
    df = df[s]              # select with Series

    # Select columns whose names match a pattern
    >>> df.filter(regex='your_pattern')

    Note the difference in return type in the first two examples above based on argument type (scalar vs list).

    Select Columns Using iloc (Index Location) & Conditions

    df.iloc[:,:2]        # Select the first 2 columns
    # by column labels
    df.loc[:,['A','B']]  # syntax is: df.loc[rows_index, cols_index]
    # conditional
    df.filter(like='data')  # Select columns whose name contains 'data'
    df['preTestScore'].where(df['postTestScore'] > 50)  # Keep preTestScore values where postTestScore > 50, else NaN
  • Pandas - Selecting Rows

    With Pandas you can select rows of interest


    Select Rows Using iloc (Index Location) & Conditions

    df.iloc[0]    # Select the first row of the DataFrame
    df.iloc[-1]   # Select the last row of the DataFrame
    df.iloc[1:5]  # Select rows 2 to 5 (positions 1 through 4) of the DataFrame
    # by column labels
    df.loc[:,['A','B']]  # syntax is: df.loc[rows_index, cols_index]
    # conditional
    df.filter(like='data')  # Select columns whose name contains 'data'
    df['preTestScore'].where(df['postTestScore'] > 50)  # Keep preTestScore values where postTestScore > 50, else NaN
  • Pandas - Selecting Rows & Columns Summary

    Use df.loc and df.iloc to select only rows, only columns or both

    [row, column]: the first index selects rows and the second selects columns


    Select Rows & Columns Using iloc

    df.iloc[0]          # Select the first row of the DataFrame
    df.iloc[10:30]      # Select rows 10 to 30 of the DataFrame
    df.iloc[:,[2,5,7]]  # Select all rows of columns 2, 5 and 7

    Select Rows & Columns Using loc

    df.loc[:,'colA']         # Select all rows of column colA
    df.loc[:,['A','B']]      # syntax is: df.loc[rows_index, cols_index]
    df.loc[:,'colA':'colD']  # Select all columns between colA and colD


    # conditional
    df.filter(like='data')
    df['preTestScore'].where(df['postTestScore'] > 50)  # Keep preTestScore values where postTestScore > 50, else NaN
  • Pandas - Conditionals & Filtering

    With Pandas you can select data via conditions


    # Select rows meeting a condition
    df.loc[df['colA'] > 20]
    df.loc[(df['colA'] > 20) & (df['colB'].str.startswith("a"))]
    # Using query
    df.query('colA > 20')
             
  • Pandas - Apply & ApplyMap

    With Pandas you can apply user-defined functions to the dataset


    # Apply a function: method 1
    df['col'].apply(your_fxn)
    # Apply a function: method 2
    df['col'].apply(lambda x: fxn(x))
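
    ApplyMap applies a function element-wise to every cell of a whole DataFrame rather than a single column (your_fxn is a placeholder as above):

    # Apply a function to every element of the DataFrame
    df.applymap(your_fxn)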
                 
             
  • Pandas - Subset & Subselection

    With Pandas you can take subsets and random samples of the dataset


    Sampling a DataFrame

    # Select a random 50% fraction of the dataset
    df.sample(frac=0.5)
    # Select N random rows of the dataset
    df.sample(n=20)
                 
             
    Sampling Largest & Smallest Values

    # Select the n largest values of a column and order them
    df.nlargest(n, 'column_name')
    df.nlargest(12, 'ColA')
    # Select the n smallest values of a column and order them
    df.nsmallest(n, 'column_name')
    df.nsmallest(12, 'ColA')
                 
             
  • Pandas - Working with Missing Values

    Missing values are usually represented as np.nan or None (displayed as NaN); strings like 'null' or '' may also need to be treated as missing


    Checking For Missing Values

    # Check for missing values
    df.isnull()
    df.isna()
    # Check the total number of missing values
    df.isnull().sum()
    df.isna().sum()
    # Check for missing values in a column
    df['colA'].isnull()
    df['colA'].isna()

    Handling Missing Values

    # Drop all rows containing missing values
    df.dropna()
    # Fill missing values with your 'value'
    df.fillna('value')
                 
             
ScikitLearn CheatSheets

A Quick Guide on Scikit-Learn

  • What is Scikit-Learn?

    Scikit-Learn (sklearn for short) is a Python machine learning framework for performing classification, regression and unsupervised ML projects

    It is a simple and efficient tool for predictive data analysis built on NumPy, SciPy, and matplotlib

    Official site: https://scikit-learn.org/stable/

  • Sklearn - Basics

    Sklearn follows a common API

    It comprises 3 main kinds of API for performing every form of ML activity. These include

    1. Estimators (data-to-model stage): objects that take in data and produce a predictive model; all the ML algorithms are estimators
    2. Transformers (data-to-data stage): objects that take in data and transform it to produce more useful data (see the sketch after this list)
        Types of Transformers
      • Scalers: StandardScaler, MinMaxScaler, Normalizer, etc.
      • Vectorizers: CountVectorizer, TfidfVectorizer, etc.
      • Tokenizers
      • Encoders: LabelEncoder, OneHotEncoder, etc.
    3. Others
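
    A minimal sketch of the transformer API (fit, then transform), using StandardScaler on a small illustrative matrix X:

    from sklearn.preprocessing import StandardScaler

    X = [[0., 1.], [2., 3.], [4., 5.]]

    scaler = StandardScaler()
    scaler.fit(X)                   # learn column means and variances
    X_scaled = scaler.transform(X)  # standardize to zero mean, unit variance
    # or in one step: scaler.fit_transform(X)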

    Basic Overview of A Task

    # Import Pkg
    from sklearn.dummy import DummyClassifier

    # Fit the model on features X and labels y (assumed already loaded)
    dummy_clf = DummyClassifier(strategy="most_frequent", random_state=0)
    dummy_clf.fit(X, y)

    # Check Accuracy
    dummy_clf.score(X, y)

    # Make Prediction
    dummy_clf.predict(X)
            
  • Sklearn - Classification Problems


    Sample Classifier

    # Load Data
    df = pd.read_csv("data.csv")

    # Prepare Data
    Xfeatures = df['features']
    ylabels = df['labels']

    # Split Dataset
    from sklearn import model_selection
    X_train, X_test, y_train, y_test = model_selection.train_test_split(Xfeatures, ylabels, random_state=42)

    # Import Pkg (EstimatorClassifier stands for any sklearn classifier)
    from sklearn.submodule import EstimatorClassifier

    # Fit the model on the training set
    clf = EstimatorClassifier(random_state=0)
    clf.fit(X_train, y_train)

    # Check Accuracy on the test set
    clf.score(X_test, y_test)

    # Make Prediction
    clf.predict(X_test)
            
  • Sklearn - Classification Metrics
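
    A minimal sketch of commonly used classification metrics, assuming X_test, y_test and a fitted clf from the previous section:

    from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

    y_pred = clf.predict(X_test)

    # Fraction of correct predictions
    accuracy_score(y_test, y_pred)
    # Counts of correct and incorrect predictions per class
    confusion_matrix(y_test, y_pred)
    # Precision, recall and F1-score per class
    print(classification_report(y_test, y_pred))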


  • Sklearn - Regression Problems


    Sample Regressor

    # Load Data
    df = pd.read_csv("data.csv")

    # Prepare Data
    Xfeatures = df['features']
    ylabels = df['continuous_values']

    # Split Dataset
    from sklearn import model_selection
    X_train, X_test, y_train, y_test = model_selection.train_test_split(Xfeatures, ylabels, random_state=42)

    # Import Pkg (Ridge is one example of an sklearn regressor)
    from sklearn.linear_model import Ridge

    # Fit the model on the training set
    reg = Ridge(alpha=0.5)
    reg.fit(X_train, y_train)

    # Learned parameters
    reg.coef_
    reg.intercept_

    # Make Prediction
    reg.predict(X_test)
                
            
  • Sklearn - Regression Metrics
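
    A minimal sketch of commonly used regression metrics, assuming X_test, y_test and the fitted reg from the previous section:

    from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

    y_pred = reg.predict(X_test)

    mean_absolute_error(y_test, y_pred)  # average absolute error
    mean_squared_error(y_test, y_pred)   # average squared error
    r2_score(y_test, y_pred)             # coefficient of determination (1.0 is perfect)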


  • Sklearn - Unsupervised ML - KMeans

    The KMeans algorithm clusters data by trying to separate samples in n groups of equal variance, minimizing a criterion known as the inertia or within-cluster sum-of-squares (see below). This algorithm requires the number of clusters to be specified. It scales well to a large number of samples and has been used across a large range of application areas in many different fields.


    Sample of KMeans
    
    >>> from sklearn.cluster import KMeans
    >>> import numpy as np
    >>> X = np.array([[1, 2], [1, 4], [1, 0],
    ...               [10, 2], [10, 4], [10, 0]])
    >>> kmeans = KMeans(n_clusters=2, random_state=0).fit(X)
    >>> kmeans.labels_
    array([1, 1, 1, 0, 0, 0], dtype=int32)
    >>> kmeans.predict([[0, 0], [12, 3]])
    array([1, 0], dtype=int32)
    >>> kmeans.cluster_centers_
    array([[10.,  2.],
           [ 1.,  2.]])
                  
              
  • Sklearn - Unsupervised ML - Hierarchical Clustering

    Hierarchical clustering is a general family of clustering algorithms that build nested clusters by merging or splitting them successively. This hierarchy of clusters is represented as a tree (or dendrogram). The root of the tree is the unique cluster that gathers all the samples, the leaves being the clusters with only one sample.


    Sample of AgglomerativeClustering

    The AgglomerativeClustering object performs a hierarchical clustering using a bottom up approach: each observation starts in its own cluster, and clusters are successively merged together.

    
    >>> from sklearn.cluster import AgglomerativeClustering
    >>> import numpy as np
    >>> X = np.array([[1, 2], [1, 4], [1, 0],
    ...               [4, 2], [4, 4], [4, 0]])
    >>> clustering = AgglomerativeClustering().fit(X)
    >>> clustering
    AgglomerativeClustering()
    >>> clustering.labels_
    array([1, 1, 1, 0, 0, 0])
                  
              
  • Sklearn - Neural Networks (Supervised ML)

    You can use Sklearn for deep learning tasks via the neural_network submodule


    >>> from sklearn.neural_network import MLPClassifier
    >>> X = [[0., 0.], [1., 1.]]
    >>> y = [0, 1]
    # Init Estimator
    >>> mlp_clf = MLPClassifier(solver='lbfgs', alpha=1e-5,
    ...                     hidden_layer_sizes=(5, 2), random_state=1)
    ...
    # Fit Data
    >>> mlp_clf.fit(X, y)
    MLPClassifier(alpha=1e-05, hidden_layer_sizes=(5, 2), random_state=1,
                  solver='lbfgs')
    # Make Prediction
    >>> mlp_clf.predict([[2., 2.], [-1., -2.]])

  • Awesome Data Science Resources


Tensorflow CheatSheets

A Quick Guide on Tensorflow

  • Tensorflow

    TensorFlow is an open-source software library for high-performance numerical computation. Its flexible architecture enables computation to be deployed easily across a variety of platforms (CPUs, GPUs, and TPUs), as well as mobile and edge devices, desktops, and clusters of servers. TensorFlow comes with strong support for machine learning and deep learning.

  • Tensorflow - Basics

    TensorFlow 2.x Cheat Sheet

    Table of Contents

    • Layers
    • Models
    • Activation Functions
    • Optimizers
    • Loss Functions
    • Hyperparameters
    • Preprocessing
    • Metrics
    • Visualizations
    • Callbacks
    • Transfer Learning
    • Overfitting
    • TensorFlow Data Services
    • Examples
  • TF - Layers

    Layers

    Layer Code Usage
    Dense tf.keras.layers.Dense(units, activation, input_shape) The regular densely connected neural network layer; the most common and frequently used layer.
    Flatten tf.keras.layers.Flatten() Flattens the input.
    Conv2D tf.keras.layers.Conv2D(filters, kernel_size, activation, input_shape) Convolution layer for two-dimensional data such as images.
    MaxPooling2D tf.keras.layers.MaxPool2D(pool_size) Max pooling for two-dimensional data.
    Dropout tf.keras.layers.Dropout(rate) Randomly sets input units to 0 with a frequency of rate at each step during training, which helps prevent overfitting.
    Embedding tf.keras.layers.Embedding(input_dim, output_dim, input_length) Initialized with random weights; learns an embedding for all of the words in the dataset.
    GlobalAveragePooling1D tf.keras.layers.GlobalAveragePooling1D() Global average pooling operation for temporal data.
    Bidirectional LSTM tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(units, return_sequences)) Bidirectional Long Short-Term Memory layer.
    Conv1D tf.keras.layers.Conv1D(filters, kernel_size, activation, input_shape) Convolution layer for one-dimensional data such as word embeddings.
    Bidirectional GRU tf.keras.layers.Bidirectional(tf.keras.layers.GRU(units)) Bidirectional Gated Recurrent Unit layer.
    Simple RNN tf.keras.layers.SimpleRNN(units, activation, return_sequences, input_shape) Fully-connected RNN where the output is fed back into the input.
    Lambda tf.keras.layers.Lambda(function) Wraps arbitrary expressions as a Layer object.
  • TF - Models

    Models

    Code Usage
    model = tf.keras.Sequential(layers) Sequential groups a linear stack of layers into a tf.keras.Model.
    model.compile(optimizer, loss, metrics) Configures the model for training.
    history = model.fit(x, y, epochs) Trains the model for a fixed number of epochs (iterations on a dataset).
    history = model.fit_generator(train_generator, steps_per_epoch, epochs, validation_data, validation_steps) Fits the model on data yielded batch-by-batch by a Python generator.
    model.evaluate(x, y) Returns the loss value & metrics values for the model in test mode.
    model.predict(x) Generates output predictions for the input samples.
    model.summary() Prints a string summary of the network.
    model.save(path) Saves a model as a TensorFlow SavedModel or HDF5 file.
    model.stop_training Set to True (e.g. inside a callback) to stop training.
    model.save('path/my_model.h5') Saves a model in HDF5 format.
    new_model = tf.keras.models.load_model('path/my_model.h5') Reloads a fresh Keras model from the saved file.
  • TF - Example of Neural Network Model

    A Simple Text Classification NN Model


    from tensorflow.keras.models import Sequential
    from tensorflow.keras import layers

    # Building the Model (vocab_size and maxlen are assumed to be defined)
    model = Sequential()
    model.add(layers.Embedding(input_dim=vocab_size, output_dim=50, input_length=maxlen))
    model.add(layers.Flatten())
    model.add(layers.Dense(10, activation='relu'))
    model.add(layers.Dense(5, activation='softmax'))
    # Last layer is the output layer of 5 classes

    # Compile the model
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

    # Check Summary of Model
    model.summary()
              
  • TF - Activation Functions

    Activation Functions

    Name Usage
    relu The default activation for hidden layers.
    sigmoid Binary classification.
    tanh Faster convergence than sigmoid.
    softmax Multiclass classification.
  • TF - Optimizers

    Optimizers

    Name Usage
    Adam Adam combines the good properties of AdaDelta and RMSprop and hence tends to do better for most problems.
    SGD Stochastic gradient descent; very basic and works well for shallow networks.
    AdaGrad AdaGrad can be useful for sparse data such as tf-idf.
    AdaDelta Extension of AdaGrad that removes its decaying learning rate problem.
    RMSprop Very similar to AdaDelta.
  • TF - Loss Functions

    Loss Functions

    Name Usage
    MeanSquaredError Default loss function for regression problems.
    MeanSquaredLogarithmicError For regression problems with a large spread of values.
    MeanAbsoluteError More robust to outliers.
    BinaryCrossEntropy Default loss function for binary classification problems.
    Hinge Intended for binary classification where the target values are in the set {-1, 1}.
    SquaredHinge If a hinge loss gives good performance on a binary classification problem, a squared hinge loss may be appropriate.
    CategoricalCrossEntropy Default loss function for multi-class classification problems.
    SparseCategoricalCrossEntropy Performs the same cross-entropy calculation of error without requiring the target variable to be one-hot encoded before training.
    KLD KL divergence loss is more commonly used in models that learn to approximate a more complex function than simple multi-class classification, such as an autoencoder that must reconstruct its original input from a dense feature representation.
    Huber Less sensitive to outliers.
  • TF - Hyperparameters

    Hyperparameters

    Parameter Tips
    Hidden Neurons Between the size of the input layer and the size of the output layer; a common rule of thumb is 2/3 the size of the input layer plus the size of the output layer.
    Learning Rate [0.1, 0.01, 0.001, 0.0001]
    Momentum [0.5, 0.9, 0.99]
    Batch Size Small values give a learning process that converges quickly at the cost of noise in the training process. Large values converge slowly with accurate estimates of the error gradient. Typical sizes are [32, 64, 128, 256, 512].
    Conv2D Filters Earlier convolutional layers, closer to the input, learn fewer filters, while later layers, closer to the output, learn more. The number of filters should depend on the complexity of your dataset and the depth of your network. A common setting to start with is [32, 64, 128] for three layers; with more layers, increase to [256, 512, 1024], etc.
    Kernel Size (3, 3)
    Pool Size (2, 2)
    Steps per Epoch sample_size // batch_size
    Epochs Use callbacks to stop training rather than a fixed count
    Embedding Dimensions vocab_size ** 0.25
    Truncating post
    OOV Token <OOV>
  • TF - Working with Textual Data

    Tokenizer, Text-to-sequence & Padding

    import tensorflow as tf
    from tensorflow import keras

    from tensorflow.keras.preprocessing.text import Tokenizer
    from tensorflow.keras.preprocessing.sequence import pad_sequences

    sentences = [
        'I love my dog',
        'I love my cat',
        'You love my dog!',
        'Do you think my dog is amazing?'
    ]

    tokenizer = Tokenizer(num_words=100, oov_token="<OOV>")

    # Key value pair (word: token)
    tokenizer.fit_on_texts(sentences)
    word_index = tokenizer.word_index

    # Lists of tokenized sentences
    sequences = tokenizer.texts_to_sequences(sentences)

    # Padded tokenized sentences
    padded = pad_sequences(sequences, maxlen=5)

    print("\nWord Index = ", word_index)
    print("\nSequences = ", sequences)
    print("\nPadded Sequences:")
    print(padded)

    One-hot Encoding

    ys = tf.keras.utils.to_categorical(labels, num_classes=3)
  • TF - Working with Image Data

    Preprocessing

    ImageDataGenerator

    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    # Image augmentation
    train_datagen = ImageDataGenerator(
          rescale=1./255,
          rotation_range=40,
          width_shift_range=0.2,
          height_shift_range=0.2,
          shear_range=0.2,
          zoom_range=0.2,
          horizontal_flip=True,
          fill_mode='nearest')
    validation_datagen = ImageDataGenerator(rescale=1/255)

    # Flow training images in batches of 128 using the train_datagen generator
    train_generator = train_datagen.flow_from_directory(
            '/tmp/horse-or-human/',  # This is the source directory for training images
            target_size=(300, 300),  # All images will be resized to 300x300
            batch_size=128,
            # Since we use binary_crossentropy loss, we need binary labels
            class_mode='binary')

    # Flow validation images in batches of 32 using the validation_datagen generator
    validation_generator = validation_datagen.flow_from_directory(
            '/tmp/validation-horse-or-human/',  # This is the source directory for validation images
            target_size=(300, 300),  # All images will be resized to 300x300
            batch_size=32,
            # Since we use binary_crossentropy loss, we need binary labels
            class_mode='binary')
    
  • TF - Visualizing Training Loss

    Get the Structure of a Model

    # Plot Model
    from tensorflow.keras.utils import plot_model
    plot_model(model, show_shapes=True)

    Plot the Training History

    import matplotlib.pyplot as plt

    history = model.fit(x_train_seq, y_train, epochs=5, verbose=False, validation_data=(x_test_seq, y_test), batch_size=10)
    history.history

    # Function to Plot
    def plot_history(history):
        acc = history.history['accuracy']
        val_acc = history.history['val_accuracy']
        loss = history.history['loss']
        val_loss = history.history['val_loss']
        x = range(1, len(acc) + 1)

        plt.figure(figsize=(12, 5))
        plt.subplot(1, 2, 1)
        plt.plot(x, acc, 'b', label='Training acc')
        plt.plot(x, val_acc, 'r', label='Validation acc')
        plt.title('Training and validation accuracy')
        plt.legend()
        plt.subplot(1, 2, 2)
        plt.plot(x, loss, 'b', label='Training loss')
        plt.plot(x, val_loss, 'r', label='Validation loss')
        plt.title('Training and validation loss')
        plt.legend()
    
                  
              
  • TF - CallBacks

    Learning Rate Scheduler

    model = tf.keras.models.Sequential([
        tf.keras.layers.Dense(10, input_shape=[window_size], activation="relu"),
        tf.keras.layers.Dense(10, activation="relu"),
        tf.keras.layers.Dense(1)
    ])

    lr_schedule = tf.keras.callbacks.LearningRateScheduler(
        lambda epoch: 1e-8 * 10**(epoch / 20))
    optimizer = tf.keras.optimizers.SGD(lr=1e-8, momentum=0.9)
    model.compile(loss="mse", optimizer=optimizer)
    history = model.fit(dataset, epochs=100, callbacks=[lr_schedule], verbose=0)

    End of Training Cycles

    import tensorflow as tf

    class myCallback(tf.keras.callbacks.Callback):
      def on_epoch_end(self, epoch, logs={}):
        if logs.get('accuracy') > 0.6:
          print("\nReached 60% accuracy so cancelling training!")
          self.model.stop_training = True

    mnist = tf.keras.datasets.fashion_mnist

    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0

    callbacks = myCallback()

    model = tf.keras.models.Sequential([
      tf.keras.layers.Flatten(input_shape=(28, 28)),
      tf.keras.layers.Dense(512, activation=tf.nn.relu),
      tf.keras.layers.Dense(10, activation=tf.nn.softmax)
    ])
    model.compile(optimizer=tf.optimizers.Adam(),
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

    model.fit(x_train, y_train, epochs=10, callbacks=[callbacks])
             

  • TF - Transfer Learning


        
    import os
    
    from tensorflow.keras import layers
    from tensorflow.keras import Model
    from tensorflow.keras.applications.inception_v3 import InceptionV3
    
    local_weights_file = '/tmp/inception_v3_weights_tf_dim_ordering_tf_kernels_notop.h5'
    
    pre_trained_model = InceptionV3(input_shape = (150, 150, 3), 
                                    include_top = False, 
                                    weights = None)
    
    pre_trained_model.load_weights(local_weights_file)
    
    for layer in pre_trained_model.layers:
      layer.trainable = False
      
    # pre_trained_model.summary()
    
    last_layer = pre_trained_model.get_layer('mixed7')
    print('last layer output shape: ', last_layer.output_shape)
    last_output = last_layer.output
    
    from tensorflow.keras.optimizers import RMSprop
    
    # Flatten the output layer to 1 dimension
    x = layers.Flatten()(last_output)
    # Add a fully connected layer with 1,024 hidden units and ReLU activation
    x = layers.Dense(1024, activation='relu')(x)
    # Add a dropout rate of 0.2
    x = layers.Dropout(0.2)(x)                  
    # Add a final sigmoid layer for classification
    x = layers.Dense(1, activation='sigmoid')(x)
    
    model = Model( pre_trained_model.input, x) 
    
    model.compile(optimizer = RMSprop(lr=0.0001), 
                  loss = 'binary_crossentropy', 
                  metrics = ['accuracy'])
        
    
  • TF - How to Deal with Overfitting
    • Augmentation

    • Reduce Model Complexity

      • Reduce overfitting by training the network on more examples.
      • Reduce overfitting by changing the complexity of the network (network structure and network parameters).
    • Regularization

    • Dropout Layer (see the sketch below)
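
    A minimal sketch of the last two ideas, combining an L2 weight penalty and a Dropout layer in a small Keras model (layer sizes are illustrative):

    import tensorflow as tf

    model = tf.keras.models.Sequential([
        # L2 regularization penalizes large weights
        tf.keras.layers.Dense(64, activation='relu',
                              kernel_regularizer=tf.keras.regularizers.l2(0.01)),
        # Dropout randomly zeroes 50% of the units during training
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(10, activation='softmax')
    ])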


  • TF - Datasets

    TensorFlow Data Services

    TensorFlow Datasets is a collection of datasets ready to use with TensorFlow or other Python ML frameworks, such as JAX. All datasets are exposed as tf.data.Datasets, enabling easy-to-use and high-performance input pipelines. To get started, see the guide and the list of datasets.

  • Credits

    Credits to Randy @ GitHub and the TensorFlow official website


  • Awesome Data Science Resources


Matplotlib CheatSheets

A Quick Guide on Matplotlib

  • What is Matplotlib?

    Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python

    Matplotlib is a plotting library for the Python programming language and its numerical mathematics extension NumPy. It provides an object-oriented API for embedding plots into applications using general-purpose GUI toolkits like Tkinter, wxPython, Qt, or GTK

    Official site: https://matplotlib.org/

    If you have any issues with installation, there are other options: check out the official installation guide.

  • Matplotlib - Plots Creation

    Plots

    Creating plots

    Figure

    Operator Description
    fig = plt.figure() A container that contains all plot elements

    Axes

    Operator Description
    fig.add_axes() Adds an axes to the figure
    a = fig.add_subplot(222) Initializes a subplot; a subplot is an axes on a grid system (the argument is row-col-num, see examples)
    fig, b = plt.subplots(nrows=3, ncols=2) Adds a grid of subplots
    fig, ax = plt.subplots(2, 2) Creates a figure and a 2x2 grid of subplots

    Axes are very useful for subplots. See the subplotting examples below

    After configuring your plot, you must call plt.show() to make it visible

    Plotting

    1D Data

    Operator Description
    lines = plt.plot(x,y) Plots data connected by lines
    plt.scatter(x,y) Creates a scatterplot of unconnected data points
    plt.bar(xvalue, data, width, color...) Simple vertical bar chart
    plt.barh(yvalue, data, width, color...) Simple horizontal bar chart
    plt.hist(x, bins) Plots a histogram
    plt.boxplot(x) Box and whisker plot
    plt.violinplot(x) Creates a violin plot
    ax.fill(x, y, color='lightblue'), ax.fill_between(x, y, color='yellow') Fill area under/between plots

    For more advanced box plots, see the boxplot documentation

    2D Data

    Operator Description
    fig, ax = plt.subplots(); im = ax.imshow(img, cmap, vmin...) Colormapped or RGB arrays


    Saving plots

    Operator Description
    plt.savefig('pic.png') Saves plot/figure to an image file
    plt.savefig('transparentback.png', transparent=True) Saves plot/figure with a transparent background
  • Matplotlib - Examples

    Examples

    Basics

    import matplotlib.pyplot as plt
    
    x = [1, 2.1, 0.4, 8.9, 7.1, 0.1, 3, 5.1, 6.1, 3.4, 2.9, 9]
    y = [1, 3.4, 0.7, 1.3, 9, 0.4, 4, 1.9, 9, 0.3, 4.0, 2.9]
    plt.scatter(x,y, color='red')
    
    w = [0.1, 0.2, 0.4, 0.8, 1.6, 2.1, 2.5, 4, 6.5, 8, 10]
    z = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    plt.plot(z, w, color='lightblue', linewidth=2)
    
    c = [0,1,2,3,4, 5, 6, 7, 8, 9, 10]
    plt.plot(c)
    
    plt.ylabel('some numbers')
    plt.xlabel('some more numbers')
    plt.show()


    import matplotlib.pyplot as plt
    import numpy as np
    
    x = np.random.rand(10)
    y = np.random.rand(10)
    
    plt.plot(x,y,'--', x**2, y**2,'-.')
    plt.savefig('lines.png')
    plt.show()


    import matplotlib.pyplot as plt
    
    
    x = [1, 2, 3, 4]
    y = [1, 4, 9, 6]
    labels = ['Frogs', 'Hogs', 'Bogs', 'Slogs']
    
    plt.plot(x, y, 'ro')
    # You can specify a rotation for the tick labels in degrees or with keywords.
    plt.xticks(x, labels, rotation='vertical')
    # Pad margins so that markers don't get clipped by the axes
    plt.margins(0.2)
    plt.savefig('ticks.png')
    plt.show()


  • Matplotlib - Customizing Plots

    Customization

    Color

    Operator Description
    plt.plot(x, y, color='lightblue') Colors the plotted line
    plt.plot(x, y, alpha=0.4) Sets the transparency of the plot
    plt.colorbar(mappable, orientation='horizontal') mappable: the Image, ContourSet, etc. to which the colorbar applies

    Markers (see examples)

    Operator Description
    plt.plot(x, y, marker='*') Adds * for every data point
    plt.scatter(x, y, marker='.') Adds . for every data point

    Lines

    Operator Description
    plt.plot(x, y, linewidth=2) Sets line width
    plt.plot(x, y, ls='solid') Sets linestyle; ls can be omitted, see the two rows below
    plt.plot(x, y, ls='--') Sets linestyle; ls can be omitted, see below
    plt.plot(x, y, '--', x**2, y**2, '-.') Line styles are '--' and '-.', see example
    plt.setp(lines, color='red', linewidth=2) Sets properties of plot lines

    Text

    Operator Description
    plt.text(1, 1, 'Example Text', style='italic') Places text at coordinates (1, 1)
    ax.annotate('some annotation', xy=(10, 10)) Annotates the point at coordinates xy with the given text
    plt.title(r'$\delta_i=20$', fontsize=10) Mathtext (LaTeX-style) in titles and labels

    Limits, Legends/Labels, Layout

    Limits

    Operator Description
    plt.xlim(0, 7) Sets x-axis to display 0 - 7
    plt.ylim(-0.5, 9) Sets y-axis to display -0.5 - 9
    ax.set(xlim=[0, 7], ylim=[-0.5, 9]) Sets both limits at once
    ax.set_xlim(0, 7) Sets the x-axis limits
    plt.margins(x=1.0, y=1.0) Sets margins: adds padding to a plot, values 0 - 1
    plt.axis('equal') Sets the aspect ratio of the plot to 1

    Legends/Labels

    Operator Description
    plt.title('just a title') Sets title of plot
    plt.xlabel('x-axis') Sets label next to x-axis
    plt.ylabel('y-axis') Sets label next to y-axis
    ax.set(title='axis', ylabel='Y-Axis', xlabel='X-Axis') Sets title and axis labels
    ax.legend(loc='best') Places the legend so that it overlaps plot elements as little as possible

    Ticks

    Operator Description
    plt.xticks(x, labels, rotation='vertical') Sets ticks, see example
    ax.xaxis.set(ticks=range(1,5), ticklabels=[3,100,-12,"foo"]) Sets x-ticks
    ax.tick_params(axis='y', direction='inout', length=10) Makes y-ticks longer and point both in and out
  • Matplotlib - SubPlotting

    Subplotting Examples

    import matplotlib.pyplot as plt
    
    x = [0.5, 0.6, 0.8, 1.2, 2.0, 3.0]
    y = [10, 15, 20, 25, 30, 35]
    z = [1, 2, 3, 4]
    w = [10, 20, 30, 40]
    
    fig = plt.figure()
    ax =  fig.add_subplot(111)
    ax.plot(x, y, color='lightblue', linewidth=3)
    ax.scatter([2,3.4,4, 5.5],
                   [5,10,12, 15],
                   color='black',
                   marker='^')
    ax.set_xlim(0, 6.5)
    
    ax2 =  fig.add_subplot(222)
    ax2.plot(z, w, color='lightgreen', linewidth=3)
    ax2.scatter([3,5,7],
                   [5,15,25],
                   color='red',
                   marker='*')
    ax2.set_xlim(1, 7.5)
    
    plt.savefig('mediumplot.png')
    plt.show()


    Thanks to this guy for this good example

    import numpy as np
    import matplotlib.pyplot as plt
    
    # First way #
    
    x = np.random.rand(10)
    y = np.random.rand(10)
    
    figure1 = plt.plot(x,y)
    
    # Second way #
    
    x1 = np.random.rand(10)
    x2 = np.random.rand(10)
    x3 = np.random.rand(10)
    x4 = np.random.rand(10)
    y1 = np.random.rand(10)
    y2 = np.random.rand(10)
    y3 = np.random.rand(10)
    y4 = np.random.rand(10)
    
    figure2, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2)
    ax1.plot(x1,y1)
    ax2.plot(x2,y2)
    ax3.plot(x3,y3)
    ax4.plot(x4,y4)
    
    plt.show()

    If you haven't used NumPy before, check out my cheat sheet


    import numpy as np
    import matplotlib.pyplot as plt
    
    x = np.linspace(0, 1, 500)
    y = np.sin(4 * np.pi * x) * np.exp(-5 * x)
    
    fig, ax = plt.subplots()
    
    ax.fill(x, y, color='lightblue')
    plt.show()


    source

  • Matplotlib - Advanced

    Advanced

    Taken from the official docs

    import matplotlib.pyplot as plt
    import numpy as np


    np.random.seed(0)

    x, y = np.random.randn(2, 100)
    fig = plt.figure()
    ax1 = fig.add_subplot(211)
    ax1.xcorr(x, y, usevlines=True, maxlags=50, normed=True, lw=2)
    ax1.grid(True)
    ax1.axhline(0, color='black', lw=2)

    ax2 = fig.add_subplot(212, sharex=ax1)
    ax2.acorr(x, usevlines=True, normed=True, maxlags=50, lw=2)
    ax2.grid(True)
    ax2.axhline(0, color='black', lw=2)

    plt.show()


    Sources: Datacamp, Official Docs and Quandl

  • Awesome Data Science Resources

