Getting Started with JanusGraph and Python

Introduction


The following introduction is from the official JanusGraph website:

JanusGraph can be queried from all languages for which a TinkerPop driver exists. Drivers allow sending of Gremlin traversals to a Gremlin Server like the JanusGraph Server. A list of TinkerPop drivers is available on TinkerPop’s homepage.

In addition to drivers, there exist query languages for TinkerPop that make it easier to use Gremlin in different programming languages like Java, Python, or C#. Some of these languages even construct Gremlin traversals from completely different query languages like Cypher or SPARQL. Since JanusGraph implements TinkerPop, all of these languages can be used together with JanusGraph.
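
For example, JanusGraph Server can be queried from Python with the gremlinpython driver. Below is a minimal sketch, assuming a JanusGraph Server is already running and listening on the default ws://localhost:8182/gremlin endpoint and that the gremlinpython package is installed:

from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

# Connect to a running JanusGraph Server (a Gremlin Server) over WebSocket
connection = DriverRemoteConnection('ws://localhost:8182/gremlin', 'g')
g = traversal().withRemote(connection)

# Add a vertex and run a simple traversal
g.addV('person').property('name', 'alice').iterate()
print(g.V().count().next())

connection.close()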

Read more…

My note on Ordinary Least Squares

In statistics, ordinary least squares (OLS) is a type of linear least squares method for estimating the unknown parameters in a linear regression model. OLS chooses the parameters of a linear function of a set of explanatory variables by the principle of least squares: minimizing the sum of the squares of the differences between the observed dependent variable (values of the variable being predicted) in the given dataset and those predicted by the linear function.

The True Model

Suppose the data consists of N observations $\{x_i, y_i\}_{i=1}^{N}$ . Each observation i includes a scalar response $y_i$ and a column vector $x_i$ of values of K predictors (regressors) $x_{ij}$ for j = 1, ..., K. In a linear regression model, the response variable, $y_i$ is a linear function of the regressors:

$$ y_i = \beta_1 x_{i1} + \beta_2 x_{i2} + ... + \beta_K x_{iK} + \epsilon_i, $$

or in vector form,

$$ y_i = x_i^T \beta + \epsilon_i, $$

where $\beta$ is a K×1 vector of unknown parameters; the $\epsilon_i$'s are unobserved scalar random variables (errors) which account for influences upon the responses $y_i$ from sources other than the explanators $x_i$; and $x_{i}$ is a column vector of the ith observations of all the explanatory variables. This model can also be written in matrix notation as

$$ y = X \beta + \epsilon \qquad (1)$$
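
As a quick numerical sketch (synthetic data, illustrative names), the OLS estimate $\hat{\beta} = (X^T X)^{-1} X^T y$ for model (1) can be computed with numpy:

import numpy as np

rng = np.random.default_rng(0)
N, K = 100, 3
X = rng.normal(size=(N, K))                         # design matrix of regressors
beta_true = np.array([2.0, -1.0, 0.5])              # unknown parameters (known here only for the demo)
y = X @ beta_true + rng.normal(scale=0.1, size=N)   # y = X beta + epsilon

# OLS estimate; lstsq solves the least-squares problem without forming (X^T X)^{-1} explicitly
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)                                      # should be close to beta_true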

Read more…

Maximum likelihood estimators and least squares

Why do we choose to minimize the mean squared error (least squares) when we do linear regression? Is it because it is smooth and its derivative is easy to work with? Is it because that's our intuition about how to fit a curve to a set of points? As it turns out, there is a mathematical reason behind this, and it has something to do with maximum likelihood.

Maximum likelihood estimators (MLE)

A maximum likelihood estimate for some hidden parameter $\lambda$ (or parameters, plural) of some probability distribution is a number $\hat{\lambda}$ computed from an independent and identically distributed (i.i.d.) sample $X_{1}, ..., X_{n}$ from the given distribution that maximizes something called the “likelihood function”. Suppose that the distribution in question is governed by a pdf $f(x; \lambda_{1}, ..., \lambda_{k})$, where the $\lambda_{i}$’s are all hidden parameters. The likelihood function associated with the sample is just

$$ L(X_{1}, X_{2}, ..., X_{n}) = \prod_{i=1}^{n}f(X_{i}; \lambda_{1}, ..., \lambda_{k}) $$
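
To preview the connection to least squares (a sketch of the standard argument): if the errors of the linear model are assumed Gaussian, $\epsilon_i \sim \mathcal{N}(0, \sigma^2)$, the log-likelihood of the sample is

$$ \log L(\beta) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i - x_i^T\beta\right)^2, $$

so maximizing the likelihood over $\beta$ is exactly the same as minimizing the sum of squared errors.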

Read more…

MSE and Bias-Variance decomposition

I was reading the book "The Elements of Statistical Learning", up to the part about MSE (mean squared error) and the bias-variance decomposition, and it was confusing to me. Understanding this is very important for having a good grasp of underfitting and overfitting. Unfortunately, the book didn't explain it clearly (or I was just too stupid for the book). So, I sought an explanation on the internet and I found one. Here I will write it down for future reference. There are two common contexts: MSE for an estimator and MSE for a predictor.

Wait, WTF is an estimator and a predictor?

"Prediction" and "estimation" indeed are sometimes used interchangeably in non-technical writing and they seem to function similarly, but there is a sharp distinction between them in the standard model of a statistical problem. An estimator uses data to guess at a parameter while a predictor uses the data to guess at some random value that is not part of the dataset.

MSE for estimator

An estimator is any function of a sample of the data that usually tries to estimate some useful quantity of the original distribution from which the sample is drawn. Formally, an estimator is a function on a sample S: $$ \hat{\theta}_{S}=g(S), \quad S=(x_{1}, x_{2},..., x_{m}) $$ where each $x_{i}$ is a random variable drawn from an unknown distribution $D$, i.e. $x_{i} \sim D$.
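
As an illustration (my own sketch, not part of the original note), the MSE of an estimator can be approximated by simulation. Here the sample mean estimates the mean $\theta$ of a normal distribution, and the simulated MSE matches $\text{bias}^2 + \text{variance}$:

import numpy as np

# Approximate the MSE of the sample-mean estimator for the mean theta of N(theta, 1)
rng = np.random.default_rng(1)
theta = 2.0        # true parameter of the (pretend unknown) distribution D
m = 20             # sample size
trials = 10_000    # number of simulated samples

estimates = np.array([rng.normal(theta, 1.0, size=m).mean() for _ in range(trials)])

bias = estimates.mean() - theta
variance = estimates.var()
mse = np.mean((estimates - theta) ** 2)
print(f"bias^2 + variance = {bias**2 + variance:.5f}")
print(f"MSE               = {mse:.5f}")   # matches up to floating-point error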

Read more…

Bayes Boundary with Multivariate Mixture Gaussian Distributions

Multivariate Mixture Gaussian Model

Problem statement:

Create a data set with N = 500 points from two Gaussian mixture distributions (each mixture consists of five bivariate Gaussian distributions). The components of the first mixture have means with a maximum of 0 and a minimum of -5, and a variance of 1. The components of the second mixture have means with a maximum of 5 and a minimum of 0, and a variance of 1. Draw the decision boundary (Bayes boundary) between the N points of the first mixture distribution and the N points of the second mixture distribution without using any machine learning models.

In [1]:
# import things we might need
import numpy as np
import numpy.random
import matplotlib.pyplot as plt

Generate samples

  • We assume the means of the 5 Gaussian distributions are drawn from a uniform distribution
  • For simplicity, we assume the two variables of each bivariate distribution are independent of each other, so the covariance matrix is the diagonal matrix [[1, 0], [0, 1]] (see the sketch below)
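
A possible way to generate such samples under these assumptions (the mean ranges [-5, 0] and [0, 5] and the uniform choice of components are my reading of the problem statement):

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
N = 500                      # points per mixture
n_components = 5             # bivariate Gaussians per mixture
cov = np.eye(2)              # identity covariance: independent variables, variance 1

# Component means drawn uniformly in the assumed ranges
means0 = rng.uniform(-5, 0, size=(n_components, 2))
means1 = rng.uniform(0, 5, size=(n_components, 2))

def sample_mixture(means, n):
    # Pick a component uniformly for each point, then draw from that bivariate Gaussian
    idx = rng.integers(len(means), size=n)
    return np.array([rng.multivariate_normal(means[i], cov) for i in idx])

X0 = sample_mixture(means0, N)
X1 = sample_mixture(means1, N)

plt.scatter(X0[:, 0], X0[:, 1], s=8, label="mixture 1")
plt.scatter(X1[:, 0], X1[:, 1], s=8, label="mixture 2")
plt.legend()
plt.show()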

Read more…