I started this week reviewing the Pluralsight course on machine learning and taking extensive notes for reference, which I wasn't able to do in my rushed learning last week. Continuing that machine learning theme, this week's activities had us gain a better understanding of the Naïve Bayes algorithm. The algorithm is based on Bayes' Theorem in conditional probability, so it became necessary to review that as well. Then, we started exploring TensorFlow and neural networks.
Conditional Probability and Naïve Bayes
One of the most common classification algorithms used in machine learning is Naïve Bayes. Based on Bayes' Theorem in conditional probability, Naïve Bayes assumes the predictor variables are independent of one another. I'd briefly studied statistics in college, and by reviewing my old notes on Bayes' Theorem and conditional probability, I was able to regain some understanding of how they work.
Conditional probability calculates the probability of an event given that another event is known to have occurred, using the equation P(A|B) = P(A ∩ B) / P(B), where A and B are events and P(B) > 0. Bayes' Theorem takes it one step further, expressing one conditional probability in terms of the reverse one: P(A|B) = P(B|A) P(A) / P(B). This is particularly helpful for generalizing complex tree diagrams, since the marginal probability of each outcome can be computed from each possible scenario of the first outcome.
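As a sanity check on the formula, here is a small Python sketch with made-up numbers (the prevalence, sensitivity, and false-positive rate are all hypothetical):

```python
# Bayes' Theorem on made-up numbers: a test that is 95% sensitive, with a
# 10% false-positive rate, for a condition affecting 1% of a population.
p_condition = 0.01                      # P(A): prior
p_pos_given_condition = 0.95            # P(B|A): sensitivity
p_pos_given_no_condition = 0.10         # false-positive rate

# Marginal probability of a positive test, P(B), via total probability.
p_pos = (p_pos_given_condition * p_condition
         + p_pos_given_no_condition * (1 - p_condition))

# Bayes' Theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_condition_given_pos = p_pos_given_condition * p_condition / p_pos
print(round(p_condition_given_pos, 3))  # far below the 95% sensitivity
```

This is the classic case where the tree diagram gets tedious by hand: the posterior (about 9%) is much smaller than the test's sensitivity because the condition itself is rare.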
![](https://static.wixstatic.com/media/18b19d_26e164f9aac642f39eb5ba29fd7106c3~mv2.png/v1/fill/w_945,h_133,al_c,q_85,enc_auto/18b19d_26e164f9aac642f39eb5ba29fd7106c3~mv2.png)
The Naïve Bayes algorithm uses the same principles as Bayes' Theorem and works well on small to medium-sized datasets. After calculating the conditional probability of each class given the input, the class with the highest probability is the algorithm's prediction.
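That "pick the class with the highest probability" step is just an argmax over the per-class posteriors; a tiny sketch with hypothetical (unnormalized) values:

```python
# Hypothetical unnormalized posteriors for each class of a classifier.
posteriors = {"spam": 0.0042, "not_spam": 0.0169}

# The prediction is simply the class with the highest probability.
prediction = max(posteriors, key=posteriors.get)
print(prediction)  # not_spam
```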
There are three types of Naïve Bayes approaches used in machine learning through Python:
Gaussian - a classification model assuming the variables, or "features," follow a normal distribution.
Multinomial - used with discrete counts of variables; it can answer questions like "how many times has x appeared in n trials?"
Bernoulli - a model that deals with binary inputs, where each feature is either present or absent.
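To see how the Gaussian variant ties the pieces together, here is a minimal from-scratch sketch on a made-up one-feature dataset (the class labels, values, and function names are all illustrative, not any library's API):

```python
import math

# Toy training data: one numeric feature, two hypothetical classes.
data = {
    "A": [1.0, 1.2, 0.8, 1.1],
    "B": [3.0, 2.8, 3.2, 3.1],
}

def gaussian_pdf(x, mean, var):
    """Normal density -- the likelihood used by the Gaussian variant."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit(data):
    """Estimate a per-class mean, variance, and prior from the data."""
    total = sum(len(xs) for xs in data.values())
    params = {}
    for label, xs in data.items():
        mean = sum(xs) / len(xs)
        var = sum((x - mean) ** 2 for x in xs) / len(xs)
        params[label] = (mean, var, len(xs) / total)
    return params

def predict(params, x):
    """Return the class whose posterior (prior * likelihood) is highest."""
    scores = {label: prior * gaussian_pdf(x, mean, var)
              for label, (mean, var, prior) in params.items()}
    return max(scores, key=scores.get)

params = fit(data)
print(predict(params, 1.05), predict(params, 2.9))  # A B
```

With more than one feature, the "naïve" independence assumption shows up as simply multiplying the per-feature likelihoods together.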
Naïve Bayes is one of the simplest and most useful machine learning algorithms. I hope that I can find an example dataset to try and implement this algorithm myself, in addition to learning Random Forest.
TensorFlow Neural Networks
Understanding machine learning contextualized neural networks for me. I found another short and sweet Pluralsight course on basic neural networks in TensorFlow. TensorFlow is an open-source software library used for deep learning, machine learning, and training neural networks. This also drew on some statistical knowledge: a loss is calculated between the predicted value and the actual value, and an optimizer minimizes that loss to improve the model's performance.
A neural network is simply an arrangement of neurons that uses machine learning to learn or predict from given data. Generally, each neural network consists of an input layer, hidden layers, and an output layer. Data enters the input layer, which may require some normalization or pre-processing, and is then forward-propagated into the hidden layers, which do the bulk of the work of finding relationships between the inputs. The hidden layers pass their results to the output layer, which makes the final adjustments before producing a prediction. Neural networks with more than one hidden layer are considered deep learning models.
![](https://static.wixstatic.com/media/18b19d_edd4e9ee207f44029ed81d14a3c58ab5~mv2.png/v1/fill/w_980,h_550,al_c,q_90,usm_0.66_1.00_0.01,enc_auto/18b19d_edd4e9ee207f44029ed81d14a3c58ab5~mv2.png)
The base equation found in any neuron is y = wx + bias, where w is the learned weight assigned to each input x, and bias shifts the output of the equation. This is recognizable as the basic equation of a line; for non-linear models, the resulting summation wx + bias must be pushed through an activation function, a non-linear function (whose choice is itself a hyperparameter) that regulates the extremity of the output.
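A minimal sketch of that equation in plain Python, using a sigmoid as the (assumed) activation function; the input and weight values are made up:

```python
import math

def neuron(inputs, weights, bias):
    """y = w.x + bias, pushed through a sigmoid activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 / (1 + math.exp(-z))   # sigmoid squashes output into (0, 1)

out = neuron([0.5, -1.0], [0.8, 0.2], bias=0.1)
print(round(out, 3))  # 0.574
```

Without the sigmoid line, this would be exactly the linear wx + bias model; the activation is what lets stacked layers model non-linear relationships.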
Once the output is computed, the loss is analyzed to decide whether back propagation and weight adjustment are needed. Back propagation takes a lot of computation, which is why hardware accelerators like GPUs and TPUs are used. After back propagation adjusts the weights from the output layer back to the input layer, a new set of inputs is forward-propagated to see if the loss has lessened since the last round. This cycle continues until the loss is minimized to the point the model can be considered "trained."
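The forward-propagate / compute-loss / adjust-weights cycle can be sketched for a single linear neuron with plain gradient descent (the data, learning rate, and iteration count are all made up):

```python
# Fit one linear neuron to toy data generated from y = 2x + 1,
# using squared-error loss and plain gradient descent.
data = [(x, 2 * x + 1) for x in [0.0, 1.0, 2.0, 3.0]]
w, b = 0.0, 0.0          # start with an untrained weight and bias
lr = 0.05                # learning rate of the optimizer

def mse(w, b):
    """Mean squared error between predictions and actual values."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

loss_before = mse(w, b)
for _ in range(500):     # each round: forward pass, then weight adjustment
    dw = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
    db = sum(2 * (w * x + b - y) for x, y in data) / len(data)
    w -= lr * dw         # step against the gradient to reduce the loss
    b -= lr * db
loss_after = mse(w, b)
print(round(w, 2), round(b, 2), loss_after < loss_before)
```

In a real network the same idea repeats layer by layer via the chain rule, which is where the heavy computation (and the GPUs/TPUs) comes in.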
![](https://static.wixstatic.com/media/18b19d_a18a52926a88418d8267039b035c7b0f~mv2.png/v1/fill/w_980,h_543,al_c,q_90,usm_0.66_1.00_0.01,enc_auto/18b19d_a18a52926a88418d8267039b035c7b0f~mv2.png)
Something I found interesting about neural networks is that the methods for improving a model differ from classical machine learning. With classical machine learning, you can really only adjust or switch algorithms, improve training, or get more data; to fix overfitting, you can tune hyperparameters, use cross-validated variants of algorithms, or trade some accuracy in training for accuracy in testing. In neural networks, overfitting can be corrected by reducing model complexity: removing some hidden layers or reducing the weights of some neurons in the layers. A related option, dropout, randomly sets some neurons' outputs to 0 during training, which simulates having multiple models and reduces individual neurons' contributions to the final output. The last option is early stopping: halting training when the validation loss has stopped decreasing or leveled off.
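The early-stopping rule can be sketched with a hypothetical list of per-epoch validation losses and a patience counter (both are made-up values, not TensorFlow's API):

```python
# Stop once validation loss fails to improve for `patience` straight epochs.
val_losses = [0.90, 0.70, 0.55, 0.48, 0.47, 0.47, 0.48, 0.49, 0.50]
patience = 2

best, stalled, stopped_at = float("inf"), 0, len(val_losses)
for epoch, loss in enumerate(val_losses):
    if loss < best:
        best, stalled = loss, 0      # still improving: reset the counter
    else:
        stalled += 1                 # no improvement this epoch
        if stalled >= patience:
            stopped_at = epoch       # halt before overfitting sets in
            break
print(stopped_at, best)  # 6 0.47
```

Here training stops at epoch 6 even though nine epochs of data exist: the last rounds were only making validation loss worse.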
My next goal with neural networks is to try my hand at practicing these techniques on a sample dataset. In addition, I am interested in learning more about natural language processing and deep learning, fields closely related to neural networks.