Neural Networks: What? Why? How?

Shubham kumar
Sep 27, 2021

History of neural networks

The history of neural networks is longer than most people think. While the idea of “a machine that thinks” can be traced to the Ancient Greeks, we’ll focus on the key events that led to the evolution of thinking around neural networks, which has ebbed and flowed in popularity over the years:

1943: Warren S. McCulloch and Walter Pitts published “A logical calculus of the ideas immanent in nervous activity.” This research sought to understand how the human brain could produce complex patterns through connected brain cells, or neurons. One of the main ideas that came out of this work was the comparison of neurons with a binary threshold to Boolean logic (i.e., 0/1 or true/false statements).

1958: Frank Rosenblatt is credited with the development of the perceptron, documented in his research, “The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain.” He takes McCulloch and Pitts’ work a step further by introducing weights to the equation. Leveraging an IBM 704, Rosenblatt was able to get a computer to learn how to distinguish cards marked on the left vs. cards marked on the right.

1974: While numerous researchers contributed to the idea of backpropagation, Paul Werbos was the first person in the US to note its application within neural networks in his PhD thesis.

1989: Yann LeCun published a paper illustrating how the use of constraints in backpropagation and its integration into the neural network architecture can be used to train algorithms. This research successfully leveraged a neural network to recognize hand-written zip code digits provided by the U.S. Postal Service.

What are neural networks?

Neural networks, also known as artificial neural networks (ANNs) or simulated neural networks (SNNs), are a subset of machine learning and are at the heart of deep learning algorithms. Their name and structure are inspired by the human brain, mimicking the way that biological neurons signal to one another.

Artificial neural networks (ANNs) are composed of node layers: an input layer, one or more hidden layers, and an output layer. Each node, or artificial neuron, connects to others and has an associated weight and threshold. If the output of any individual node is above the specified threshold value, that node is activated, sending data to the next layer of the network. Otherwise, no data is passed along to the next layer of the network.

How do neural networks work?

Think of each individual node as its own linear regression model, composed of input data, weights, a bias (or threshold), and an output. The formula would look something like this:

∑ wᵢxᵢ + bias = w₁x₁ + w₂x₂ + w₃x₃ + bias

output = f(x) = 1 if ∑ wᵢxᵢ + b ≥ 0; 0 if ∑ wᵢxᵢ + b < 0

Once an input layer is determined, weights are assigned. These weights help determine the importance of any given variable, with larger ones contributing more significantly to the output compared to other inputs. All inputs are then multiplied by their respective weights and summed. Afterward, the sum is passed through an activation function, which determines the output. If that output exceeds a given threshold, it “fires” (or activates) the node, passing data to the next layer in the network. This results in the output of one node becoming the input of the next node. This process of passing data from one layer to the next defines this neural network as a feedforward network.
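The weighted sum, bias, and threshold activation described above can be sketched in a few lines of Python. The function name and the sample numbers here are illustrative, not from the original text:

```python
# One artificial neuron: a weighted sum of the inputs, plus a bias,
# passed through a binary-threshold (step) activation.
def node_output(inputs, weights, bias):
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum + bias >= 0 else 0

# Example: two inputs with hand-picked weights and bias.
# 1.0*0.8 + 0.5*(-0.2) + (-0.3) = 0.4, which is >= 0, so the node fires.
print(node_output([1.0, 0.5], [0.8, -0.2], -0.3))  # 1
```

Stacking layers of such nodes, where each layer’s outputs become the next layer’s inputs, gives the feedforward network described above.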

Let’s break down what one single node might look like using binary values. We can apply this concept to a more tangible example, like whether you should go surfing (Yes: 1, No: 0). The decision to go or not to go is our predicted outcome, or y-hat. Let’s assume that there are three factors influencing your decision-making:

  1. Are the waves good? (Yes: 1, No: 0)
  2. Is the line-up empty? (Yes: 1, No: 0)
  3. Has there been a recent shark attack? (Yes: 0, No: 1)

Then, let’s assume the following, giving us the following inputs:

  • X1 = 1, since the waves are pumping
  • X2 = 0, since the crowds are out
  • X3 = 1, since there hasn’t been a recent shark attack

Now, we need to assign some weights to determine importance. Larger weights signify that particular variables are of greater importance to the decision or outcome.

  • W1 = 5, since large swells don’t come around often
  • W2 = 2, since you’re used to the crowds
  • W3 = 4, since you have a fear of sharks

Finally, we’ll also assume a threshold value of 3, which would translate to a bias value of –3. With all the various inputs, we can start to plug in values into the formula to get the desired output.

Y-hat = (1 * 5) + (0 * 2) + (1 * 4) − 3 = 6

If we use the activation function from the beginning of this section, we can determine that the output of this node would be 1, since 6 is greater than 0. In this instance, you would go surfing; but if we adjust the weights or the threshold, we can achieve different outcomes from the model. When we observe one decision, like in the above example, we can see how a neural network could make increasingly complex decisions depending on the output of previous decisions or layers.
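Plugging the surfing example into code makes the arithmetic explicit. All of the values below are taken directly from the text:

```python
# Inputs: waves are good (1), line-up is not empty (0), no recent shark attack (1).
x = [1, 0, 1]
# Weights: swell quality matters most, crowds matter least.
w = [5, 2, 4]
# A threshold of 3 translates to a bias of -3.
bias = -3

y_hat = sum(wi * xi for wi, xi in zip(w, x)) + bias  # (1*5) + (0*2) + (1*4) - 3
decision = 1 if y_hat >= 0 else 0
print(y_hat, decision)  # 6 1 -> go surfing
```

Changing any weight or the threshold and re-running shows how the same inputs can lead to a different decision.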

In the example above, we used perceptrons to illustrate some of the mathematics at play here, but neural networks leverage sigmoid neurons, which are distinguished by having values between 0 and 1. Since neural networks behave similarly to decision trees, cascading data from one node to another, having x values between 0 and 1 will reduce the impact of any given change of a single variable on the output of any given node, and subsequently, the output of the neural network.
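A sigmoid neuron replaces the hard 0/1 step with a smooth function whose output lies strictly between 0 and 1. A minimal sketch:

```python
import math

def sigmoid(z):
    # Squashes any real-valued weighted sum into the open interval (0, 1).
    return 1 / (1 + math.exp(-z))

# The weighted sum of 6 from the surfing example now yields a value
# close to, but not exactly, 1.
print(round(sigmoid(6), 4))  # 0.9975
```

Because small changes in the weighted sum now produce small changes in the output, tuning a single weight nudges the network’s output gradually rather than flipping it between 0 and 1.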

As we start to think about more practical use cases for neural networks, like image recognition or classification, we’ll leverage supervised learning, or labeled datasets, to train the algorithm. As we train the model, we’ll want to evaluate its accuracy using a cost (or loss) function; a common choice is the mean squared error (MSE). In the equation below,

  • i represents the index of the sample,
  • y-hat is the predicted outcome,
  • y is the actual value, and
  • m is the number of samples.

Cost Function = MSE = (1/2m) ∑ᵢ₌₁ᵐ (ŷ⁽ⁱ⁾ − y⁽ⁱ⁾)²
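The same cost can be computed directly. The predictions and labels below are toy values chosen for illustration:

```python
def mse(predictions, targets):
    # Mean squared error with the 1/(2m) factor used in the formula above.
    m = len(targets)
    return sum((y_hat - y) ** 2 for y_hat, y in zip(predictions, targets)) / (2 * m)

# Two of three predictions are wrong: (0 + 1 + 1) / (2 * 3) = 0.333...
print(mse([1, 0, 1], [1, 1, 0]))
```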

Ultimately, the goal is to minimize our cost function to ensure correctness of fit for any given observation. As the model adjusts its weights and bias, it uses the cost function to reach the point of convergence, or the local minimum. The algorithm adjusts its weights through gradient descent, which allows the model to determine the direction to take to reduce errors (or minimize the cost function). With each training example, the parameters of the model adjust to gradually converge at the minimum.
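Gradient descent on this cost can be sketched with a toy one-weight model, ŷ = w·x. The data, learning rate, and iteration count here are made up for illustration:

```python
# Toy data generated by the "true" relationship y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

w = 0.0    # initial weight
lr = 0.05  # learning rate

for _ in range(200):
    m = len(xs)
    # Derivative of the 1/(2m) MSE cost with respect to w:
    # dC/dw = (1/m) * sum((w*x - y) * x)
    grad = sum((w * x - y) * x for x, y in zip(xs, ys)) / m
    w -= lr * grad  # step in the direction that reduces the cost

print(round(w, 3))  # converges to roughly 2.0
```

Each iteration moves the weight a small step opposite the gradient, which is exactly the "direction to reduce errors" described above.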

Why?

Neural Networks in the Retail Sector

As we have noted, Artificial Neural Networks are versatile systems, capable of dealing reliably with a number of different factors.

This ability to handle a number of variables makes Artificial Neural Networks an ideal choice for the retail sector.

For instance, Artificial Neural Networks are, when given the right information, able to make accurate forecasts.

These forecasts are often more accurate than those made in the traditional manner, by analysing statistics.

This can allow accurate sales forecasts to be generated.

In turn, this information allows your business to purchase the right amount of stock.

This reduces the chances of selling out of certain items.

It also reduces the risk of valuable warehouse space being taken up by products you are unable to sell.

Online grocer Ocado is making the most of this technology.

Their smart warehouses rely on robots to do everything from stock management to fulfilling customer orders.

This information is used to power the trend of dynamic pricing.

Many companies, such as Amazon, use dynamic pricing to increase revenue.

This application has spread beyond retail; service providers such as Uber even use this information to adjust prices depending on the customer.

Many retail organisations, such as Walmart, use Artificial Neural Networks to predict future product demand.

The network models analyse location, historical data sets, as well as weather forecasts, models and other pieces of relevant information.

This is used to predict an increase in sales of, say, umbrellas or snow-clearing products.

By predicting a potential rise in demand the company is able to increase stock in store.

This means that customers won’t leave empty-handed and also allows Walmart to offer product-related offers and incentives.

Applications to Encourage Repeat Custom

As well as monitoring and suggesting purchases, Artificial Neural Network systems also allow you to analyse the time between purchases.

This application is most useful when monitoring individual customer habits.

For example, a customer may buy new ink cartridges every 2 months.

Systems powered by Artificial Neural Networks can identify and monitor this repeat custom.

You can then contact your customer and remind them to buy when the time to purchase the product approaches.

This friendly reminder increases the chances of the customer returning to your store to make their purchase.

Retailers that offer loyalty schemes are already taking advantage of this.

Beauty brand Sephora’s Beauty Insider program records every purchase a customer makes.

It also records how frequently these purchases are made.

This information allows the company to predict when a customer’s products may be running low.

At this point the company sends a “restock your stash” email, prompting the customer to make a repeat purchase.

This information can also be used to develop a personalised marketing approach offering incentives or discounts.

Keeping Customers Loyal to Your Company

Artificial Neural Networks can also identify customers likely to switch to a competitor.

By knowing which customers are most likely to defect you are able to target them with tailored marketing campaigns.

Offering incentives, or friendly reminders about your company, will encourage customers to stick around.

This predictive use of Artificial Neural Networks is already benefiting FedEx.

Forbes reports that FedEx can predict which customers are likely to leave with an accuracy of 60–90%.

By applying Artificial Neural Networks in this way we can enhance and personalise the consumer’s experience, encouraging repeat custom and helping to build a relationship between your business and your customers.

Artificial Neural Networks in Financial Services

When it comes to AI banking and finance, Artificial Neural Networks are well suited to forecasting.

This suitability largely comes from their ability to quickly and accurately analyse large amounts of data.

Artificial Neural Networks are capable of processing and interpreting both structured and unstructured data.

After processing this information Artificial Neural Networks are also able to make accurate predictions.

The more information we can give the system, the more accurate the prediction will be.
