probabilitystatisticsAI

Navigating the Data Universe: Understanding Probability for AI Developers - Part 2

By Ayoola Fakoya
Picture of the author
Published on
Generative AI concept art

Introduction

Probability is a fundamental tool that empowers AI developers to quantify uncertainty and predict the likelihood of events. In this journey into the realm of probability, we'll explore essential concepts crucial for any developer navigating the unpredictable landscape of data. Our initial focus delves into the foundational concepts of probability, providing a solid base for more advanced calculations.

Unraveling Probability Fundamentals

As AI developers, we often grapple with uncertainty, and probability becomes our compass. Let's start with the bedrock concepts:

1. Random Experiment

Probability thrives on what statisticians term as "random experiments" or statistical experiments. These are processes with outcomes that elude precise prediction. Picture tossing a coin or rolling a die—beforehand, the results are anyone's guess. The common threads among random experiments are multiple possible outcomes, pre-representable possibilities, and outcomes dictated by chance.

2. Outcome

In the statistical realm, the result of a random experiment is an "outcome." Rolling a die, for instance, has six possible outcomes—1, 2, 3, 4, 5, or 6.

3. Event

Events, in the context of probability, are sets of outcomes. Rolling an even number on a die constitutes an event with outcomes 2, 4, and 6.

Quantifying Uncertainty: Probability Calculation

Now, let's embark on understanding how we assign probabilities to events:

Probability of an Event

Expressed as a number between 0 and 1, the probability of an event signifies the likelihood of its occurrence. If P(event) = 0, there's a 0% chance; if P(event) = 1, there's a 100% chance. Degrees between 0 and 1 provide a spectrum of probabilities, aiding decision-making in uncertain scenarios.

Calculating Probability

When outcomes are equally likely, calculating the probability of an event involves dividing the number of desired outcomes by the total possible outcomes. For instance, the probability of getting heads in a fair coin toss is 0.5, or 50%.

Putting Theory into Practice:

Let's apply these concepts to classic examples—the coin toss and die roll:

Example: Coin Toss

Tossing a fair coin exemplifies a random experiment. The probability of getting heads is 0.5, reflecting the even chance of heads or tails in a fair scenario.

Example: Die Roll

Rolling a six-sided die introduces six possible outcomes. The probability of rolling a 3 is 1/6 or approximately 16.7%.

Navigating Probability Notation

To navigate the educational and technical landscape, understanding probability notation is crucial:

Probability Notation

In notation, P denotes the probability of an event, with A and B representing individual events. P(A) signifies the probability of event A, always within the bounds of 0 to 1.

Understanding Bayes's Theorem: A Practical Guide

In Bayesian statistics, two crucial terms come into play: prior probability and posterior probability. The prior probability represents the likelihood of an event before new data surfaces, while the posterior probability reflects the updated probability after incorporating new information.

Bayes’s Theorem Formula

The core of Bayes's theorem is its formula:

P(A|B) = (P(B|A) * P(A)) / P(B).

Breaking it down:

  • P(A|B): Posterior probability (probability of A given B)
  • P(A): Prior probability (probability of A)
  • P(B|A): Likelihood (probability of B given A)
  • P(B): Evidence (probability of B)

Practical Application: Spam Filter Example

To illustrate the practical utility of Bayes's theorem, let's delve into a real-world scenario: spam filtering.

Scenario Setup

Imagine you want to determine the probability that an email is spam given the presence of a specific word—in this case, "money."

Data Collection

Assusming:

  • The probability of an email being spam (( P(Spam) )) is 20%.
  • The probability that the word "money" appears in an email (( P(Money) )) is 15%.
  • The probability that the word "money" appears in a spam email (( P(Money|Spam) )) is 40%.

Applying Bayes’s Theorem

Now, let's utilize Bayes's theorem to calculate the posterior probability (P(Spam|Money)):

$P(Spam|Money) = P(Money|Spam)*P(Spam)/P(Money)$

​ Substitute in the given data:

$P(Spam|Money) = (0.4 * 0.2) / 0.15 = 0.53333$

Therefore, the probability that an email is spam given that it contains the word "money" is approximately 53.3%.

Understanding Discrete Probability Distributions

Your journey into probability continues with a critical exploration of discrete probability distributions. These distributions offer a powerful lens through which you can model diverse datasets, uncover patterns, and make informed decisions. We'll delve into four common discrete probability distributions: Uniform, Binomial, Bernoulli, and Poisson.

Uniform Distribution

The Uniform Distribution is a straightforward model where all possible outcomes of an event are equally likely. Take, for instance, rolling a fair six-sided die—the probability of each outcome (1, 2, 3, 4, 5, or 6) is the same, around 16.7%. Visualized with a histogram, the uniform distribution showcases equal probabilities for each possible outcome.

Note: The uniform distribution holds for both discrete and continuous random variables.

Unifrom distribution *Source: [Investopedia]
Unifrom distribution *Source: [Investopedia]

Binomial Distribution

The Binomial Distribution steps into scenarios where events have only two possible outcomes: success or failure. This distribution assumes that each event is independent, having the same probability of success. An example of a binomial experiment is tossing a coin multiple times. The histogram for a binomial distribution often exhibits a shape resembling a normal distribution.

Applications of the binomial distribution span areas such as modeling the probability of fraud in credit card transactions or predicting stock price movements.

Binomial Distribution *Source: [tonmatincwordde.exblog.jp]
Binomial Distribution *Source: [tonmatincwordde.exblog.jp]

Bernoulli Distribution

Similar to the Binomial Distribution, the Bernoulli Distribution models events with two possible outcomes, but in this case, it pertains to a single trial of an experiment. A quintessential example is a single coin toss, where success and failure are the two potential outcomes.

The histogram for a Bernoulli Distribution of a single coin toss depicts equal probabilities for each outcome.

Bernoulli Distribution *Source: [Wikipedia]
Bernoulli Distribution *Source: [Wikipedia]

Poisson Distribution

The Poisson Distribution steps into the realm of modeling the probability of a specific number of events occurring within a defined time period. Whether it's website views per hour or thunderstorms per month, the Poisson Distribution is apt for scenarios where events can be counted, the mean number of events is known, and each event is independent.

Data professionals leverage the Poisson Distribution to model diverse datasets, ranging from customer service call center calls per hour to financial transactions per second at a bank.

Poisson Distribution *Source: [Wikipedia]
Poisson Distribution *Source: [Wikipedia]

A fundamental grasp of discrete probability distributions not only equips you to comprehend diverse datasets but also lays the groundwork for delving into more advanced statistical methods and machine learning applications.

Continuous Probability Distributions: The Elegance of the Normal Distribution

Before delving into the specifics of the normal distribution, let's establish some foundational concepts that apply to all continuous probability distributions:

Probability Density and Probability

Continuous probability distributions are characterized by probability density functions (PDFs). Unlike discrete distributions represented by probability mass functions (PMFs), PDFs apply to continuous random variables.

A PDF is graphically depicted as a curve, with the bell curve serving as a quintessential example, particularly for the normal distribution.

Understanding Probability in Continuous Distributions

In continuous distributions, the x-axis depicts the variable under consideration, such as cherry tree height, while the y-axis denotes probability density. It's vital to differentiate between probability density and actual probability because continuous variables can take an infinite number of values, rendering the probability of any particular point essentially negligible.

Hence, the focus shifts to probability intervals, such as the likelihood of cherry tree heights falling between 14.5 feet and 15.5 feet. The probability is calculated by determining the area under the curve within the specified interval.

Note: Calculating probabilities in continuous distributions is typically done using statistical software.

The Normal Distribution: A Symphony of Symmetry

The normal distribution, often referred to as the bell curve or Gaussian distribution, stands as a pinnacle among continuous probability distributions. Its graph exhibits symmetry, forming a bell-shaped curve with a peak at the mean and downward sloping sides.

Key Features of the Normal Distribution

  1. Bell Curve Symmetry:

    • The distribution follows a bell curve.
    • The mean is situated at the center, representing the most probable outcome.
  2. Symmetry Around the Mean:

    • The curve is symmetric, with 50% of data above and below the mean.
  3. Probability and Area Under the Curve:

    • The total area under the curve equals 1, accounting for 100% of possible outcomes.

The Empirical Rule: Navigating the Norm

The normal distribution adheres to the empirical rule, offering a structured pattern based on standard deviations from the mean:

  • 68% of values fall within 1 standard deviation.
  • 95% fall within 2 standard deviations.
  • 99.7% fall within 3 standard deviations.
Normal Distribution *Source: [Wikipedia]
Normal Distribution *Source: [Wikipedia]

The normal distribution serves as an indispensable tool for data professionals. Whether modeling heights, weights, blood pressure, or test scores, its ubiquity stems from its prevalence in diverse datasets.

Conclusion: Mastering Probability Essentials

This journey was extensive, and I trust it proved valuable. Continuing on, you delved into discrete probability distributions, mastering techniques to model varied datasets, with the normal distribution emerging as a key tool for comprehending and interpreting data. Probability serves as your guide in navigating the landscape of data science.

Probability
Probability

Stay Tuned

Want to Get in touch?
The best articles, links and news related to AI delivered once a week to your inbox.