Predicate logic and Probabilistic reasoning -- Artificial Intelligence


Index

  1. Representing Simple Facts in Logic
  2. Discrete mathematics recap from semester 4. (Pre-requisite to predicate logic)
  3. For the first 4 logical operators Quick rundown of basics
  4. Implication.
  5. Biconditional Operator.
  6. Precedence of Logical Operators.
  7. Important logical formulae (Pre-requisite to predicate logic)
  8. Logical Inference Basic Terminology (Semester 4). (Pre-requisite to Resolutions and Natural deduction)
  9. Rules of inference. (Pre-requisite to Resolutions and Natural deduction)
  10. Predicate Logic (Semester 4). (And in AI too)
  11. Predicates
  12. Quantifiers.
  13. How to write predicate logic (Pre-requisite to instances and is-a relationships)
  14. Representing instance and ISA (is-a) relationships
  15. 1. What are instances and ISA (is-a) relationships?
  16. 3. Using Predicate Logic to Represent Instances and IS-A Relationships
  17. Computable functions and predicates
  18. 1. What are computable functions?
  19. Resolution and Natural deduction
  20. 1. Resolution
  21. 2. Natural Deduction
  22. The Gray Area in Natural Deduction
  23. Resolution vs. Natural Deduction in AI Systems
  24. Probabilistic reasoning in Artificial Intelligence
  25. Bayes Theorem and Total Probability
  26. Representing knowledge in an uncertain domain.
  27. Bayesian Networks
  28. Dempster-Shafer Theory
  29. Fuzzy logic and fuzzy sets

Representing Simple Facts in Logic

In knowledge representation, predicate logic (or first-order logic) is commonly used to represent simple facts. Predicate logic allows us to define relations and properties of objects in a more expressive way than propositional logic, making it suitable for representing knowledge in AI.

Key Components in Predicate Logic:

  1. Constants: Represent specific objects or entities. Example: john, apple, room1.
  2. Predicates: Describe properties of objects or relationships between objects. Example: Loves(john, mary) represents "John loves Mary."
  3. Variables: Stand for general elements in the domain, often used in conjunction with quantifiers. Example: x, y.
  4. Quantifiers:
    • Universal quantifier (∀): Indicates that a statement applies to all elements in a domain. Example: ∀x Loves(john, x) means "John loves everyone."
    • Existential quantifier (∃): Indicates the existence of at least one element in the domain that satisfies a property. Example: ∃x Loves(x, mary) means "Someone loves Mary."

Example of Representing Simple Facts:

Let's represent some facts in predicate logic: Loves(john, mary) ("John loves Mary"), ∀x Loves(john, x) ("John loves everyone"), and ∃x Loves(x, mary) ("Someone loves Mary").

How This Helps in AI:

Representing facts in logic allows AI systems to store knowledge in an unambiguous form, combine facts using rules of inference, and answer queries by logical deduction.

For a more in-depth explanation, I added relevant notes from discrete maths from last semester.


Discrete mathematics recap from semester 4.

These notes were originally quite wacky in presentation so I fixed up a lot of stuff this time around.

What is Propositional Logic?

Area of logic that studies ways of joining and/or modifying propositions to form complicated propositions.

Example: "Adam is good at playing football and is representing his college at national level."

This statement could be broken into two propositions P1 and P2:

P1: Adam is good at playing football
P2: Adam is representing his college at the national level

Here P1 and P2 are propositional variables

Here the "logical connective" (logical operator) used to form a complicated and meaningful proposition is the AND operator.

Now you can either choose to read about this in detail or follow the link below for a brief but good enough explanation:

https://www.youtube.com/watch?v=6490tKrGEic&list=PLxCzCOWd7aiHGhOHV-nwb0HR5US5GFKFI&index=22


Logical Operators

Types of Logical Operators :

Some of these are from ADE.

  1. Negation (NOT)
  2. Conjunction (AND)
  3. Disjunction (OR)
  4. Exclusive OR (EX-OR / XOR)
  5. Implication ( p → q )
  6. Biconditional ( p ↔ q )

For the first 4 logical operators: Quick rundown of basics:

  1. Negation :

    Let there be a propositional variable p.

    Denoted by : Pasted image 20240609014344.png

    Truth Table :

    Pasted image 20240609014611.png

  2. Conjunction :

    Let there be two propositional variables p and q

    Conjunction of p and q will be denoted by:

Pasted image 20240609014733.png (p^q)

Truth Table :

Pasted image 20240609014859.png

  3. Disjunction :

    Disjunction of p and q will be denoted by:

    Pasted image 20240609014934.png

    Truth table:

    Pasted image 20240609015044.png

  4. Exclusive-OR:

    Denoted by:

Pasted image 20240609015248.png

Truth Table:

Pasted image 20240609015313.png


Implication.

"if p then q"

Denoted by : p → q

Truth table :

p q p → q
T T T
T F F
F T T
F F T

Note:

If the value of the first variable (p) is F (False), then no matter what the value of q is, p → q will always be T (True).

"Easy understandable pseudocode explanation from my end":

if p and q == True then:
	p --> q == True
		
else if p == True and q == False:
	p --> q == False
		
else if p == False:
	ignore q.
	p --> q == True.
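
To make the pseudocode above concrete, here is a small Python version of the same truth table (the function name `implies` is just my own choice for this sketch):

```python
def implies(p: bool, q: bool) -> bool:
    """Truth value of the implication p -> q."""
    if p and q:
        return True        # T, T  ->  T
    elif p and not q:
        return False       # T, F  ->  F
    else:
        return True        # p is False: q is ignored, result is T

# Quick check against the truth table above
for p in (True, False):
    for q in (True, False):
        print(p, q, implies(p, q))
```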

Pasted image 20240609020154.png


Some examples for implication:

Pasted image 20240609020242.png

Pasted image 20240609020309.png


Biconditional Operator.

Let p and q be two propositional variables.
The biconditional statement of the form p ↔ q is the proposition p "if and only if" q.

p ↔ q is true whenever the truth values of p and q are the same.

Pasted image 20240609021037.png


Representations of Biconditional Operator.

Pasted image 20240609021134.png


Precedence of Logical Operators.

The precedence helps us to decide which operator will get evaluated first in a complicated looking compound proposition.

Precedence is ranked from 1 (highest) to 5 (lowest): ¬ (1), ∧ (2), ∨ (3), → (4), ↔ (5).

Pasted image 20240609022114.png


Important logical formulae

Important logical equivalences; you will need these to convert English sentences into predicate logic.

Here are some essential logical equivalences that you might find useful:

Pasted image 20241103125718.png


Logical Inference : Basic Terminology (Semester 4).

Inference involves two parts :

  1. Premise
  2. Conclusion

Premise is a proposition on which one would be able to draw a conclusion.

Therefore initially we assume something is true and on the basis of that we draw some conclusions.

Conclusion is a proposition that is reached from the given set of premises.

Basically :

if premise then conclusion

let p = premise, c = conclusion

Then p → c.


Rules of inference.

Rules of inference are the templates for constructing valid arguments.

Types of inference rules :

  1. Modus Ponens.

    Pasted image 20240609025325.png

    Here it says that for two propositional variables p and q :

    if p --> q is True AND p is True then the Conclusion will be q (True)

  2. Modus Tollens.

    Pasted image 20240609025559.png

    Here ~ (the tilde symbol) is the same as the negation operator ¬.

    It states that for two propositional variables p and q :

    if p → q is True AND NOT(q) is True then the Conclusion will be NOT(p) (True).

  3. Hypothetical Syllogism

    Pasted image 20240609030153.png

    This states that, for three propositional variables p , q and r :

    if p → q and q → r then the Conclusion will be p → r (Transitive property)

  4. Disjunctive Syllogism
    Pasted image 20240609030417.png

It states that for two propositional variables p and q :

if p or q is True AND NOT(p) is True then the Conclusion will be q

  5. Addition

Pasted image 20240609031005.png

It states that for two propositional variables p and q :
if p is True then the Conclusion will be p or q.

  6. Simplification
    Pasted image 20240609031220.png
    It states that for two propositional variables p and q :

If p AND q is True then the conclusion will be p.

  7. Conjunction

    Pasted image 20240609031417.png

It states that if p is True and q is True individually, then the conclusion will be p AND q.

  8. Resolution

    Later referenced in Resolution and Natural deduction
    Pasted image 20240609031607.png

This states that, for three propositional variables p, q and r :

if p OR q is True AND NOT(p) OR r is True then the Conclusion will be q OR r.


Predicate Logic (Semester 4).

https://www.youtube.com/watch?v=FpGeg27Ffk8&list=PLxCzCOWd7aiHGhOHV-nwb0HR5US5GFKFI&index=42 (A better explanation than the neso academy playlist).

Used to verify whether an argument is correct or incorrect.

To understand Predicate Logic, we need to understand :

  1. Predicates
  2. Quantifiers

Predicates

Predicates are statements involving variables which are neither True nor False until the values of the variables are specified.

They are NOT propositions. (Propositions have a fixed value, either True or False.)

Pasted image 20240609033616.png

Pasted image 20240609033828.png


Quantifiers.

Quantifiers are words that refer to quantities such as "some" or "all". They tell us for how many elements a given predicate is True.

In English, Quantifiers are used to express the quantities without giving an exact number.

E.g.: all, some, many, more, few, etc.

Sentences like: "Can I have some water?" or "Jack has many friends."

Types of Quantifiers:

  1. Universal Quantifiers.
  2. Existential Quantifiers.

Pasted image 20240609034228.png

Pasted image 20240609034330.png

Here "domain" is merely the input of the function Q.


Universal Quantifiers

Definition : The Universal Quantification of some function P(x) is the statement
"P(x)" for all values of x in the domain

or in simple terms :

Universal quantifier says that "for all values, let's say x in the domain of something, applies to this proposition, or the proposition is true for all those values in that domain".

Pasted image 20240609034748.png


Domain or Domain of Discourse

A domain specifies the possible set of values of the variable under consideration.

For example: Let P(x) be the statement "x+1 > x".

Let us assume that the domain is the set of all positive integers.

Therefore,

P(1): 1+1 > 1 (True)
P(2): 2+1 > 2 (True).

Thus,

Pasted image 20240609035056.png


Existential Quantifiers

Definition : The existential quantification of a function P(x) is the proposition "There exists an element in the domain such that P(x)........"

or in simple terms, "There exists an element x in the domain such that a proposition P(x) holds true for that element".

Pasted image 20240609035856.png

Pasted image 20240609035938.png


How to write predicate logic

https://www.youtube.com/watch?v=Aw3EOSr64j0&list=PLxCzCOWd7aiHGhOHV-nwb0HR5US5GFKFI&index=43

Converting an English statement into predicate logic is very much needed, as it helps convey and structure information properly for an AI system to understand and make decisions based on that information.

So let's say we have a statement given :

"Not all that glitters is gold".

So analyzing the statement, we get to know that this statement follows the universal quantifier (presence of the word "all").

Let's say we focus only on the part of "all that glitters is gold", for now.

So we could assume two predicates, Glitters(x) and Gold(x).

And thus, we have two viable solutions :

$$\forall(x)[Glitters(x) \rightarrow Gold(x)]$$

and $$\forall(x)[Glitters(x) \wedge Gold(x)]$$
Both are viable in this context, but which is the better answer?

Let's draw up the operator's truth tables to judge that.

Let's assume that the Glitter(x) function is denoted by p.
and the Gold(x) function is denoted by q.

a) For the implication operator:

p q p -> q
T T T
T F F
F T T
F F T

So if we were to consider the following reasonings :

  1. All that glitters is gold, that's definitely satisfied.
  2. All that is not glittering, is not gold, valid.
  3. All that is not glittering could be gold, valid. (For example, rose gold: it doesn't glitter much, but it is still an alloy of gold.)
  4. All that glitters is not necessarily gold, valid. (For example, any other metal/object/material could glitter too, but that doesn't mean it's gold.)

So here we see that implication satisfies all cases.

b) Now in case of the conjunction operator:

p q p ∧ q
T T T
T F F
F T F
F F F

So if we were to consider the following reasonings :

  1. All that glitters is gold, that's definitely satisfied.
  2. All that is not glittering, is not gold, valid.
  3. All that is not glittering, could be gold, invalid.
  4. All that glitters, is not gold, invalid.

So conjunction says that in order to be gold, it has to glitter or else it's not gold. So conjunction doesn't cover all the possible cases.

Conjunction is thus better suited to specific instances, and therefore pairs better with the existential quantifier.

However since we are dealing with the universal quantifier, which considers all cases, implication is the better answer in this case.

So now coming back to the original statement.

"Not all that glitters is gold", for the not, we will add a negation operator in the front.

So we currently get the predicate logic as :

$$\neg\forall(x)[Glitters(x) \rightarrow Gold(x)]$$

Now if we simplify this equation a bit

The negation of a universal quantifier leads to an existential quantifier.

So now the predicate logic becomes : $$\exists(x)\neg[Glitters(x) \rightarrow Gold(x)]$$
Now we simplify the implication first using $$P \rightarrow Q \equiv \neg P \vee Q$$

$$\exists(x)\neg[\neg Glitters(x) \vee Gold(x)]$$

Now from De Morgan's law, $$\neg(P \vee Q) \equiv \neg P \wedge \neg Q$$

Thus, we get our final predicate logic as :

$$\exists(x)[Glitters(x) \wedge \neg Gold(x)]$$

which is read as : "There exists an x such that x glitters and x is not gold."

This way the statement "Not all that glitters is gold" (or, as the proverb goes, "all that glitters is not gold") can be conveyed properly to an AI for its understanding.
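
As a quick sanity check (not part of the original derivation), here is a small Python sketch that verifies the two forms agree over a toy domain; the objects and their properties are made up purely for illustration:

```python
# Toy domain: each object is (name, glitters, is_gold)
domain = [
    ("gold_ring",  True,  True),
    ("tinsel",     True,  False),
    ("wooden_box", False, False),
]

def glitters(obj): return obj[1]
def gold(obj):     return obj[2]

# ¬∀x (Glitters(x) → Gold(x))
lhs = not all((not glitters(o)) or gold(o) for o in domain)

# ∃x (Glitters(x) ∧ ¬Gold(x))
rhs = any(glitters(o) and not gold(o) for o in domain)

print(lhs, rhs)  # both True for this domain, and they agree for any domain
```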

Here is a video link on the topic of "negation of quantifiers" for the curious cat:

https://www.youtube.com/watch?v=XYfTz5gziBk&list=PLxCzCOWd7aiHGhOHV-nwb0HR5US5GFKFI&index=44


For more practice on quantifiers:

(Neso academy videos)

Universal Quantifiers : https://www.youtube.com/watch?v=6Ebxy33e-Wk&list=PLBlnK6fEyqRhqJPDXcvYlLfXPh37L89g3&index=36&pp=iAQB

Existential Quantifiers: https://www.youtube.com/watch?v=YsWn609V6As&list=PLBlnK6fEyqRhqJPDXcvYlLfXPh37L89g3&index=39&pp=iAQB

Expressing Quantifications in English : https://www.youtube.com/watch?v=JY03HH0p14U&list=PLBlnK6fEyqRhqJPDXcvYlLfXPh37L89g3&index=37&pp=iAQB


Representing instance and ISA (is-a) relationships

1. What are instances and ISA (is-a) relationships?

An instance relationship states that a specific individual belongs to a class (e.g., "Buddy is a Dog"), while an IS-A relationship states that one class is a subclass of another (e.g., "Dog IS-A Animal"), so members of the subclass inherit the properties of the superclass.


2. Examples in AI systems


3. Using Predicate Logic to Represent Instances and IS-A Relationships

Instances

To represent an instance in predicate logic, we typically use a predicate that links an entity to its category. For example, Dog(Buddy) states that the specific object Buddy is an instance of the category Dog.

IS-A Relationships

To represent IS-A relationships, we use implications in predicate logic. For example, if we want to represent that "Dog IS-A Animal," we would write:

$$\forall(x)(Dog(x) \rightarrow Animal(x))$$

This statement means, "For all x, if x is a dog, then x is also an animal."

Example Scenario

Consider the following hierarchy and example:

  1. Animal (superclass)
    • Mammal (subclass of Animal)
      • Dog (subclass of Mammal)
        • Instance: "Buddy" (instance of Dog)

To represent this in predicate logic, we might write:

  1. $$\forall(x)(Dog(x) \rightarrow Mammal(x))$$
  2. $$\forall(x)(Mammal(x) \rightarrow Animal(x))$$
  3. Dog(Buddy)

Using these statements, we can derive that Mammal(Buddy) holds and, in turn, that Animal(Buddy) holds as well.

Through these logical statements, we establish the hierarchical relationships that allow the AI to understand that Buddy inherits the properties of Dog, Mammal, and Animal.
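
A minimal sketch of how a program might encode these IS-A rules and derive Buddy's categories; the rule and fact representation here is my own illustrative choice, not a standard library:

```python
# IS-A rules as (subclass, superclass) pairs
isa_rules = [("Dog", "Mammal"), ("Mammal", "Animal")]

# Instance facts as (individual, class) pairs
facts = {("Buddy", "Dog")}

# Forward chaining: keep applying rules until nothing new is derived
changed = True
while changed:
    changed = False
    for individual, cls in list(facts):
        for sub, sup in isa_rules:
            if cls == sub and (individual, sup) not in facts:
                facts.add((individual, sup))
                changed = True

print(facts)
# {('Buddy', 'Dog'), ('Buddy', 'Mammal'), ('Buddy', 'Animal')}
```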


4. Importance of Instances and IS-A Relationships in AI Systems

The use of instances and IS-A relationships provides several advantages for AI systems:


5. Challenges and Limitations


Computable functions and predicates

1. What are computable functions?

A computable function is a function that can be calculated using a finite series of steps. More formally, it’s a function where, given an input, an algorithm exists that can determine the function's output in a finite amount of time.

In AI, computable functions are useful for creating rule-based systems where inputs are processed to produce specific outputs, such as in expert systems, planning, and decision-making.

Example of a Computable Function

Consider a simple computable function that calculates the square of an integer:

$$f(x) = x^2$$

Given an input x, the function will output x², and there exists a finite procedure (multiplying x by itself) to calculate the output.


2. Predicates in computable functions

Predicates are functions that return true or false based on a specific property or condition. In predicate logic, predicates help define relationships or properties that an element may have. In the context of computable functions, predicates can be used to define properties or constraints on inputs and outputs.

For instance, IsEven(x) returns true if x is divisible by 2, and Positive(x) returns true if x > 0; both can be checked in a finite number of steps.


3. Using Computable functions and predicates together

In AI systems, computable functions and predicates are often combined to make complex decisions. For example, we might use predicates to check certain conditions and computable functions to compute values based on those conditions.

Example Scenario

Suppose we have an AI system designed to approve loan applications based on specific criteria. We can represent the criteria using predicates and functions as follows:

  1. Predicates:

    • Income(x) returns true if applicant x's income meets a certain threshold.
    • CreditScore(x) returns true if applicant x's credit score is above a certain threshold.
  2. Computable Functions:

    • LoanAmount(x): computes the maximum loan amount applicant x qualifies for based on their income and credit score.

In this scenario, the system might decide:

If $$Income(x) \wedge CreditScore(x)$$, then approve $$LoanAmount(x)$$.

Here, the predicates determine eligibility, and the computable function calculates the exact loan amount.
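
Here is a small Python sketch of this decision rule; the thresholds and the loan-amount formula are invented purely for illustration:

```python
INCOME_THRESHOLD = 30_000
CREDIT_THRESHOLD = 650

# Predicates: return True or False for an applicant
def income_ok(applicant):
    return applicant["income"] >= INCOME_THRESHOLD

def credit_ok(applicant):
    return applicant["credit_score"] >= CREDIT_THRESHOLD

# Computable function: maps an applicant to a concrete loan amount
def loan_amount(applicant):
    return round(applicant["income"] * 0.4 + applicant["credit_score"] * 10)

applicant = {"income": 45_000, "credit_score": 700}

# Eligible(x) holds iff Income(x) AND CreditScore(x)
if income_ok(applicant) and credit_ok(applicant):
    print("Approved for:", loan_amount(applicant))
else:
    print("Not eligible")
```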


4. Representation in Predicate Logic

We can formalize the use of computable functions and predicates within predicate logic by combining logical operators and quantifiers.

For instance, the loan eligibility rule above can be written as:

$$Eligible(x) \leftrightarrow Income(x) \wedge CreditScore(x)$$

Now, looking at this statement one might think that implication is a valid choice here, since the whole thing is an if-then statement.

But, as fitting as it sounds, no implication is not a valid choice here.


Conjunction vs. Implication in the Eligibility Example

In the eligibility example:

$$Eligible(x) \leftrightarrow Income(x) \wedge CreditScore(x)$$

we're defining eligibility as something that only holds true if both conditions (income and credit score) are satisfied. Here, the conjunction is used to indicate that both predicates (income and credit score) must be true for eligibility.


Why Implication Wouldn't Be Ideal Here

If we were to use implication, it would look something like:

$$Income(x) \wedge CreditScore(x) \rightarrow Eligible(x)$$

This implies that whenever both income and credit score conditions hold, eligibility holds as a consequence. This might sound correct, but implication is looser in logic because it does not require both conditions to be necessary for eligibility. In fact, under implication, an applicant who fails one or both conditions could still be declared eligible and the statement would remain true, because a false antecedent never falsifies an implication.


Conjunction Enforces Necessity

By using: $$Eligible(x) \leftrightarrow Income(x) \wedge CreditScore(x)$$

we’re setting up a necessary and sufficient condition: an applicant is eligible if and only if they meet both conditions. This is exactly what we want in a rule-based AI system for eligibility criteria.


Universal Quantifier Consideration

If we were to generalize this eligibility criterion for all applicants, we could write it as:

$$\forall(x)[Eligible(x) \leftrightarrow Income(x) \wedge CreditScore(x)]$$

This would mean that all applicants must meet both conditions to be eligible, which is valid for establishing universal rules.

Now let's get back to the main topic.


5. Importance in AI Systems

Computable functions and predicates are particularly important in AI for:


6. Computable Functions, Predicates, and Their Relation to Algorithms

Computable functions often map directly to algorithms that solve specific tasks, such as sorting, searching, or evaluating expressions. By defining functions and predicates, AI systems can break down complex tasks into smaller, computable parts that each solve a piece of the overall problem.

For example, suppose an AI needs to check if a path exists between two nodes in a graph:

  1. We define a predicate PathExists(x,y) to check if there’s a path between nodes x and y.
  2. We implement a computable function FindPath(x,y) that returns the path between x and y if it exists (see the sketch below).
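
A minimal Python sketch of these two ideas on a toy graph; the function names follow the notes' naming, but the graph and the BFS implementation details are my own:

```python
from collections import deque

# Toy graph as an adjacency list
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}

def find_path(x, y):
    """Computable function: returns a path from x to y, or None (BFS)."""
    queue = deque([[x]])
    visited = {x}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == y:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None

def path_exists(x, y):
    """Predicate: True if some path connects x and y."""
    return find_path(x, y) is not None

print(path_exists("A", "D"))  # True
print(find_path("A", "D"))    # ['A', 'B', 'D']
```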

7. Limitations and Challenges

While computable functions and predicates are powerful, they also have limitations:


Resolution and Natural deduction

1. Resolution

Resolution is a powerful inference rule used primarily in propositional logic and first-order predicate logic. It’s widely used in AI systems for automated reasoning, particularly in logic programming and automated theorem proving.

From the Rules of inference, in the last rule, Resolution, we see that:

Pasted image 20240609031607.png

This states that, for three propositional variables p, q and r :

if p OR q is True AND NOT(p) OR r is True then the Conclusion will be q OR r.


What is Resolution?

In simple terms, resolution is a method that combines pairs of clauses, cancelling complementary literals, usually with the goal of deducing a contradiction. If we derive a contradiction (e.g., both a statement and its negation, which resolve to the empty clause), it indicates that the original set of assumptions is unsatisfiable.


How Resolution Works

Resolution operates on statements in clausal form (conjunctions of disjunctions). Each clause is treated as a disjunction of literals. Here’s the key principle: if one clause contains a literal P and another clause contains its negation ¬P, the two clauses can be resolved into a new clause containing all of their remaining literals.


Example of Resolution

Let’s say we have these two statements:

  1. P ∨ Q
  2. ¬P ∨ R

The resolution rule allows us to combine these to get: $$Q \vee R$$

Here’s what happened: P and ¬P are complementary literals, so they cancel each other out, and the remaining literals Q and R are joined into a single disjunction.


Using Resolution for Proofs

To prove that a statement is true using resolution:

  1. Assume the negation of the statement we want to prove.
  2. Apply the resolution rule repeatedly on the set of clauses.
  3. If we derive an empty clause (i.e., a contradiction), then our assumption was wrong, meaning the original statement must be true.

This method is commonly known as proof by contradiction or refutation.
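
A minimal Python sketch of a single resolution step on clauses represented as sets of literals (negation written with a leading "~"); this is my own toy encoding, not a full theorem prover:

```python
def negate(literal):
    return literal[1:] if literal.startswith("~") else "~" + literal

def resolve(clause1, clause2):
    """Return all clauses obtained by resolving two clauses (sets of literals)."""
    resolvents = []
    for lit in clause1:
        if negate(lit) in clause2:
            new_clause = (clause1 - {lit}) | (clause2 - {negate(lit)})
            resolvents.append(new_clause)
    return resolvents

# Example from above: (P ∨ Q) and (¬P ∨ R) resolve to (Q ∨ R)
print(resolve({"P", "Q"}, {"~P", "R"}))   # [{'Q', 'R'}]

# Refutation: P and ¬P resolve to the empty clause -> contradiction
print(resolve({"P"}, {"~P"}))             # [set()]
```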


2. Natural Deduction

Natural Deduction is another technique for reasoning in predicate logic. Unlike resolution, which focuses on finding contradictions, natural deduction provides direct rules for deriving conclusions step-by-step.

What is Natural Deduction?

Natural deduction is based on intuitive rules that reflect how we naturally derive conclusions from assumptions. Each rule corresponds to a logical operation, such as conjunction, disjunction, implication, and negation. Using these rules, we can derive a conclusion from premises without aiming for a contradiction.

These are the same as Rules of inference. (click this link to go back there) and a few more

Let's understand this with an example.

Example of Natural Deduction

Suppose we want to prove:
If “All humans are mortal” and “Socrates is a human,” then “Socrates is mortal.”

We could start with the first premise.

For the statement, "All humans are mortal", we could first define predicates :

Human(x) and Mortal(x)

Thus we can derive a predicate logic as:

$$\forall(x)(Human(x) \rightarrow Mortal(x))$$

And for the second premise, we could define it as an instance relationship where $$Human(Socrates)$$

So from the first premise we get to see if $$\forall(x)(Human(x) \rightarrow Mortal(x))$$ is true, then

for $$Human(Socrates)$$,

Mortal(Socrates)

So Socrates is an instance of Mortal

So the system will naturally deduce that Socrates is a mortal (or was, as we know).
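
A tiny Python sketch of this deduction, with the rule and facts encoded in my own ad-hoc way just to show the two steps (universal instantiation, then modus ponens):

```python
# Knowledge base
rules = [("Human", "Mortal")]    # ∀x (Human(x) → Mortal(x))
facts = {("Human", "Socrates")}  # Human(Socrates)

# Instantiate each universal rule for each known individual and apply modus ponens
derived = set(facts)
for antecedent, consequent in rules:
    for predicate, individual in facts:
        if predicate == antecedent:
            derived.add((consequent, individual))

print(derived)
# {('Human', 'Socrates'), ('Mortal', 'Socrates')}
```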


However, there is a gray area in this deduction

We had the predicate logic as :

$$\forall(x)(Human(x) \rightarrow Mortal(x))$$

This means that for any individual x, if x is a human, then x is mortal.


Implication Truth Table

The truth table for implication p → q (where p is "Human(x)" and q is "Mortal(x)") is:

p q p → q
T T T
T F F
F T T
F F T

The Gray Area in Natural Deduction

What I thought was this

If we apply the truth table of implication here,
We see the cases
Human is mortal = True
Human is not mortal = False
==Not human is mortal = True? ==
Not human is not mortal = True

The third line, which in the truth table of implication is F , T -> T

How can something be not human and still be mortal? We hit a bit of a gray area here, don't you think?

Since we share this planet with a lot of non-humans, and we know for a fact they are mortals too, some maybe longer-lived than us, but still mortals.

Generally we could extend this to animals, and other life-forms, but still I feel that there's too much ambiguity.

A conjunction could possibly fix this instead? As it limits the condition to only humans being mortals and keeps out ambiguity?


The reasoning

This is where the ambiguity comes in. The implication does not state that only humans are mortal, just that if someone is human, they are mortal. Non-humans could also be mortal, and the implication would still hold true.

Using Conjunction

To restrict the statement strictly to humans, a conjunction might indeed provide more clarity if you're aiming to express a more limited relationship. For instance:

$$\forall(x)(Human(x) \rightarrow Mortal(x)) \wedge \forall(x)(\neg Human(x) \rightarrow \neg Mortal(x))$$

This would state that not only are all humans mortal, but it also imposes a condition on non-humans, stating they are not mortal.

Which kinda doesn't make sense again, since non-humans, literally our nearest relatives, apes, monkeys, chimps, are all mortals; but then again, aliens could be mortal or immortal. Heck, even God, who is technically non-human, is immortal (let's just think hypothetically here).

So a lot of ambiguity huh? That's the downside of Natural Deduction you could say. This type of ambiguity arises when the context is not sufficient.


Resolution vs. Natural Deduction in AI Systems

Natural Deduction

Resolution

Both techniques are valuable tools in knowledge representation and reasoning for AI, enabling systems to infer knowledge, prove statements, and make decisions based on logical rules.

So one could say that natural deduction is better when, let's say, letting an AI think, theorize, or indulge in thought experiments, whereas resolution is better when handling grounded logic, stuff that is already proven.


Probabilistic reasoning in Artificial Intelligence

Bayes Theorem and Total Probability

https://www.youtube.com/watch?v=SktJqrYereQ&list=PLxCzCOWd7aiHGhOHV-nwb0HR5US5GFKFI&index=45

Bayes' theorem provides a way to update the probability of a hypothesis based on new evidence. It essentially answers the question: If we know the likelihood of certain evidence given a hypothesis, and we know the probability of the hypothesis itself, how does observing the evidence change our confidence in the hypothesis?

The formula is as follows:

$$P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}$$

Here:

    • P(A|B) is the posterior: the probability of hypothesis A given the evidence B.
    • P(B|A) is the likelihood: the probability of observing the evidence B if A is true.
    • P(A) is the prior probability of A.
    • P(B) is the marginal probability of the evidence B.

The idea behind Bayes' theorem is to update our belief in a hypothesis as new data (or evidence) becomes available. This helps refine predictions and decisions in situations where we are working with incomplete information.


A better understanding of the terminologies.

Let's understand this in a more simple language.

So we can visualize the situation using a graph:

flowchart LR;
	E-->A
	E-->B

Let's say we have two events, A and B. Both have an equal chance of happening

  1. P(A|B): This represents the probability of event A happening given that event B has already happened.

    • “Given that B is true” means we’re considering only situations where B has already occurred and are looking at how likely A is to happen under that condition.
    • For example, if A represents having a disease and B represents testing positive for the disease, then P(A|B) is the probability of having the disease given that you've tested positive.
  2. P(B|A): Similarly, this is the probability of event B happening given that event A has already happened.

    • “Given that A is true” means we’re considering only scenarios where A has occurred and are interested in the likelihood of B under that condition.
    • In the example above, P(B|A) would be the probability of testing positive if you actually have the disease.
  3. P(A) and P(B): These are just the independent probabilities of each event happening without any conditions.

    • Since we assumed both events have an equal chance of occurring, P(A)=0.5 and P(B)=0.5, but this is just an assumption for simplicity. In real scenarios, these probabilities are often quite different, and might not always have a 50-50 chance of occurrence.

To sum up, P(A|B) focuses on how likely A is to happen in cases where B is known to have happened, while P(B|A) flips the condition and asks how likely B is if A has already occurred.


Example: Medical Diagnosis -- Total Probability and Bayes theorem

Suppose we want to diagnose a rare disease based on a test result. Let:

Now, let’s assign some example probabilities:

  1. Prior Probability, P(A): Suppose the disease affects 1 in 1,000 people, so P(A)=0.001.
  2. Likelihood, P(B|A): If a person has the disease, the test correctly gives a positive result 99% of the time, so P(B|A)=0.99.
  3. Marginal Probability, P(B): The probability of a positive test result in the entire population, factoring in both people with and without the disease.

To calculate P(B), we use the law of total probability: $$\boxed{P(B) = P(B|A).P(A) + P(B|\neg A) . P(\neg A)}$$
where:

    • P(B|¬A) is the false positive rate: the probability of a positive test when the person does not have the disease.
    • P(¬A) = 1 − P(A) is the probability of not having the disease; P(A) and P(¬A) are known as complementary probabilities.

However, in this case, if we assume a false positive rate P(B|¬A) = 5%, or 0.05, we get:

$$P(B) = (0.99)(0.001) + (0.05)(0.999) = 0.00099 + 0.04995 = 0.05094$$
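
Plugging these into Bayes' theorem gives the posterior (this step is implicit in the notes, so it is spelled out here):

$$P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)} = \frac{0.99 \times 0.001}{0.05094} \approx 0.0194$$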

Interpretation: Despite a positive test result, the probability of actually having the disease is about 1.94%. This is much lower than one might expect, highlighting how even reliable tests can have surprising implications when dealing with rare events.


Representing knowledge in an uncertain domain.

In AI, uncertain domains often come into play when a system must make predictions or decisions based on incomplete, ambiguous, or noisy data. For instance, a medical diagnostic system that interprets symptoms for a diagnosis faces uncertainty since symptoms might not always point to a single disease. This requires the AI system to manage probabilities rather than certainties.

Here’s how we generally go about representing this knowledge:

  1. Probabilistic Statements:

    • We use probabilities to represent the likelihood of events or conditions. Each event or condition, like "the patient has disease X," is assigned a probability value.
  2. Prior Knowledge:

    • We start by incorporating prior probabilities (often from historical data or general information). For example, in a diagnostic tool, the likelihood of certain diseases given the patient’s demographics or other general symptoms might serve as prior knowledge.
  3. Conditional Probability:

    • As we encounter new information, such as a specific symptom (or test result), we update the probabilities to reflect this new data. Conditional probabilities like P(disease|symptom) become central because they adjust the initial (prior) probability in light of observed evidence.
  4. Bayesian Inference:

    • Bayesian inference is applied to continuously update probabilities as more data or evidence becomes available, using Bayes’ theorem. This makes probabilistic reasoning dynamic and adaptable to changing inputs.
  5. Decision Theory:

    • Finally, to enable the system to make choices in uncertain situations, decision theory often supplements probabilistic reasoning. Here, we analyze different outcomes by combining probability with utility (or reward) to identify the most favorable decision.

By leveraging probabilities, AI systems can reason in uncertain domains, considering not only the likelihood of certain outcomes but also how new evidence impacts these probabilities.


Example

Let's understand this better with an example.

This example also serves an example on Bayesian Networks.

Example Scenario: Diagnosing a Disease Based on Symptoms

Imagine we’re building a simple diagnostic system that assesses the likelihood of a patient having the flu based on certain symptoms—say, a fever and a sore throat.

Here’s how we’d represent this information probabilistically:

Step 1: Define Events

    • A: the patient has the flu.
    • B: the patient has a fever.
    • C: the patient has a sore throat.

Step 2: Establish Probabilities

    • Prior: P(A) = 0.1 (and so P(¬A) = 0.9).
    • Likelihoods: P(B|A) = 0.8, P(B|¬A) = 0.2, P(C|A) = 0.7, P(C|¬A) = 0.3.

Or better, with a probability tree (also known as a Bayesian network):

flowchart LR;
	Patient--P(A)-->Flu
	Flu-->0.1
	Patient--P(!A)-->No_Flu
	No_Flu-->0.9
	Patient--P(B)-->Fever
	Patient--P(C)-->Sore_throat
	Fever--P(B | !A)-->Fever_without_flu
	Fever_without_flu-->0.2
	Flu--P(B | A)-->Flu_with_fever
	Flu_with_fever-->0.8
	Sore_throat--P(C | A)-->Flu_with_sore_throat
	Flu_with_sore_throat-->0.7
	Sore_throat--P(C | !A)-->Sore_throat_without_flu
	Sore_throat_without_flu-->0.3
	Flu_with_fever-->Flu_with_fever_and_sorethroat
	Flu_with_sore_throat-->Flu_with_fever_and_sorethroat
	Patient--P(A| B ^ C)-->Flu_with_fever_and_sorethroat	
	Fever--P(B | C ^ A)-->Fever_with_sorethroat_and_flu
	Fever_with_sorethroat_and_flu-->0.56
	Flu_with_fever-->Fever_with_sorethroat_and_flu
	Flu_with_sore_throat-->Fever_with_sorethroat_and_flu
	Fever--P(B ^ C)-->Fever_and_sorethroat
	Sore_throat--P(B ^ C)-->Fever_and_sorethroat
	Fever_and_sorethroat-->0.11
	Fever_and_sorethroat--P(B ^ C | A)-->Fever_and_sorethroat_with_flu
	Fever_and_sorethroat_with_flu-->0.056
	Fever_and_sorethroat--P(B ^ C | !A)-->Fever_and_sorethroat_without_flu
	Fever_and_sorethroat_without_flu-->0.0054

Step 3: Using Conditional Probability to Update Beliefs

When the system observes a patient with a fever (B) and a sore throat (C), it uses this evidence to update the probability of flu (A) using Bayes’ theorem.

We’re looking to find: $$P (A | B \wedge C)$$
which is the probability that a patient has the flu, given that they have both a fever and a sore throat.


Step 4: Apply Bayes’ Theorem

Using Bayes’ theorem in an uncertain domain, we can calculate P(A|B ∧ C) like this:

$$P(A|B \wedge C) = \frac{P(B \wedge C|A) \cdot P(A)}{P(B \wedge C)}$$

  1. Find P(B ∧ C|A) = P(B|A) · P(C|A) = (0.8)(0.7) = 0.56, the probability of the patient having both a fever and a sore throat given that they have the flu (assuming the two symptoms are independent given the flu).
  2. Find P(B ∧ C|¬A) = P(B|¬A) · P(C|¬A) = (0.2)(0.3) = 0.06, the probability of the patient having both a fever and a sore throat without the flu.
  3. Find P(B ∧ C) = P(B ∧ C|A) · P(A) + P(B ∧ C|¬A) · P(¬A) = (0.56)(0.1) + (0.06)(0.9) = 0.11, the overall probability of a patient having both a fever and a sore throat.
  4. Finally, we find P(A|B ∧ C), the probability that a patient has the flu given that they have both a fever and a sore throat.

Thus, $$P(A|B \wedge C) = \frac{P(B \wedge C|A) \cdot P(A)}{P(B \wedge C)} = \frac{(0.56)(0.1)}{0.11} \approx 0.51$$


Result

After observing both symptoms, the probability that the patient has the flu increases to 51% from the initial 10%. This updated probability reflects the system’s improved understanding, based on the presence of uncertain evidence.

This example illustrates how we represent knowledge under uncertainty. We start with prior beliefs, update them as new information (or symptoms) emerges, and arrive at a more accurate, evidence-based conclusion.
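
A small Python sketch of this update, using the numbers from the example (and the same conditional-independence assumption for the symptoms that the hand calculation makes):

```python
# Priors and likelihoods from the example
p_flu = 0.1
p_fever_given_flu, p_fever_given_no_flu = 0.8, 0.2
p_sore_given_flu,  p_sore_given_no_flu  = 0.7, 0.3

# Assume fever and sore throat are independent given flu / no flu
p_evidence_given_flu    = p_fever_given_flu * p_sore_given_flu        # 0.56
p_evidence_given_no_flu = p_fever_given_no_flu * p_sore_given_no_flu  # 0.06

# Total probability of seeing both symptoms
p_evidence = (p_evidence_given_flu * p_flu +
              p_evidence_given_no_flu * (1 - p_flu))                  # 0.11

# Bayes' theorem
p_flu_given_evidence = p_evidence_given_flu * p_flu / p_evidence
print(round(p_flu_given_evidence, 2))  # 0.51
```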


Bayesian Networks

Here is a video explanation of it

https://www.youtube.com/watch?v=DVnubVOjZtg&list=PLxCzCOWd7aiHGhOHV-nwb0HR5US5GFKFI&index=47


Graph example

flowchart LR;
	Patient--P(A)-->Flu
	Flu-->0.1
	Patient--P(!A)-->No_Flu
	No_Flu-->0.9
	Patient--P(B)-->Fever
	Patient--P(C)-->Sore_throat
	Fever--P(B | !A)-->Fever_without_flu
	Fever_without_flu-->0.2
	Flu--P(B | A)-->Flu_with_fever
	Flu_with_fever-->0.8
	Sore_throat--P(C | A)-->Flu_with_sore_throat
	Flu_with_sore_throat-->0.7
	Sore_throat--P(C | !A)-->Sore_throat_without_flu
	Sore_throat_without_flu-->0.3
	Flu_with_fever-->Flu_with_fever_and_sorethroat
	Flu_with_sore_throat-->Flu_with_fever_and_sorethroat
	Patient--P(A| B ^ C)-->Flu_with_fever_and_sorethroat	
	Fever--P(B | C ^ A)-->Fever_with_sorethroat_and_flu
	Fever_with_sorethroat_and_flu-->0.56
	Flu_with_fever-->Fever_with_sorethroat_and_flu
	Flu_with_sore_throat-->Fever_with_sorethroat_and_flu
	Fever--P(B ^ C)-->Fever_and_sorethroat
	Sore_throat--P(B ^ C)-->Fever_and_sorethroat
	Fever_and_sorethroat-->0.11
	Fever_and_sorethroat--P(B ^ C | A)-->Fever_and_sorethroat_with_flu
	Fever_and_sorethroat_with_flu-->0.056
	Fever_and_sorethroat--P(B ^ C | !A)-->Fever_and_sorethroat_without_flu
	Fever_and_sorethroat_without_flu-->0.0054

Considering this graph from our previous example,

A Bayesian Network is a probabilistic graphical model that represents a set of random variables and their conditional dependencies. It consists of:

  1. Nodes: Nodes represent random variables (e.g., "Patient has flu," "Patient has fever," "Patient has sore throat").
  2. Edges: Edges represent conditional dependencies between variables.

The key characteristics of a Bayesian network that this graph satisfies are:

    • It is a directed acyclic graph (DAG): edges have a direction and there are no cycles.
    • Each node carries a probability distribution conditioned only on its parent nodes, which encodes the conditional dependencies (and independencies) between variables.

In this specific scenario, the patient is the root node, the patient's condition, flu, fever and sore throat are its children. The probabilities associated with each node and the dependencies between them capture the uncertainty and relationships in the medical diagnosis process.


Visualizing and Interpreting Bayesian Networks

Bayesian networks are often visualized using graphs like the one above. Each node is typically represented by a shape (e.g., circle, square, oval), and edges are represented by arrows. The probability distributions associated with each node can be shown either directly on the nodes or in a separate table.

Interpreting these networks can involve:

    • Reading off the prior and conditional probabilities attached to each node.
    • Tracing the edges to see which variables directly influence which others.
    • Updating beliefs about a node (e.g., "flu") as evidence about related nodes (e.g., "fever", "sore throat") is observed.

By representing the diagnostic scenario as a Bayesian network, we gain a powerful tool for reasoning with uncertainty and making probabilistic inferences based on the available evidence.


Dempster-Shafer Theory

https://www.youtube.com/watch?v=X85lad3rW_I

What is Dempster-Shafer Theory?

Dempster-Shafer theory (DST), also known as evidence theory or belief function theory, provides a framework for reasoning with uncertain, incomplete, or ambiguous information. It’s based on two main ideas:

  1. Belief Functions: Rather than assigning precise probabilities to events, DST allows for assigning a range of probabilities.
  2. Combining Evidence: DST has rules for combining multiple sources of evidence, which can strengthen or adjust our belief in an event.

Key Concepts in Dempster-Shafer Theory

  1. Frame of Discernment:

    • This is the set of all possible outcomes or hypotheses. For example, if we’re diagnosing a patient, the frame of discernment might be {Flu, Cold, Allergy}.
  2. Basic Probability Assignment (BPA):

    • Each subset of the frame of discernment is assigned a number between 0 and 1, called the mass or belief mass, representing the degree of belief exactly in that subset (and not any other subset).
    • The sum of all belief masses for the subsets must equal 1.
    • For example, m({Flu}) = 0.6 might indicate that 60% of our belief supports the patient having the flu.
  3. Belief (Bel) and Plausibility (Pl):

    • Belief: For any subset A, belief is the total belief that supports A with certainty. Belief is calculated by summing the masses of all subsets that are contained within A.
    • Plausibility: This represents the maximum belief that could support A and is calculated by summing the masses of all subsets that intersect A.

Mathematically:

$$Bel(A) = \sum_{B \subseteq A} m(B) \qquad Pl(A) = \sum_{B \cap A \neq \emptyset} m(B)$$

  4. Dempster’s Rule of Combination:

    • Two independent mass functions m1 and m2 are combined as $$m_{1,2}(A) = \frac{1}{1-K}\sum_{B \cap C = A} m_1(B)\, m_2(C)$$ where $$K = \sum_{B \cap C = \emptyset} m_1(B)\, m_2(C)$$ measures the conflict between the two sources of evidence.

Let’s walk through an example of how Dempster-Shafer theory combines evidence.

Scenario: Diagnosing a Disease

Imagine a doctor is trying to diagnose a patient who has symptoms that could indicate one of three possible conditions:

  1. Flu
  2. Cold
  3. Allergy

We’ll represent these conditions with a frame of discernment Θ={Flu,Cold,Allergy}

The doctor has access to two diagnostic tests:

Each test provides a degree of belief for certain subsets of Θ

Let's say Test 1 returns a high level of confidence in "Flu" while being neutral about "Cold" and "Allergy." This means the test result could be represented as:

Test 1: {Belief(Flu) = 0.8} , {Belief(Cold) = 0.1} , {Belief(Allergy) = 0.1}

Similarly, Test 2 might reveal a moderate belief in "Cold" and "Allergy," while showing little confidence in "Flu." This could be represented as:

Test 2: {Belief(Cold) = 0.6} , {Belief(Allergy) = 0.3} , {Belief(Flu) = 0.1}

The Dempster-Shafer theory allows the doctor to combine these two pieces of evidence in a way that reflects uncertainty. Instead of simply averaging the results, it considers the potential for conflicts and overlaps between the tests.

For instance, if Test 1 strongly suggests "Flu" but Test 2 hints at "Cold" or "Allergy," the theory would acknowledge this contradiction and adjust the overall belief accordingly. This process involves calculating focal elements (subsets of Θ that have a certain degree of belief) and using Dempster's rule of combination to update these beliefs based on the combined evidence from both tests.

Ultimately, the Dempster-Shafer theory provides a more nuanced and flexible framework for medical diagnosis compared to simple probability approaches. It acknowledges the inherent uncertainty in medical information and allows doctors to make more informed decisions by considering all available evidence and potential conflicts.
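
A minimal Python sketch of Dempster's rule of combination applied to the two test results above, treating each test's beliefs as mass assignments over singleton hypotheses (this simplified encoding is my own, not part of the original notes):

```python
# Mass assignments over singleton hypotheses (frozensets so they can be dict keys)
m1 = {frozenset({"Flu"}): 0.8, frozenset({"Cold"}): 0.1, frozenset({"Allergy"}): 0.1}
m2 = {frozenset({"Flu"}): 0.1, frozenset({"Cold"}): 0.6, frozenset({"Allergy"}): 0.3}

def combine(m1, m2):
    """Dempster's rule: combine two basic probability assignments."""
    combined, conflict = {}, 0.0
    for b, mass_b in m1.items():
        for c, mass_c in m2.items():
            intersection = b & c
            if intersection:
                combined[intersection] = combined.get(intersection, 0.0) + mass_b * mass_c
            else:
                conflict += mass_b * mass_c   # K: total conflicting mass
    return {a: mass / (1 - conflict) for a, mass in combined.items()}

for hypothesis, mass in combine(m1, m2).items():
    print(set(hypothesis), round(mass, 2))
# {'Flu'} 0.47, {'Cold'} 0.35, {'Allergy'} 0.18
```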


Advantages of Dempster-Shafer Theory

Limitations of Dempster-Shafer Theory

Dempster-Shafer theory is a powerful way to represent and reason under uncertainty when exact probabilities aren’t available or when different sources of evidence need to be combined.


Fuzzy logic and fuzzy sets

1. Fuzzy Sets

In classical (crisp) set theory, an element either belongs to a set or it doesn’t. For example, if we define a set of "tall people" with a minimum height of 6 feet, a person who is 5'11" would not be in this set at all, while someone who is 6'1" would fully belong to it.

In contrast, fuzzy sets allow elements to have degrees of membership. Instead of saying someone is simply "tall" or "not tall," fuzzy sets allow for partial membership, with values between 0 (not in the set) and 1 (fully in the set).

Example of a Fuzzy Set

Suppose we have a fuzzy set for "tall people" with a membership function μtall(x) that assigns each height x a value between 0 and 1. For example, someone who is 5'6" might have a membership of 0.2, someone who is 5'11" a membership of 0.7, and someone who is 6'2" a membership of 1.0.

This degree of membership allows fuzzy sets to capture the "gray areas" of classification where crisp sets cannot.

2. Membership Functions

The membership function μ defines the degree to which an element belongs to a fuzzy set. Membership functions can take various shapes, including triangular, trapezoidal, Gaussian, and sigmoid curves.

These functions are tailored to suit the specific fuzzy concept. For example, "warm temperature" could have a membership function that gradually increases as the temperature goes from 60°F to 80°F, peaking at 75°F.
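
As a sketch, here is one way such a "warm" membership function could be written in Python as a simple triangular function; the breakpoints (60°F, peak at 75°F) come from the sentence above, while the upper bound of 90°F and the triangular shape are my own illustrative choices:

```python
def mu_warm(temp_f: float) -> float:
    """Degree to which a temperature (in °F) counts as 'warm' (triangular shape)."""
    if temp_f <= 60 or temp_f >= 90:
        return 0.0
    if temp_f <= 75:
        return (temp_f - 60) / 15   # rising edge: 60°F -> 0.0, 75°F -> 1.0
    return (90 - temp_f) / 15       # falling edge: 75°F -> 1.0, 90°F -> 0.0

for t in (60, 68, 75, 82, 90):
    print(t, round(mu_warm(t), 2))
```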

3. Fuzzy Logic

Fuzzy logic extends classical logic by allowing degrees of truth. In binary logic, statements are either true (1) or false (0). Fuzzy logic introduces intermediate truth values between 0 and 1, which makes it useful for reasoning with imprecise or subjective information.

Basic Fuzzy Logic Operators

Just like classical logic, fuzzy logic has operators like AND, OR, and NOT. However, these operators are adapted to work with degrees of truth:

  1. Fuzzy AND (Intersection): The truth value of A AND B is typically the minimum of the truth values of A and B. $$AND(A,B) = \min(\mu_A, \mu_B)$$
  2. Fuzzy OR (Union): The truth value of A OR B is typically the maximum of the truth values of A and B. $$OR(A,B) = \max(\mu_A, \mu_B)$$
  3. Fuzzy NOT (Complement): The truth value of NOT A is the complement of the truth value of A. $$NOT(A) = 1 - \mu_A$$

4. Example of Fuzzy Logic in Action

Let’s work through a simple example of how fuzzy logic might be used.

Scenario: Air Conditioning Control

Imagine we want to control an air conditioner based on the temperature and humidity in a room. We’ll use fuzzy logic to decide whether the air conditioner should be set to "high," "medium," or "low."

  1. Define Fuzzy Sets for Temperature and Humidity

    • Temperature: Define fuzzy sets for Cold, Warm, and Hot.
    • Humidity: Define fuzzy sets for Low, Moderate, and High.
  2. Define Fuzzy Rules Fuzzy rules are if-then statements that describe how to react to certain fuzzy conditions. Here are some example rules:

    • Rule 1: IF temperature is Hot AND humidity is High, THEN set AC to High.
    • Rule 2: IF temperature is Warm AND humidity is Moderate, THEN set AC to Medium.
    • Rule 3: IF temperature is Cold, THEN set AC to Low.
  3. Apply Membership Functions Suppose the current temperature is 85°F, and humidity is 70%. We can use the membership functions for each fuzzy set to find the degree of truth for each condition:

    • Temperature: μHot(85)=0.8, μWarm(85)=0.2
    • Humidity: μHigh(70)=0.7, μModerate(70)=0.3
  4. Evaluate the Rules

    • For Rule 1: Temperature is Hot (0.8) AND Humidity is High (0.7). Using the AND operation (minimum), the truth value of this rule is 0.7.
    • For Rule 2: Temperature is Warm (0.2) AND Humidity is Moderate (0.3). The truth value of this rule is 0.2.
    • For Rule 3: Temperature is Cold (membership value is 0, so the rule does not activate).
  5. Combine Results and Defuzzify: After evaluating the rules, we need to determine a final output (High, Medium, Low) for the AC setting. This is often done through a process called defuzzification, which converts fuzzy values back to a crisp decision.

    One common defuzzification method is the centroid method, which calculates a weighted average based on the truth values. In this example, since Rule 1 has the highest truth value (0.7), the AC might be set to High (a small code sketch of this rule evaluation follows below).
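
Here is a small Python sketch of the rule evaluation above; the membership degrees are taken straight from the example, and the final step simply picks the strongest rule rather than performing a full centroid defuzzification:

```python
# Membership degrees from the example (would normally come from membership functions)
temp = {"Hot": 0.8, "Warm": 0.2, "Cold": 0.0}
humidity = {"High": 0.7, "Moderate": 0.3, "Low": 0.0}

# Fuzzy rules: (antecedent truth value via fuzzy AND = min, consequent AC setting)
rules = [
    (min(temp["Hot"], humidity["High"]), "High"),         # Rule 1 -> 0.7
    (min(temp["Warm"], humidity["Moderate"]), "Medium"),  # Rule 2 -> 0.2
    (temp["Cold"], "Low"),                                # Rule 3 -> 0.0
]

# A very crude "defuzzification": pick the setting of the strongest rule
strength, setting = max(rules)
print(setting, strength)  # High 0.7
```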


5. Applications of Fuzzy Logic

Fuzzy logic is widely used in systems that require approximate reasoning, including control systems (washing machines, air conditioners, camera autofocus), automotive systems (automatic transmissions, anti-lock braking), and decision-support and pattern-recognition applications.

It’s particularly useful when working with human-centered concepts where things are often “a bit of this” or “a bit of that,” rather than strictly true or false.


So one could say that this introduces "Half-truths" into logic.

One of the core strengths of fuzzy logic is that it allows for partial truths or “half-truths,” which are values that lie between complete truth (1) and complete falsehood (0). This concept makes fuzzy logic very powerful for modeling and reasoning in situations where binary true/false statements don’t capture the nuances of real-world conditions.

In traditional binary logic, something can only be entirely true or false. For example, the statement “The room is warm” would either be true (1) or false (0), with no middle ground. However, in fuzzy logic, we can represent degrees of truth, allowing the room to be "partially warm" with a value between 0 and 1 based on actual conditions.

Sounds quite similar to quantum mechanics if you ask me.

For instance, on a mild day the statement "the room is warm" might hold to degree 0.6, while on a cooler day it might hold only to degree 0.2.

These intermediate values represent “half-truths” or partial truths, indicating how well the current state fits a category like "warm."

How This Works in Practice

The ability to handle partial truths makes fuzzy logic particularly valuable in applications where clear boundaries don’t exist. For example, a thermostat can ramp heating or cooling smoothly instead of switching abruptly, and a washing machine can adjust its cycle based on how "dirty" or how "full" the load is.

Example of a Half-Truth in Fuzzy Logic

Imagine a statement: “The weather is cool.” On a day when the temperature is mild, fuzzy logic might assign a truth value of 0.5 to this statement, suggesting that it’s “somewhat true” or “half-true” that the weather is cool. If the temperature drops, the truth value might increase (say, to 0.8), representing that it’s “mostly true” the weather is cool. Conversely, if the temperature rises, the truth value might decrease.

This concept of half-truths is also why fuzzy logic is often associated with linguistic variables like “cool,” “warm,” or “hot,” which have inherently fuzzy boundaries. Fuzzy logic helps translate these vague human terms into computable values, allowing AI systems to handle more natural, human-like reasoning.