MCC experiments
Usage
Install the dependencies:
python -m venv venv
source venv/bin/activate
pip install pyAgrum psycopg2
Then run:
./mcc.py test.nt 500
Notes
20/11/24
Effect of independent attributes
Consider the following example:
D^*
A  B   C
a  na  c1
a  na  c2
a  na  c3
and the join probabilities P(b1|a) = 2/3 and P(b2|a) = 1/3, so the value of B is independent of the attribute C.
There is a rewriting for the following query:
SELECT B FROM T WHERE A=a
but there is no rewriting for a query with a condition on the independent attribute, e.g.:
SELECT B FROM T WHERE C=c2
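The independence claim in this example can be checked by enumerating possible worlds. The sketch below is not taken from mcc.py; it assumes that the columns of D^* are (A, B, C), that `na` marks a missing B value, and that each missing value is filled independently according to P(B|a). All function and variable names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Assumed layout of D^*: columns (A, B, C); None stands for the null "na".
D_STAR = [("a", None, "c1"), ("a", None, "c2"), ("a", None, "c3")]

# Conditional distribution from the notes: P(b1|a) = 2/3, P(b2|a) = 1/3.
P_B_GIVEN_A = {"a": {"b1": Fraction(2, 3), "b2": Fraction(1, 3)}}


def possible_worlds(rows, dist):
    """Yield (world, probability), filling each null B independently (assumption)."""
    choices = [list(dist[a].items()) for a, _, _ in rows]
    for combo in product(*choices):
        prob = Fraction(1)
        world = []
        for (a, _, c), (b, p) in zip(rows, combo):
            world.append((a, b, c))
            prob *= p
        yield world, prob


def marginal_b(rows, dist, c_value):
    """Distribution of B in the rows whose C value equals c_value."""
    out = {}
    for world, prob in possible_worlds(rows, dist):
        for _, b, c in world:
            if c == c_value:
                out[b] = out.get(b, Fraction(0)) + prob
    return out


worlds = list(possible_worlds(D_STAR, P_B_GIVEN_A))
for c_value in ("c1", "c2", "c3"):
    print(c_value, marginal_b(D_STAR, P_B_GIVEN_A, c_value))
```

Under these assumptions the distribution of B is the same for every value of C, which is what independence of B from C means here. The sketch says nothing about whether a rewriting exists for either query.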
Remarks about the BID
- It represents in a single structure the part of D^* that is certain together with the part that is known only with some probability.
- For example, projecting out the null values on one world of the BID gives the same result as on D^*. This is not the case when doing so on a database generated from the join distribution.
- Using the BID only, it is impossible to distinguish the queries that can be rewritten from the others.
Open question
Given a MG and D^*, is there a class C such that:
- the distance between the join distribution and the empirical distribution of C is minimal
- the probability of the class C is strictly positive
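One possible reading of this question can be tried on the three-row example from the earlier note. The sketch below rests on assumptions that are not from these notes: a "class" is taken to be the set of completions of D^* that share the same counts of B values, the distance is total variation, and nulls are filled independently with P(b1|a) = 2/3 and P(b2|a) = 1/3. The names are hypothetical, and other definitions of class or distance would change the result.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import prod

# Target distribution of B given A = a, taken from the example in the notes.
TARGET = {"b1": Fraction(2, 3), "b2": Fraction(1, 3)}
N_ROWS = 3  # number of rows with a null B in the example D^*


def total_variation(p, q):
    """Total variation distance between two finite distributions."""
    keys = set(p) | set(q)
    return sum(abs(p.get(k, Fraction(0)) - q.get(k, Fraction(0))) for k in keys) / 2


def empirical(key):
    """Empirical distribution of B for a class identified by its value counts."""
    return {b: Fraction(n, N_ROWS) for b, n in key}


# Group all completions by their B-value counts (assumed meaning of "class")
# and accumulate the probability of each class.
classes = {}
for filling in product(TARGET, repeat=N_ROWS):
    key = tuple(sorted(Counter(filling).items()))
    classes[key] = classes.get(key, Fraction(0)) + prod(TARGET[b] for b in filling)

# Class whose empirical distribution is closest to the target.
best = min(classes, key=lambda k: total_variation(empirical(k), TARGET))
print(best, total_variation(empirical(best), TARGET), classes[best])
```

On this small example the closest class is the one with two b1 values and one b2 value: its empirical distribution matches the target exactly and its probability is 4/9, so both conditions hold. This only illustrates the question on one instance and does not answer it in general.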