From 791d015b8a9582c2fc90898f07304adbb1a58522 Mon Sep 17 00:00:00 2001
From: Maxime Buron <maxime.buron@uca.fr>
Date: Wed, 20 Nov 2024 15:51:40 +0100
Subject: [PATCH] README

---
 README.md | 52 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)
 create mode 100644 README.md

diff --git a/README.md b/README.md
new file mode 100644
index 0000000..fd12b96
--- /dev/null
+++ b/README.md
@@ -0,0 +1,52 @@
+# MCC experiments
+
+## Usage
+
+Install the dependencies and run:
+
+```
+./mcc.py test.nt 500
+```
+
+## Notes
+
+### 20/11/24
+
+#### Effect of independent attributes
+
+Consider the following example:
+
+$D^*$ (table T with attributes A, B, C; `na` denotes a null):
+```
+a na c1
+a na c2
+a na c3
+```
+
+together with the joint distribution, which gives P(b1|a) = 2/3 and P(b2|a) = 1/3. Since these probabilities depend only on A, the value of B is independent of attribute C.
+
+There is a rewriting for the following query:
+
+```sql
+SELECT B FROM T WHERE A=a
+```
+
+but there is no rewriting with a condition on the independent attribute, e.g.:
+
+```sql
+SELECT B FROM T WHERE C=c2
+```
+
+#### Remarks about the BID
+
+- It represents, in a single database, both the certain part of $D^*$ and the part that is known only with some probability.
+- For example, projecting out the null values from a world of the BID gives the same result as on $D^*$. This is not the case for a database generated from the joint distribution.
+- Using the BID alone, it is impossible to distinguish the queries that can be rewritten from those that cannot.
+
+#### Open question
+
+Given an MG and $D^*$, is there a class $C$ such that:
+
+1. the distance between the joint distribution and the empirical distribution of $C$ is minimal, and
+2. the probability of the class $C$ is strictly positive?
+
-- 
GitLab
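The open question can be made concrete on the 20/11/24 example with a small sketch. It rests on assumptions the notes do not state: each null of $D^*$ is filled independently from P(B|a), a "class" groups the completions of $D^*$ by their empirical distribution of B, and the distance is total variation. The names `completions`, `empirical`, `tv_distance`, and `classes` are hypothetical and are not part of `mcc.py`.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# D^* from the 20/11/24 note: rows (A, B, C) where B is null (None stands for "na").
D_STAR = [("a", None, "c1"), ("a", None, "c2"), ("a", None, "c3")]

# Distribution of B given A = a, as stated in the note.
P_B_GIVEN_A = {"b1": Fraction(2, 3), "b2": Fraction(1, 3)}


def completions(rows, dist):
    """Yield (completed_rows, probability) for every way of filling the nulls.

    Assumption (hypothetical): each null is filled independently from `dist`.
    """
    nulls = [i for i, (_, b, _) in enumerate(rows) if b is None]
    for values in product(dist, repeat=len(nulls)):
        completed = list(rows)
        prob = Fraction(1)
        for i, v in zip(nulls, values):
            a, _, c = completed[i]
            completed[i] = (a, v, c)
            prob *= dist[v]
        yield completed, prob


def empirical(rows, dist):
    """Empirical distribution of B over completed rows, in the key order of `dist`."""
    counts = {v: 0 for v in dist}
    for _, b, _ in rows:
        counts[b] += 1
    return tuple(Fraction(counts[v], len(rows)) for v in dist)


def tv_distance(p, q):
    """Total variation distance between two distributions on the same support."""
    return sum(abs(x - y) for x, y in zip(p, q)) / 2


def classes(rows, dist):
    """Map each empirical distribution (class) to (distance to dist, class probability)."""
    prob_of_class = defaultdict(Fraction)  # Fraction() == 0
    for completed, prob in completions(rows, dist):
        prob_of_class[empirical(completed, dist)] += prob
    target = tuple(dist.values())
    return {emp: (tv_distance(emp, target), p) for emp, p in prob_of_class.items()}


if __name__ == "__main__":
    result = classes(D_STAR, P_B_GIVEN_A)
    for emp, (distance, prob) in sorted(result.items(), key=lambda kv: kv[1][0]):
        print(emp, "distance:", distance, "probability:", prob)
```

Under these assumptions, the class whose completions contain two b1 and one b2 is at distance 0 from the joint distribution and has probability 3 · (2/3)² · (1/3) = 4/9 > 0, so both conditions hold on this toy example; whether such a class exists in general is the question the notes leave open.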