Using the MeDIL package to find a minMCM on real data
This post will demonstrate how to use the MeDIL Python package to find a minimal Measurement Dependence Inducing Latent causal model (minMCM) on a real data set. The steps here mirror those from my paper on MeDIL causal models, which you can consult for more detailed analysis of the results and connections to the literature.
The Data set
We will analyze the Stress, Religous Coping, and Depression data set, collected by Bongjae Lee from the University of Pittsburgh in 2003 (for more information, see the technical report it was first published in). To begin, simply download the data set, e.g., with
wget https://causal.dev/post/minmcm_application/depressioncoping.dat
The Python script used for analysing this data set is split into parts and presented throughout the rest of the post. The complete script can be downloaded at the end of the post.
First, we have all of the import statements:
# We'll use these for processing data, computing Pearson Correlation, and displaying results.
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
# We'll use these for computing the distance correlation.
from dcor import distance_correlation as distcorr, pairwise
from multiprocessing import Pool
# We'll use these for running the independence tests.
from medil.independence_test import hypothesis_test
from medil.independence_test import dependencies
# add HSIC imports here? or include in medil?
Next, we can load the data into Python and prepare it for indpendence testing:
# We only need to read the data for the 61 questions we're interested in, so we leave out demographic questions and ID numbers, etc.
raw = np.genfromtxt('depressioncoping.dat', names=True, delimiter='\t', usecols=np.arange(2, 63))
var_names = raw.dtype.names
# Now we remove the rows containing NaN values, e.g., from a subject not completing the questionnaire.
raw_array = raw.view(np.float).reshape(raw.shape + (-1,))
nan_row_mask = np.isnan(raw_array).any(axis=1)
data = raw_array[~nan_row_mask]
Independence tests
The causal structure learning algorithm find_minMCM
is constraint-based, so we must first estimate the independence constraints from the data set.
Importantly, find_minMCM
requires no assupmtions about linearity, and so we want to use a non-linear measure of dependence, such as the distance correlation or Hilbert-Schmidt Independence Criterion (HSIC) as opposed to the commonly used (but in our case unecessarily restrictive) linear Pearson correlation.
To get a sense of how these measures are different, we will take a look at each of them.
Pearson correlation
We can compute and visualize the pairwise Pearson correlation coefficients:
# Compute the Pearson correlation.
corr = np.corrcoef(data, rowvar=False)
# We're interested in magnitude but not the sign.
abs_corr = np.abs(corr)
# Make a mask for a nicer plot.
mask = np.zeros_like(corr)
mask[np.triu_indices_from(mask)] = True
# Make and display the plot.
sns.heatmap(corr, mask=mask, square=True, vmax=1, vmin=0, xticklabels=var_names, yticklabels=var_names)
plt.show()
The result is the following heatmap:
Distance correlation
We can compute and visualize the pairwise distance correlation coefficients:
# Compute the distance correlation, using multiple cores to speed up processing.
with Pool() as pool:
dist_corr = pairwise(distcorr, data.T, pool=pool)
# Make and display the plot.
sns.heatmap(dist_corr, mask=mask, square=True, vmax=1, vmin=0, xticklabels=var_names, yticklabels=var_names)
plt.show()
The result is the following heatmap:
Hilbert-Schmidt Independence Criterion
Finding a minMCM
Results
Academic is designed to give technical content creators a seamless experience. You can focus on the content and Academic handles the rest.
Highlight your code snippets, take notes on math classes, and draw diagrams from textual representation.
On this page, you'll find some examples of the types of technical content that can be rendered with Academic.
Examples
Code
Academic supports a Markdown extension for highlighting code syntax. You can enable this feature by toggling the highlight
option in your config/_default/params.toml
file.
```python
import pandas as pd
data = pd.read_csv("data.csv")
data.head()
```
renders as
import pandas as pd
data = pd.read_csv("data.csv")
data.head()
Math
Academic supports a Markdown extension for $\LaTeX$ math. You can enable this feature by toggling the math
option in your config/_default/params.toml
file and adding markup: mmark
to your page front matter.
To render inline or block math, wrap your LaTeX math with $$...$$
.
Example math block:
$$\gamma_{n} = \frac{
\left | \left (\mathbf x_{n} - \mathbf x_{n-1} \right )^T
\left [\nabla F (\mathbf x_{n}) - \nabla F (\mathbf x_{n-1}) \right ] \right |}
{\left \|\nabla F(\mathbf{x}_{n}) - \nabla F(\mathbf{x}_{n-1}) \right \|^2}$$
renders as
\[\gamma_{n} = \frac{ \left | \left (\mathbf x_{n} - \mathbf x_{n-1} \right )^T \left [\nabla F (\mathbf x_{n}) - \nabla F (\mathbf x_{n-1}) \right ] \right |}{\left \|\nabla F(\mathbf{x}_{n}) - \nabla F(\mathbf{x}_{n-1}) \right \|^2}\]
Example inline math $$\left \|\nabla F(\mathbf{x}_{n}) - \nabla F(\mathbf{x}_{n-1}) \right \|^2$$
renders as \(\left \|\nabla F(\mathbf{x}_{n}) - \nabla F(\mathbf{x}_{n-1}) \right \|^2\) .
Example multi-line math using the \\
math linebreak:
$$f(k;p_0^*) = \begin{cases} p_0^* & \text{if }k=1, \\
1-p_0^* & \text {if }k=0.\end{cases}$$
renders as
\[f(k;p_0^*) = \begin{cases} p_0^* & \text{if }k=1, \\ 1-p_0^* & \text {if }k=0.\end{cases}\]
Diagrams
Academic supports a Markdown extension for diagrams. You can enable this feature by toggling the diagram
option in your config/_default/params.toml
file or by adding diagram: true
to your page front matter.
An example flowchart:
```mermaid
graph TD;
A-->B;
A-->C;
B-->D;
C-->D;
```
renders as
graph TD;
A-->B;
A-->C;
B-->D;
C-->D;
An example sequence diagram:
```mermaid
sequenceDiagram
participant Alice
participant Bob
Alice->John: Hello John, how are you?
loop Healthcheck
John->John: Fight against hypochondria
end
Note right of John: Rational thoughts <br/>prevail...
John-->Alice: Great!
John->Bob: How about you?
Bob-->John: Jolly good!
```
renders as
sequenceDiagram
participant Alice
participant Bob
Alice->John: Hello John, how are you?
loop Healthcheck
John->John: Fight against hypochondria
end
Note right of John: Rational thoughts <br/>prevail...
John-->Alice: Great!
John->Bob: How about you?
Bob-->John: Jolly good!
An example Gantt diagram:
```mermaid
gantt
dateFormat YYYY-MM-DD
section Section
A task :a1, 2014-01-01, 30d
Another task :after a1 , 20d
section Another
Task in sec :2014-01-12 , 12d
another task : 24d
```
renders as
gantt
dateFormat YYYY-MM-DD
section Section
A task :a1, 2014-01-01, 30d
Another task :after a1 , 20d
section Another
Task in sec :2014-01-12 , 12d
another task : 24d
Todo lists
You can even write your todo lists in Academic too:
- [x] Write math example
- [x] Write diagram example
- [ ] Do something else
renders as
- Write math example
- Write diagram example
- Do something else
Tables
Represent your data in tables:
| First Header | Second Header |
| ------------- | ------------- |
| Content Cell | Content Cell |
| Content Cell | Content Cell |
renders as
First Header | Second Header |
---|---|
Content Cell | Content Cell |
Content Cell | Content Cell |
Asides
Academic supports a Markdown extension for asides, also referred to as notices or hints. By prefixing a paragraph with A>
, it will render as an aside. You can enable this feature by adding markup: mmark
to your page front matter, or alternatively using the Alert shortcode.
A> A Markdown aside is useful for displaying notices, hints, or definitions to your readers.
renders as