Using the MeDIL package to find a minMCM on real data

This post will demonstrate how to use the MeDIL Python package to find a minimal Measurement Dependence Inducing Latent causal model (minMCM) on a real data set. The steps here mirror those from my paper on MeDIL causal models, which you can consult for more detailed analysis of the results and connections to the literature.



The Data set

We will analyze the Stress, Religous Coping, and Depression data set, collected by Bongjae Lee from the University of Pittsburgh in 2003 (for more information, see the technical report it was first published in). To begin, simply download the data set, e.g., with

wget https://causal.dev/post/minmcm_application/depressioncoping.dat

The Python script used for analysing this data set is split into parts and presented throughout the rest of the post. The complete script can be downloaded at the end of the post.

First, we have all of the import statements:

# We'll use these for processing data, computing Pearson Correlation, and displaying results.
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# We'll use these for computing the distance correlation.
from dcor import distance_correlation as distcorr, pairwise
from multiprocessing import Pool

# We'll use these for running the independence tests.
from medil.independence_test import hypothesis_test
from medil.independence_test import dependencies

# add HSIC imports here? or include in medil?

Next, we can load the data into Python and prepare it for indpendence testing:

# We only need to read the data for the 61 questions we're interested in, so we leave out demographic questions and ID numbers, etc.
raw = np.genfromtxt('depressioncoping.dat', names=True, delimiter='\t', usecols=np.arange(2, 63))
var_names = raw.dtype.names

# Now we remove the rows containing NaN values, e.g., from a subject not completing the questionnaire.
raw_array = raw.view(np.float).reshape(raw.shape + (-1,))
nan_row_mask = np.isnan(raw_array).any(axis=1)

data = raw_array[~nan_row_mask]

Independence tests

The causal structure learning algorithm find_minMCM is constraint-based, so we must first estimate the independence constraints from the data set. Importantly, find_minMCM requires no assupmtions about linearity, and so we want to use a non-linear measure of dependence, such as the distance correlation or Hilbert-Schmidt Independence Criterion (HSIC) as opposed to the commonly used (but in our case unecessarily restrictive) linear Pearson correlation. To get a sense of how these measures are different, we will take a look at each of them.

Pearson correlation

We can compute and visualize the pairwise Pearson correlation coefficients:

# Compute the Pearson correlation.
corr = np.corrcoef(data, rowvar=False)

# We're interested in magnitude but not the sign.
abs_corr = np.abs(corr)

# Make a mask for a nicer plot.
mask = np.zeros_like(corr)
mask[np.triu_indices_from(mask)] = True

# Make and display the plot.
sns.heatmap(corr, mask=mask, square=True, vmax=1, vmin=0, xticklabels=var_names, yticklabels=var_names)
plt.show()

The result is the following heatmap:

heatmap of pairwise Pearson correlation

Distance correlation

We can compute and visualize the pairwise distance correlation coefficients:

# Compute the distance correlation, using multiple cores to speed up processing.
with Pool() as pool:
    dist_corr = pairwise(distcorr, data.T, pool=pool)

# Make and display the plot.
sns.heatmap(dist_corr, mask=mask, square=True, vmax=1, vmin=0, xticklabels=var_names, yticklabels=var_names)
plt.show()

The result is the following heatmap:

heatmap of pairwise distance correlation

Hilbert-Schmidt Independence Criterion

Finding a minMCM

Results

Academic is designed to give technical content creators a seamless experience. You can focus on the content and Academic handles the rest.

Highlight your code snippets, take notes on math classes, and draw diagrams from textual representation.

On this page, you'll find some examples of the types of technical content that can be rendered with Academic.

Examples

Code

Academic supports a Markdown extension for highlighting code syntax. You can enable this feature by toggling the highlight option in your config/_default/params.toml file.

```python
import pandas as pd
data = pd.read_csv("data.csv")
data.head()
```

renders as

import pandas as pd
data = pd.read_csv("data.csv")
data.head()

Math

Academic supports a Markdown extension for $\LaTeX$ math. You can enable this feature by toggling the math option in your config/_default/params.toml file and adding markup: mmark to your page front matter.

To render inline or block math, wrap your LaTeX math with $$...$$.

Example math block:

$$\gamma_{n} = \frac{ 
\left | \left (\mathbf x_{n} - \mathbf x_{n-1} \right )^T 
\left [\nabla F (\mathbf x_{n}) - \nabla F (\mathbf x_{n-1}) \right ] \right |}
{\left \|\nabla F(\mathbf{x}_{n}) - \nabla F(\mathbf{x}_{n-1}) \right \|^2}$$

renders as

\[\gamma_{n} = \frac{ \left | \left (\mathbf x_{n} - \mathbf x_{n-1} \right )^T \left [\nabla F (\mathbf x_{n}) - \nabla F (\mathbf x_{n-1}) \right ] \right |}{\left \|\nabla F(\mathbf{x}_{n}) - \nabla F(\mathbf{x}_{n-1}) \right \|^2}\]

Example inline math $$\left \|\nabla F(\mathbf{x}_{n}) - \nabla F(\mathbf{x}_{n-1}) \right \|^2$$ renders as \(\left \|\nabla F(\mathbf{x}_{n}) - \nabla F(\mathbf{x}_{n-1}) \right \|^2\) .

Example multi-line math using the \\ math linebreak:

$$f(k;p_0^*) = \begin{cases} p_0^* & \text{if }k=1, \\
1-p_0^* & \text {if }k=0.\end{cases}$$

renders as

\[f(k;p_0^*) = \begin{cases} p_0^* & \text{if }k=1, \\ 1-p_0^* & \text {if }k=0.\end{cases}\]

Diagrams

Academic supports a Markdown extension for diagrams. You can enable this feature by toggling the diagram option in your config/_default/params.toml file or by adding diagram: true to your page front matter.

An example flowchart:

```mermaid
graph TD;
  A-->B;
  A-->C;
  B-->D;
  C-->D;
```

renders as

graph TD;
  A-->B;
  A-->C;
  B-->D;
  C-->D;

An example sequence diagram:

```mermaid
sequenceDiagram
  participant Alice
  participant Bob
  Alice->John: Hello John, how are you?
  loop Healthcheck
      John->John: Fight against hypochondria
  end
  Note right of John: Rational thoughts <br/>prevail...
  John-->Alice: Great!
  John->Bob: How about you?
  Bob-->John: Jolly good!
```

renders as

sequenceDiagram
  participant Alice
  participant Bob
  Alice->John: Hello John, how are you?
  loop Healthcheck
      John->John: Fight against hypochondria
  end
  Note right of John: Rational thoughts <br/>prevail...
  John-->Alice: Great!
  John->Bob: How about you?
  Bob-->John: Jolly good!

An example Gantt diagram:

```mermaid
gantt
  dateFormat  YYYY-MM-DD
  section Section
  A task           :a1, 2014-01-01, 30d
  Another task     :after a1  , 20d
  section Another
  Task in sec      :2014-01-12  , 12d
  another task      : 24d
```

renders as

gantt
  dateFormat  YYYY-MM-DD
  section Section
  A task           :a1, 2014-01-01, 30d
  Another task     :after a1  , 20d
  section Another
  Task in sec      :2014-01-12  , 12d
  another task      : 24d

Todo lists

You can even write your todo lists in Academic too:

- [x] Write math example
- [x] Write diagram example
- [ ] Do something else

renders as

  • Write math example
  • Write diagram example
  • Do something else

Tables

Represent your data in tables:

| First Header  | Second Header |
| ------------- | ------------- |
| Content Cell  | Content Cell  |
| Content Cell  | Content Cell  |

renders as

First Header Second Header
Content Cell Content Cell
Content Cell Content Cell

Asides

Academic supports a Markdown extension for asides, also referred to as notices or hints. By prefixing a paragraph with A>, it will render as an aside. You can enable this feature by adding markup: mmark to your page front matter, or alternatively using the Alert shortcode.

A> A Markdown aside is useful for displaying notices, hints, or definitions to your readers.

renders as

Did you find this page helpful? Consider sharing it 🙌

Postdoc Researcher

I analyze and develop causal discovery algorithms, especially for applications in the cognitive and brain sciences, using tools from graph theory and algebraic statistics.