I recently worked on a visualization where I needed to draw numbers from a joint probability distribution. This ended up being much harder than I thought it would be, so I thought I’d write up the results to save someone else some time. There are a number of resources out there for doing this in languages like Python and R, but nothing for JavaScript. I ended up using jStat for many of it’s distributions and helper functions, along with this Stackoverflow answer.
Here are the steps:
- Create uncorrelated samples drawing from a standard normal distribution (mu=0, sigma=1).
- Create a correlation matrix with the desired correlation between the samples.
- Matrix multiply the cholesky decomposition of the correlation matrix with the uncorrelated samples to create correlated normal samples.
- Convert the correlated normal samples to correlated uniform samples using the standard normal cumulative distribution function (CDF). I think the result of this is considered a copula.
- Use the inverted CDF of the desired distributions to convert the correlated uniform samples into correlated samples. This is called inverse transform sampling.
Note that the sample correlation won’t be exactly what you specified in the correlation matrix, but it was close enough for my purposes. If you need a specific correlation, you could always write a while loop that repeats this process until it’s within a certain threshold of your desired correlation. Here’s the code for a joint lognormal distribution:
A nice feature of this approach is that you can use any combination of distributions and create any number of correlated samples. All you need is to do is create the desired correlation matrix, define the distributions with jStat, and then use their inverted CDFs to convert the copula.
For example, if I wanted a joint normal-lognormal distribution, I could use the copula function above, then independently define distributions and use their inverted CDFs:
Note that the map function used above is unique to jStat because it can operate across multiple arrays. The samples
variable above is now an array with two correlated samples, one with a normal distribution and one with a lognormal one: