| Continuous Bernoulli distribution | |
|---|---|
| Parameters | $\lambda \in (0,1)$; $\theta = \log \dfrac{\lambda}{1-\lambda}$, natural parameter |
| Support | ![{\displaystyle x\in [0,1]}](./_assets_/eb734a37dd21ce173a46342d1cc64c92/64a15936df283add394ab909aa7a5e24e7fb6bb2.svg) |
| PDF | $C(\lambda)\,\lambda^{x}(1-\lambda)^{1-x}$ where $C(\lambda)={\begin{cases}2,&\lambda ={\tfrac {1}{2}}\\[6pt]{\dfrac {2\tanh ^{-1}(1-2\lambda )}{1-2\lambda }},&{\text{otherwise}}\end{cases}}$; equivalently $\dfrac{\theta e^{\theta x}}{e^{\theta}-1}$ for $\theta \neq 0$ and $1$ for $\theta = 0$ |
| CDF | ![{\displaystyle F(x\mid \lambda )={\begin{cases}x,&\lambda ={\tfrac {1}{2}}\\[6pt]{\dfrac {\lambda ^{x}(1-\lambda )^{1-x}+\lambda -1}{2\lambda -1}},&{\text{otherwise}}\end{cases}}}](./_assets_/eb734a37dd21ce173a46342d1cc64c92/e3f30166b64bd1fc2b39632cd32acff38a62a431.svg); equivalently $\dfrac{e^{\theta x}-1}{e^{\theta}-1}$ for $\theta \neq 0$ and $x$ for $\theta = 0$ |
| Mean | ![{\displaystyle \operatorname {E} [X]={\begin{cases}{\tfrac {1}{2}}&\lambda ={\tfrac {1}{2}}\\[6pt]{\dfrac {\lambda }{2\lambda -1}}+{\dfrac {1}{2\tanh ^{-1}(1-2\lambda )}},&{\text{otherwise}}\end{cases}}}](./_assets_/eb734a37dd21ce173a46342d1cc64c92/a229a1c9425627b971410017f5c21a29434163b7.svg) ![{\displaystyle \operatorname {E} [X]={\begin{cases}1/2&\theta =0\\e^{\theta }/(e^{\theta }-1)-\theta ^{-1}&\theta \neq 0\end{cases}}}](./_assets_/eb734a37dd21ce173a46342d1cc64c92/87f857a7086bf229e37561ae23118f0010262bad.svg) |
| Variance | ![{\displaystyle \operatorname {Var} (X)={\begin{cases}{\tfrac {1}{12}},&\lambda ={\tfrac {1}{2}}\\[6pt]-{\dfrac {\lambda (1-\lambda )}{(1-2\lambda )^{2}}}+{\dfrac {1}{(2\tanh ^{-1}(1-2\lambda ))^{2}}},&{\text{otherwise}}\end{cases}}}](./_assets_/eb734a37dd21ce173a46342d1cc64c92/405add7095486acd186661ad3d310818e8dc17a8.svg); equivalently $\dfrac{1}{\theta^{2}}-\dfrac{e^{\theta}}{(e^{\theta}-1)^{2}}$ for $\theta \neq 0$ and $\tfrac{1}{12}$ for $\theta = 0$ |

In probability theory, statistics, and machine learning, the continuous Bernoulli distribution[1][2][3] is a family of continuous probability distributions parameterized by a single shape parameter $\lambda \in (0,1)$, defined on the unit interval $x \in [0,1]$, by:

$$p(x \mid \lambda) \propto \lambda^{x}(1-\lambda)^{1-x}.$$

The continuous Bernoulli distribution arises in deep learning and computer vision, specifically in the context of variational autoencoders,[4][5] for modeling the pixel intensities of natural images. As such, it defines a proper probabilistic counterpart for the commonly used binary cross entropy loss, which is often applied to continuous, $[0,1]$-valued data.[6][7][8][9] This practice amounts to ignoring the normalizing constant of the continuous Bernoulli distribution, since the binary cross entropy loss only defines a true log-likelihood for discrete, $\{0,1\}$-valued data.
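
The relationship between the binary cross entropy and the continuous Bernoulli log-density can be checked numerically. The following is a minimal sketch in plain Python, not taken from any library: the function names (`log_norm_const`, `bce`, `cb_log_pdf`, `integrate`) are illustrative, and it assumes the normalizing constant $C(\lambda) = 2\tanh^{-1}(1-2\lambda)/(1-2\lambda)$, with $C(1/2) = 2$.

```python
import math

def log_norm_const(lam, eps=1e-6):
    """log C(lam), with C(lam) = 2*atanh(1 - 2*lam) / (1 - 2*lam) and C(1/2) = 2."""
    u = 1.0 - 2.0 * lam
    if abs(u) < eps:
        # Near lam = 1/2, use atanh(u)/u ~ 1 + u^2/3 to avoid evaluating 0/0.
        return math.log(2.0) + math.log1p(u * u / 3.0)
    return math.log(2.0 * math.atanh(u) / u)

def bce(x, lam):
    """Binary cross entropy between a target x in [0, 1] and lam in (0, 1)."""
    return -(x * math.log(lam) + (1.0 - x) * math.log(1.0 - lam))

def cb_log_pdf(x, lam):
    """Continuous Bernoulli log-density: negative BCE plus log C(lam)."""
    return -bce(x, lam) + log_norm_const(lam)

def integrate(f, n=20000):
    """Midpoint rule on [0, 1]."""
    h = 1.0 / n
    return sum(f((i + 0.5) * h) for i in range(n)) * h

lam = 0.8
total_cb = integrate(lambda x: math.exp(cb_log_pdf(x, lam)))   # normalized density
total_bce = integrate(lambda x: math.exp(-bce(x, lam)))        # BCE alone
print(total_cb, total_bce)
```

In this sketch the first integral comes out close to 1, while the second is strictly below 1: exponentiating the negative binary cross entropy alone does not give a density on $[0,1]$ unless the $\log C(\lambda)$ term is added.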
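The continuous Bernoulli also defines an exponential family of distributions. Writing $\theta = \log\left(\lambda/(1-\lambda)\right)$ for the natural parameter, the density can be rewritten in canonical form: $p(x \mid \theta) = \theta e^{\theta x}/(e^{\theta}-1)$ for $\theta \neq 0$, and $p(x \mid \theta) = 1$ for $\theta = 0$.[10]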
Statistical inference
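Given an independent sample of $n$ points $x_1, \dots, x_n$ with $x_i \in [0,1]$ from continuous Bernoulli, the log-likelihood of the natural parameter $\theta$ is

$$\ell(\theta) = \theta \sum_{i=1}^{n} x_i - n \log \frac{e^{\theta}-1}{\theta},$$

and the maximum likelihood estimator of the natural parameter $\theta$ is the solution of $\ell'(\theta) = 0$, that is, $\hat{\theta}$ satisfies

$$\frac{e^{\hat{\theta}}}{e^{\hat{\theta}}-1} - \frac{1}{\hat{\theta}} = \bar{x}, \qquad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,$$

where the left-hand side is the expected value of continuous Bernoulli with parameter $\hat{\theta}$. Although $\hat{\theta}$ does not admit a closed-form expression, the mean is strictly increasing in $\theta$ (its derivative is the variance), so $\hat{\theta}$ can be computed by numerically inverting the mean function, for example with bisection or Newton's method. When $\bar{x} = 1/2$, the solution is $\hat{\theta} = 0$.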
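
The numerical inversion can be sketched with plain bisection, which relies only on the mean being strictly increasing in the natural parameter. This is a minimal illustration under stated assumptions rather than a reference implementation: the names `cb_mean` and `cb_mle_theta`, the bracket $[-50, 50]$, and the tolerance are all choices made here.

```python
import math

def cb_mean(theta):
    """Mean of the continuous Bernoulli as a function of the natural parameter."""
    if abs(theta) < 1e-4:
        # Taylor expansion around 0 (1/2 + theta/12) avoids cancellation.
        return 0.5 + theta / 12.0
    # 1 / (1 - exp(-theta)) equals exp(theta) / (exp(theta) - 1).
    return 1.0 / (-math.expm1(-theta)) - 1.0 / theta

def cb_mle_theta(xs, lo=-50.0, hi=50.0, tol=1e-10):
    """Solve cb_mean(theta) = sample mean by bisection on [lo, hi]."""
    xbar = sum(xs) / len(xs)
    if not (cb_mean(lo) < xbar < cb_mean(hi)):
        raise ValueError("sample mean is outside the range covered by the bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cb_mean(mid) < xbar:
            lo = mid   # root lies to the right of mid
        else:
            hi = mid   # root lies at or to the left of mid
    return 0.5 * (lo + hi)

xs = [0.9, 0.8, 0.95, 0.7]
theta_hat = cb_mle_theta(xs)
lam_hat = 1.0 / (1.0 + math.exp(-theta_hat))  # map back to the shape parameter
print(theta_hat, lam_hat)
```

With the default bracket, sample means between roughly 0.02 and 0.98 are covered; a wider bracket or a bracketing root finder from a numerical library could be substituted for more extreme data.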
Further properties
The entropy of a continuous Bernoulli distribution is
![{\displaystyle \operatorname {H} [X]={\begin{cases}0&{\text{ if }}\lambda ={\frac {1}{2}}\\{\frac {\lambda \log \left(\lambda \right)-\left(1-\lambda \right)\log \left(1-\lambda \right)}{1-2\lambda }}-\log \left({\frac {2\tanh ^{-1}\left(1-2\lambda \right)}{e\left(1-2\lambda \right)}}\right)&{\text{ otherwise}}\end{cases}}\!}](./_assets_/eb734a37dd21ce173a46342d1cc64c92/32fbc4001a08a1df4efd840b9b1ddf411babb625.svg)
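
As a sanity check, the closed-form entropy can be compared with a direct numerical evaluation of $-\int_0^1 p(x)\log p(x)\,dx$. The sketch below is illustrative only; the function names (`cb_entropy`, `cb_pdf`, `entropy_by_quadrature`) are hypothetical, and the midpoint rule is simply one convenient quadrature choice.

```python
import math

def cb_entropy(lam):
    """Closed-form differential entropy of the continuous Bernoulli."""
    if abs(lam - 0.5) < 1e-6:
        return 0.0  # uniform case
    u = 1.0 - 2.0 * lam
    first = (lam * math.log(lam) - (1.0 - lam) * math.log(1.0 - lam)) / u
    second = math.log(2.0 * math.atanh(u) / (math.e * u))
    return first - second

def cb_pdf(x, lam):
    """Normalized continuous Bernoulli density."""
    u = 1.0 - 2.0 * lam
    c = 2.0 if abs(u) < 1e-6 else 2.0 * math.atanh(u) / u
    return c * lam ** x * (1.0 - lam) ** (1.0 - x)

def entropy_by_quadrature(lam, n=20000):
    """Midpoint-rule approximation of -integral of p*log(p) over [0, 1]."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        p = cb_pdf((i + 0.5) * h, lam)
        total -= p * math.log(p) * h
    return total

closed_form = cb_entropy(0.8)
numeric = entropy_by_quadrature(0.8)
print(closed_form, numeric)
```

The two values should agree closely. The entropy is negative for $\lambda \neq 1/2$, which is expected for a differential entropy on $[0,1]$: the uniform distribution attains the maximum value of 0.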
Bernoulli distribution
The continuous Bernoulli can be thought of as a continuous relaxation of the Bernoulli distribution, which is defined on the discrete set $\{0,1\}$ by the probability mass function:

$$p(x) = p^{x}(1-p)^{1-x},$$

where $p$ is a scalar parameter between 0 and 1. Applying this same functional form on the continuous interval $[0,1]$ results in the continuous Bernoulli probability density function, up to a normalizing constant.
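
The normalizing constant can be recovered by integrating this functional form over the unit interval. As a worked step, writing $\lambda$ for the parameter and assuming $\lambda \neq 1/2$:

$$\int_0^1 \lambda^{x}(1-\lambda)^{1-x}\,dx = (1-\lambda)\int_0^1 \left(\frac{\lambda}{1-\lambda}\right)^{x} dx = \frac{2\lambda-1}{\log\frac{\lambda}{1-\lambda}} = \frac{1-2\lambda}{2\tanh^{-1}(1-2\lambda)},$$

where the last equality uses $2\tanh^{-1}(u) = \log\frac{1+u}{1-u}$ with $u = 1-2\lambda$. The reciprocal of this integral is the constant that makes the density integrate to 1, namely $C(\lambda) = 2\tanh^{-1}(1-2\lambda)/(1-2\lambda)$. At $\lambda = 1/2$ the integrand is constant and equal to $1/2$, so $C(1/2) = 2$.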
The uniform distribution on the unit interval [0,1] is a special case of continuous Bernoulli when $\lambda = 1/2$ or, equivalently, $\theta = 0$.
Exponential distribution
An exponential distribution with rate $\Lambda$ restricted to the unit interval [0,1] corresponds to a continuous Bernoulli distribution with natural parameter $\theta = -\Lambda$.
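
This correspondence follows in one step. Writing $\Lambda > 0$ for the rate and $\theta$ for the natural parameter (symbols assumed here for illustration), the exponential density renormalized to $[0,1]$ is

$$\frac{\Lambda e^{-\Lambda x}}{\int_0^1 \Lambda e^{-\Lambda t}\,dt} = \frac{\Lambda e^{-\Lambda x}}{1-e^{-\Lambda}} = \frac{\theta e^{\theta x}}{e^{\theta}-1} \quad \text{with } \theta = -\Lambda, \qquad x \in [0,1],$$

which is the canonical continuous Bernoulli density. Because $\theta < 0$ in this case, the corresponding shape parameter $\lambda = 1/(1+e^{\Lambda})$ is below $1/2$.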
Continuous categorical distribution
The multivariate generalization of the continuous Bernoulli is called the continuous-categorical.[11]
References
1. Loaiza-Ganem, G., & Cunningham, J. P. (2019). The continuous Bernoulli: fixing a pervasive error in variational autoencoders. In Advances in Neural Information Processing Systems (pp. 13266-13276).
2. PyTorch Distributions. https://pytorch.org/docs/stable/distributions.html#continuousbernoulli
3. Tensorflow Probability. https://www.tensorflow.org/probability/api_docs/python/tfp/edward2/ContinuousBernoulli Archived 2020-11-25 at the Wayback Machine
4. Kingma, D. P., & Welling, M. (2013). Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114.
5. Kingma, D. P., & Welling, M. (2014, April). Stochastic gradient VB and the variational auto-encoder. In Second International Conference on Learning Representations, ICLR (Vol. 19).
6. Larsen, A. B. L., Sønderby, S. K., Larochelle, H., & Winther, O. (2016, June). Autoencoding beyond pixels using a learned similarity metric. In International conference on machine learning (pp. 1558-1566).
7. Jiang, Z., Zheng, Y., Tan, H., Tang, B., & Zhou, H. (2017, August). Variational deep embedding: an unsupervised and generative approach to clustering. In Proceedings of the 26th International Joint Conference on Artificial Intelligence (pp. 1965-1972).
8. PyTorch VAE tutorial: https://github.com/pytorch/examples/tree/master/vae.
9. Keras VAE tutorial: https://blog.keras.io/building-autoencoders-in-keras.html.
10. Lee, C. J., Dahl, B. K., Ovaskainen, O., & Dunson, D. B. (2025). Scalable and robust regression models for continuous proportional data. arXiv preprint arXiv:2504.15269. https://arxiv.org/abs/2504.15269
11. Gordon-Rodriguez, E., Loaiza-Ganem, G., & Cunningham, J. P. (2020). The continuous categorical: a novel simplex-valued exponential family. In 36th International Conference on Machine Learning, ICML 2020. International Machine Learning Society (IMLS).