Module Reference

pyentropy – Core information theoretic functionality

BaseSystem

class pyentropy.systems.BaseSystem

Base functionality for entropy calculations common to all systems

Methods

I
Ish
Ishush
Ispike
calculate_entropies
pola_decomp
I(corr=None)

Convenience function to compute mutual information

Must have already computed required entropies [‘HX’, ‘HXY’]

Parameters :
corr : str, optional

If provided, use the entropies from this correction rather than the default values in self.H
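
For example, a minimal sketch with hypothetical random data (the DiscreteSystem constructor is documented below):

import numpy as np
from pyentropy import DiscreteSystem

# hypothetical data: two 4-valued input variables, one 4-valued output, 1000 trials
X = np.random.randint(0, 4, size=(2, 1000))
Y = np.random.randint(0, 4, size=(1, 1000))

s = DiscreteSystem(X, (2, 4), Y, (1, 4))
s.calculate_entropies(method='plugin', calc=['HX', 'HXY'])
print(s.I())  # mutual information I(X;Y) in bits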

Ish(corr=None)

Convenience function to compute shuffled mutual information estimate

Must have already computed required entropies [‘HX’, ‘HiXY’, ‘HshXY’, ‘HXY’]

Parameters :
corr : str, optional

If provided, use the entropies from this correction rather than the default values in self.H

Ishush(corr=None)

Convenience function to compute full shuffled mutual information estimate

Must have already computed required entropies [‘HX’, ‘SiHXi’, ‘HshX’, ‘HiXY’, ‘HshXY’, ‘HXY’]

Parameters :
corr : str, optional

If provided, use the entropies from this correction rather than the default values in self.H
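
Continuing the sketch above, the shuffled estimators only need the extra entropy terms included in calc:

# shuffled estimators require the additional entropy terms listed above
s.calculate_entropies(method='plugin',
                      calc=['HX', 'SiHXi', 'HshX', 'HiXY', 'HshXY', 'HXY'])
print(s.Ish())     # shuffled information estimate
print(s.Ishush())  # full shuffled information estimate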

Ispike()

Adelman et al. (2003) style information per spike

calculate_entropies(method='plugin', sampling='naive', calc=['HX', 'HXY'], **kwargs)

Calculate entropies of the system.

Parameters :
method : {‘plugin’, ‘pt’, ‘qe’, ‘nsb’, ‘nsb-ext’, ‘bub’}

Bias correction method to use

sampling : {‘naive’, ‘kt’, ‘beta:x’}, optional

Sampling method to use. ‘naive’ is the standard histogram method. ‘beta:x’ is for an add-constant beta estimator, with the beta value following the colon, e.g. ‘beta:0.01’ [R1]. ‘kt’ is for the Krichevsky-Trofimov estimator [R2], which is equivalent to ‘beta:0.5’.

calc : list of strs

List of entropy values to calculate, from (‘HX’, ‘HY’, ‘HXY’, ‘SiHXi’, ‘HiX’, ‘HshX’, ‘HiXY’, ‘HshXY’, ‘ChiX’, ‘HXY1’, ‘ChiXY1’)

Keywords :
qe_method : {‘plugin’, ‘pt’, ‘nsb’, ‘nsb-ext’, ‘bub’}, optional

Method argument to be passed for the QE calculation (‘pt’, ‘nsb’). Allows combining QE with other corrections.

methods : list of strs, optional

If present, the method argument will be ignored and all corrections in the list will be calculated. Use this to compare the results of different methods in one calculation pass.

Returns :
self.H : dict

Dictionary of computed values.

self.H_method : dict

Dictionary of computed values using ‘method’.

Notes

  • If the PT method is chosen with outputs ‘HiX’ or ‘ChiX’, no bias correction will be performed for these terms.

References

[R1] T. Schurmann and P. Grassberger, “Entropy estimation of symbol sequences,” Chaos, vol. 6, no. 3, pp. 414–427, 1996.
[R2] R. Krichevsky and V. Trofimov, “The performance of universal encoding,” IEEE Trans. Information Theory, vol. 27, no. 2, pp. 199–207, Mar. 1981.
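
A sketch of the methods keyword, continuing the example above; this assumes each correction’s results are exposed as the corresponding self.H_<method> dictionary, as the Returns section describes:

# compute several bias corrections in a single pass
s.calculate_entropies(method='plugin', methods=['plugin', 'pt', 'qe'],
                      calc=['HX', 'HXY'])
print(s.H_plugin['HX'], s.H_pt['HX'], s.H_qe['HX'])
print(s.I(corr='qe'))  # mutual information from the QE-corrected entropies
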
pola_decomp()

Convenience function for the Pola et al. (2003) information breakdown

DiscreteSystem

class pyentropy.systems.DiscreteSystem(X, X_dims, Y, Y_dims, qe_shuffle=True)

Bases: pyentropy.systems.BaseSystem

Class to hold probabilities and calculate entropies of a discrete stochastic system.

Attributes :
PXY : (X_dim, Y_dim)

Conditional probability vectors on decimalised space P(X|Y). PXY[:,i] is the X probability distribution conditional on Y==i.

PX : (X_dim,)

Unconditional decimalised X probability.

PY : (Y_dim,)

Unconditional decimalised Y probability.

PXi : (X_m, X_n)

Unconditional probability distributions for individual X components. PXi[i,j] = P(X_i==j)

PXiY : (X_m, X_n, Y_dim)

Conditional probability distributions for individual X components. PXiY[i,j,k] = P(X_i==j | Y==k)

PiX : (X_dim,)

Pind(X) = <Pind(X|y)>_y

Methods

I
Ish
Ishush
Ispike
calculate_entropies
pola_decomp
__init__(X, X_dims, Y, Y_dims, qe_shuffle=True)

Check and assign inputs.

Parameters :
X : (X_n, t) int array

Array of measured input values. X_n variables in X space, t trials

X_dims : tuple (n, m)

Dimension of X (input) space; length n, base m words

Y : (Y_n, t) int array

Array of corresponding measured output values. Y_n variables in Y space, t trials

Y_dims : tuple (n, m)

Dimension of Y (output) space; length n, base m words

qe_shuffle : {True, False}, optional

Set to False if trials are already in random order, to skip the shuffling step in QE. Leave as True if trials have structure (i.e. one stimulus after another).
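
A construction sketch with hypothetical continuous data, assuming quantise (documented under Utility Functions below) returns the quantised sequence together with the bin centres:

import numpy as np
from pyentropy import DiscreteSystem, quantise

# hypothetical: 500 trials of a continuous scalar response to one of 4 stimuli
r = np.random.randn(500)
stim = np.random.randint(0, 4, size=(1, 500))

q, _ = quantise(r, 8)  # 8 equally-occupied levels
s = DiscreteSystem(q[np.newaxis, :], (1, 8), stim, (1, 4))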

SortedDiscreteSystem

class pyentropy.systems.SortedDiscreteSystem(X, X_dims, Y_m, Ny)

Bases: pyentropy.systems.DiscreteSystem

Class to hold probabilities and calculate entropies of a discrete stochastic system when the inputs are available already sorted by stimulus.

Attributes :
PXY : (X_dim, Y_dim)

Conditional probability vectors on decimalised space P(X|Y). PXY[:,i] is the X probability distribution conditional on Y==i.

PX : (X_dim,)

Unconditional decimalised X probability.

PY : (Y_dim,)

Unconditional decimalised Y probability.

PXi : (X_m, X_n)

Unconditional probability distributions for individual X components. PXi[i,j] = P(X_i==j)

PXiY : (X_m, X_n, Y_dim)

Conditional probability distributions for individual X components. PXiY[i,j,k] = P(X_i==j | Y==k)

PiX : (X_dim,)

Pind(X) = <Pind(X|y)>_y

Methods

I
Ish
Ishush
Ispike
calculate_entropies
pola_decomp
__init__(X, X_dims, Y_m, Ny)

Check and assign inputs.

Parameters :
X : (X_n, t) int array

Array of measured input values. X_n variables in X space, t trials

X_dims : tuple (n,m)

Dimension of X (input) space; length n, base m words

Y_m : int

Finite alphabet size of single variable Y

Ny : (Y_m,) int array

Array of the number of trials available for each stimulus. This should be ordered the same as the order of X w.r.t. stimuli. Ny.sum() = X.shape[1]
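
A construction sketch with hypothetical pre-sorted data:

import numpy as np
from pyentropy import SortedDiscreteSystem

# hypothetical: responses already grouped by stimulus; 3 stimuli
Ny = np.array([100, 120, 80])  # trials per stimulus, in the same order as X
X = np.random.randint(0, 8, size=(1, Ny.sum()))

s = SortedDiscreteSystem(X, (1, 8), 3, Ny)
s.calculate_entropies(method='pt', calc=['HX', 'HXY'])
print(s.I())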

Utility Functions

pyentropy.base2dec(x, b)

Convert base-b words to decimal values.

Parameters :
x : (t, n) int array

Array of t length-n base-b words

b : int

Base (size of finite alphabet)

Returns :
d_x: (t,)

Array of decimalised values

Note: this is the same as decimalise except that the input x is transposed (here x is (t, n), i.e. rows are trials).

pyentropy.dec2base(x, b, digits)

Convert decimal value to a row of values representing it in a given base.

Parameters :
x : (t,) or (t,1) int array

Array of decimilised values

b : int

Base for conversion (finite alphabet size)

digits : int

Length of output word for each trial

Returns :
y : (t, digits) int array

Array of t length-digits base-b words
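
A small round-trip sketch:

import numpy as np
from pyentropy import base2dec, dec2base

w = np.array([[0, 1, 2],
              [2, 0, 1]])  # two trials of length-3 base-3 words
d = base2dec(w, 3)         # array([ 5, 19])
w2 = dec2base(d, 3, 3)     # recovers the original words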

pyentropy.decimalise(x, n, b)

Convert base-b words to decimal values

Parameters :
x : (n, t) int array

Array of t length-n base-b words

n : int

Word length of the input (number of variables)

b : int

Base (size of finite alphabet)

Returns :
d_x: (t,)

Array of decimalised values

pyentropy.nsb_entropy(P, N, dim, var=False)

Calculate NSB entropy of a probability distribution using external nsb-entropy program.

Requires nsb-entropy to be installed on the system path.

Parameters :
P : 1D array

Probability distribution vector

N : int

Total number of trials

dim : int

Full dimension of space

var : {False, True}, optional

Return variance in addition to entropy

pyentropy.prob(x, m, method='naive')

Sample probability of integer sequence.

Parameters :
x : int array

integer input sequence

m : int

alphabet size of input sequence (max(x)<m)

method : {‘naive’, ‘kt’, ‘beta:x’, ‘shrink’}, optional

Sampling method to use.

Returns :
Pr : float array

array representing probability distribution Pr[i] = P(x=i)
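
For example:

import numpy as np
from pyentropy import prob

x = np.array([0, 1, 1, 2, 0, 1])
print(prob(x, 3))                     # naive estimate: [1/3, 1/2, 1/6]
print(prob(x, 3, method='beta:0.5'))  # add-constant (Krichevsky-Trofimov) estimate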

pyentropy.quantise(input, m, uniform='sampling', minmax=None, centers=True)

Quantise 1D input vector into m levels (unsigned)

Parameters :
uniform : {‘sampling’,’bins’}

Determines whether quantisation is uniform in sampling (equally-occupied bins) or the bins have uniform widths

minmax : tuple (min,max)

Specify the range for uniform=‘bins’ quantisation, rather than using the min/max of the input

centers : {True, False}

Return vector of bin centers instead of bin bounds
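
A sketch of equal-width binning over a fixed range, assuming the function returns the quantised sequence together with the bin bounds when centers=False:

import numpy as np
from pyentropy import quantise

x = np.random.randn(1000)
# 4 equal-width bins spanning (-3, 3) instead of equally-occupied bins
q, edges = quantise(x, 4, uniform='bins', minmax=(-3, 3), centers=False)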

pyentropy.quantise_discrete(input, m)

Re-bin an already discretised sequence (e.g. of integer counts)

Input should already be non-negative integers

pyentropy.maxent – Finite Alphabet Maximum-Entropy Solutions

Module for computing finite-alphabet maximum entropy solutions using a coordinate transform method

For details of the method see:

Ince, R. A. A., Petersen, R. S., Swan, D. C., Panzeri, S., 2009 “Python for Information Theoretic Analysis of Neural Data”, Frontiers in Neuroinformatics 3:4 doi:10.3389/neuro.11.004.2009 http://www.frontiersin.org/neuroinformatics/paper/10.3389/neuro.11/004.2009/

If you use this code in a published work, please cite the above paper.

The generated transformation matrices for a given set of parameters are stored to disk. The default location for the cache is a .pyentropy (_pyentropy on Windows) directory in the user's home directory. To override this and use a custom location (for example, to share the folder between users) you can put a configuration file .pyentropy.cfg (pyentropy.cfg on Windows) in the home directory with the following format:

[maxent]
cache_dir = /path/to/cache

pyentropy.maxent.get_config_file() will show where it is looking for the config file.

The probability vector for a finite-alphabet space of n variables with m possible values is a length m**n vector, ordered such that the value of the index is equal to the decimal value of the input state represented, when interpreted as a base m, length n word, e.g. for n=3, m=3:

P[0] = P(0,0,0)
P[1] = P(0,0,1)
P[2] = P(0,0,2)
P[3] = P(0,1,0)
P[4] = P(0,1,1) etc.

This allows efficient vectorised conversion between probability index and response word using base2dec, dec2base. The output is in the same format.

class pyentropy.maxent.AmariSolve(n, m, filename='a_', local=False, confirm=True)

A class for computing maximum-entropy solutions.

When the class is initialised the coordinate transform matrices are loaded from disk, if available, or generated.

See module docstring for location of cache directory.

An instance then exposes a solve method which returns the maximum entropy distribution preserving marginal constraints of the input probability vector up to a given order k.

This class computes the full transformation matrix and so can compute solutions for any order.

Methods

eta_from_p
p_from_theta
solve
theta_from_p
__init__(n, m, filename='a_', local=False, confirm=True)

Setup transformation matrix for given parameter set.

If existing matrix file is found, load the (sparse) transformation matrix A, otherwise generate it.

Parameters :
n : int

number of variables in the system

m : int

size of finite alphabet (number of symbols)

filename : {str, None}, optional

filename to load/save (designed to be used by derived classes).

local : {False, True}, optional

If True, store/load arrays from a ‘data/’ directory in the current working directory. Otherwise use the package data dir (default ~/.pyentropy, or ~/_pyentropy on Windows). This can be overridden through ~/.pyentropy.cfg (~/pyentropy.cfg on Windows).

confirm : {True, False}, optional

Whether to prompt for confirmation before generating matrix

solve(Pr, k, eta_given=False, ic_offset=-0.01, **kwargs)

Find maxent distribution for a given order k

Parameters :
Pr : (fdim,)

probability distribution vector

k : int

Order of interest (marginals up to this order constrained)

eta_given : {False, True}, optional

Set this True if you are passing the marginals in Pr instead of the probabilities

ic_offset : float, optional

Initial condition offset for the numerical optimisation. If you are having trouble getting convergence, try adjusting this; usually making it smaller in magnitude is effective (e.g. -0.00001)

Returns :
Psolve : (fdim,)

probability distribution vector of k-th order maximum entropy solution
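
A usage sketch with a hypothetical random distribution; confirm=False skips the generation prompt:

import numpy as np
from pyentropy.maxent import AmariSolve

# hypothetical distribution over 3 binary variables (fdim = 2**3 = 8)
Pr = np.random.dirichlet(np.ones(8))

a = AmariSolve(3, 2, confirm=False)  # load or generate the transform matrix
P2 = a.solve(Pr, 2)                  # 2nd-order maximum-entropy solution
theta = a.theta_from_p(Pr)           # theta coordinates of the input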

theta_from_p(p)

Return theta vector from the full probability vector

eta_from_p(p)

Return eta-vector (marginals) from full probability vector

p_from_theta(theta)

Return the full length-fdim p-vector from the length-(fdim-1) theta vector

pyentropy.maxent.get_config_file()

Get the location and name of the config file for specifying the data cache dir. You can call this to find out where to put your config.

pyentropy.maxent.get_data_dir()

Get the data cache dir to use to load and save precomputed matrices

pyentropy.statk – Python wrapper of STA Toolkit functions

pyentropy.statk.nsb_entropy()

Calculate entropy using C NSB implementation from Spike Train Analysis Toolkit.

Parameters :
P : (dim,) float array

Probability vector

N : int

Number of trials.

dim : int

Dimension of space

verbose : {False, True}, optional

Print warnings from NSB routine.

var : {False, True}, optional

Return variance in addition to entropy

possible_words : str or int, optional

Strategy for choosing the total number of possible words. One of [‘recommended’, ‘unique’, ‘total’, ‘possible’, ‘min_tot_pos’, ‘min_lim_tot_pos’] (default: ‘recommended’), or a positive integer value; see <http://neuroanalysis.org/neuroanalysis/goto.do?page=.repository.toolkit_entropts>.

nsb_precision : float

Relative precision for numerical integration (default 1e-6)

Returns :
H : float

Entropy.

V : float, optional

Variance (if requested)
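
A usage sketch, assuming the positional (P, N, dim) calling convention documented above; requires pyentropy built with the STA Toolkit wrapper:

import numpy as np
from pyentropy.statk import nsb_entropy

x = np.random.randint(0, 8, size=1000)    # hypothetical integer responses
P = np.bincount(x, minlength=8) / 1000.0  # sampled probability vector
H, V = nsb_entropy(P, 1000, 8, var=True)  # entropy and variance estimates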

pyentropy.statk.bub_entropy()

Calculate entropy using BUB implementation from Spike Train Analysis Toolkit.

Parameters :
P : (dim,) float array

Probability vector

N : int

Number of trials.

dim : int

Dimension of space

verbose : {False, True}, optional

Print warnings from the BUB routine.

var : {False, True}, optional

Return variance in addition to entropy

possible_words : str or int, optional

Strategy for choosing the total number of possible words. One of [‘recommended’, ‘unique’, ‘total’, ‘possible’, ‘min_tot_pos’, ‘min_lim_tot_pos’] (default: ‘recommended’), or a positive integer value; see <http://neuroanalysis.org/neuroanalysis/goto.do?page=.repository.toolkit_entropts>.

bub_lambda_0 : float, optional

Lagrange multiplier parameter λ_0 for BUB (default 0)

bub_K : int, optional

K parameter for BUB (default 11)

bub_compat : int, optional

0: compatible with the paper (default); 1: compatible with the posted code

Returns :
H : float

Entropy.

V : float, optional

Variance (if requested)