Getting Started#

Third-party package dependence#

MPICH — an MPI implementation library, available at http://www-unix.mcs.anl.gov/mpi/mpich.

Note that in some cases, the package hwloc is not automatically installed when installing mpich. One needs to install it and its development package hwloc-devel to use mpi.

There is another popular MPI library OpenMPI. If one sticks to OpenMPI, make sure to use the same MPI library between compiling and running MICA.

GSL — the GNU Scientific Library, downloaded at http://www.gnu.org/software/gs

LAPACKE — the C-interface of LAPACK, downloaded at http://www.netlib.org/lapack/

CMake —- the compilation tool, downloaded at https://cmake.org/download/

Note that one does not need to compile the above packages from the source code. Using package managers will be more convenient.

On Linux system, there are package managers that can install the above libraries convienently. If so, use them, e.g., for Fedora/Redhat distribution
```
dnf install mpich mpich-devel hwloc hwloc-devel gsl gsl-devel lapack lapack-dvel
```
In this case, the libraries usually are installed in standard environment path. Otherwise, any of the above libraries is not installed in standard locations in your system, the compiling configurations below may need slight adjustments.
On MacOS, one can use homebrew package manager to install the required packages
```
brew install mpich hwloc gsl lapack
```
To make lapack and hwloc findable by pkgconfig, export the paths of lapack.pc and hwloc.pc to the environment PKG_CONFIG_PATH. For example, if lapack is installed at /opt/homebrew/opt/lapack and hwloc is installed at /opt/homebrew/opt/hwloc, then execute the commands
```
export PKG_CONFIG_PATH=/opt/homebrew/opt/lapack/lib/pkgconfig:$PKG_CONFIG_PATH
export PKG_CONFIG_PATH=/opt/homebrew/opt/hwloc/lib/pkgconfig:$PKG_CONFIG_PATH
```
or put the above commands into .bashrc. If the above commands do not work, one can also add the paths to the environments LDFLAGS and CFLAGS in the terminal
```
export LDFLAGS=-L/opt/homebrew/opt/lapack/lib:$LDFLAGS
export CPPFLAGS=-I/opt/homebrew/opt/lapack/include:$CPPFLAGS

export LDFLAGS=-L/opt/homebrew/opt/hwloc/lib:$LDFLAGS
export CPPFLAGS=-I/opt/homebrew/opt/hwloc/include:$CPPFLAGS
```

For Intel CPUs, the Intel OneAPI MKL library provides optimized interfaces to LAPACKE and BLAS libraries. Using MKL libraray can improve the running speed and it is therefore highly recommended. MICA automatically checks the system environment variable MKLROOT to determine whether MKL libraray has been installed. If yes, MICA will by default use the interfaces of the MKL library; if not, the original LAPACKE and BLAS libraraies will be used. See Intel OneAPI MKL for the installation. Please keep in mind that one needs to source the MKL’s variable in bashrc, so as to let it callable from MICA.

Compiling#

Exectuable Binary Version: `mica2`#

Edit CMake compiling configurations to be consistent with your system if necessary, using the command

ccmake .

This will draw out a CMake GUI and edit the configurations accordingly. If this is your first installation, the GUI might be empty and type ``c`` to do initial configurating. Please refer to CMake for more details about the useage of CMake GUI.

After this step, compile the package with the command

make

This creates an executable file mica2.

If your system does not have lastest CMake, you can use Makefile provided in the package to do compiling. First edit the configurations in Makefile_old to be consistent with your system’s setting, and then execute the command

make -f Makefile_old

Python Callable Version: `pymica`#

The Python package mpi4py is required. Install it using pip

python -m pip install mpi4py

Note that pip keeps previously built wheel files on its cache for future reuse. If you want to reinstall the mpi4py package using a different or updated MPI implementation, you have to either first remove the cached wheel file with

python -m pip cache remove mpi4py

or ask pip to disable the cache:

python -m pip install --no-cache-dir mpi4py

Now install MICA using the command

python setup.py install --user
#or
python -m pip install .

This will generate a Python package pymica and install it to the user’s Python package sites. In the folder tests/python, the Python script example.py shows how to use pymica.

Running with Binary Version#

First create two subdirectories data/ and param in the current working directory. All the output files will be placed into data/. The subdirectory param is used to place options for CDNest.

To run the package in a parallel computer/cluster, use the following command, e.g.:

mpiexec -n 6 ./mica2 param/param  # here use 6 cores, change it to the numbers you want

where param is the paramter file, stored in the directory param/. This will also generate CDNest option files OPTIONSCON and OPTIONS1D in the subdirectory param/.

If the results are not as good as expected, one may want to modify options for Markov-chain Monte Carlo sampling. There are two ways. The first way is directly editing the parameter file (such as param/param in the above; see below). The second way is editing the above generated option file OPTIONS1D and transfer it to mica2 in the command line as

mpiexec -n 6 ./mica2 param/param param/OPTIONS1D # here use 6 cores, change it to the numbers you want

where OPTIONS1D is an options file stored in the directory param/, see Diffusive Nested Sampling for the detail.

Parameter File#

A typical parameter file looks like:

#
# lines starting with "#" are regarded as comments and are neglected
# if want to turn on the line, remove the beginning "#"
# note that some options are optinal

#==============================================================

FileDir                   ./
DataFile                  data/IRAS_year5.txt

TypeModel                 0                # 0: general model
                                           # 1: pmap, photometric RM
                                           # 2: vmap, use a virtual driving light curve
                                           # 3: mmap, mixture of TF types
                                           # default: 0

TypeTF                    0                # 0: Gaussian
                                           # 1: Top-hat
                                           # 2: Gamma function (k=2)
                                           # 3: Exponential
                                           # default: 0

StrTypeTFMix              30               # string for the TF types of mixture mode (mmap)
                                           # e.g., "02": gaussian-gamma; "20": gamma-gaussian
                                           # valid if TypeModel == 3 (mmap)

MaxNumberSaves            1000             # number of MCMC sampling steps
                                           # default: 2000

FlagUniformVarParams      0                # whether each dataset has the same variability parameters
                                           # default: 0

FlagUniformTranFuns       0                # whether each dataset has the same line parameters.
                                           # note that different lines have different parameters.
                                           # default: 0

FlagLongtermTrend         0                # Longterm trend in light curves, use a polynomial to fit
                                           # input the order of the polynomial, e.g.,
                                           # 0, constant  (default)
                                           # 1, linear line
                                           # 2, conic line
                                           # Use the default if you do not know this.

LagLimitLow               0.0              # lower limit of the range of time lag to be explored
LagLimitUpp               300.0            # upper limit of the range of time lag to be explored
                                           # can be negative

#WidthLimitLow            1.0              # lower and upper limit of lag width
#WidthLimitUpp            50.0             # by default, MICA determines the limits automatically.
                                           # if unsatifactory, turn on these options.

FlagLagPositivity         0                # whether force Gaussians overall located at non-negative lags
                                           # 0: no;  1: yes
                                           # default: 0

FlagNegativeResp          1                # whether turn on negative response
                                           # 0, no; 1, yes
                                           # default: 0

NumCompLow                2                # lower limit of number of Gaussians/tophats
NumCompUpp                2                # upper limit of number of Gaussians/tophats
                                           # default: 1, 1

FlagConSysErr             0                # 0, not include systematic error of continuum; 1, include
FlagLineSysErr            1                # 0, not include systematic error of line; 1, include
                                           # defaul: 0, 0

NumPointRec              200               # number of points in reconstruction for each light curve
                                           # note: a too large number causes very slow reconstruction
                                           # default: 200

TimeRecLowExt            0                 # extend time range of reconstruction
TimeRecUppExt            0                 # if detnote the original time range as [t1, t2]
                                           # then the options lead to [t1+TimeRecLowExt, t2+TimeRecUppExt]
                                           # default: 0, 0

#StrWidthPrior           [1:10:5:20]       # width priors if the default priors not good enough
                                           # format: [width1_1:width1_2:width2_1:width2_2...]
                                           # "WidthLimitLow" and "WidthLimitUpp" no longer applicable
                                           # default: None

TypeLagPrior              1                # type of lag prior for each Gaussians/tophats.
                                           # default: 0
                                           # 0,  limit0 < lag0 < lag1 < lag2 <... < limit1
                                           #
                                           # 1,  limit0 + 0*width < lag0 < limit0 + 1*width
                                           #     limit0 + 1*width < lag1 < limit0 + 2*width
                                           #     ...
                                           #     width = (limit1 - limit0)/num_comp
                                           #
                                           # 2,  lags fixed at specific values, no limit on Guassian sigma/tophat width
                                           #     lag0 = limit0 + 0*dlag
                                           #     lag1 = limit0 + 1*dlag
                                           #     ...
                                           #     dlag = (limit1 - limit0)/(num_comp-1)
                                           #
                                           # 3,  lags fixed at specific values
                                           #     Gaussian sigma ranges at (dlag/2, dlag), tophat wdith=dlag/2
                                           #     lag0 = limit0 + 0*dlag
                                           #     lag1 = limit0 + 1*dlag
                                           #     ...
                                           #     dlag = (limit1 - limit0)/(num_comp-1)
                                           #     better to set a large  mumber of components
                                           #
                                           # 4,  user specified with "StrLagPrior"

StrLagPrior             [0:10:10:50]       # valid if TypeLagPrior==4
                                           # format: [lag1_1:lag1_2:lag2_1:lag2_2...]
                                           # "LagLimitLow" and "LagLimitUpp" no longer applicable

StrRatioPrior           [1.0e-3:1.0]       # the response ratio of 2nd to 1st component
                                           # valid if TypeModel == 1 (pmap)
                                           # format: [ratio_1:ratio_2]
                                           # default: [1.0e-3:1.0]

FlagGap                 0                  # whether include seasonal gap
                                           # 0: no; 1: yes.
                                           # default: 0

#StrGapPrior            [182.6:140.0]      # gap priors if the default priors are not good enough
                                           # valid when FlagGap == 1
                                           # format: [gap_center_set1:gap_width_set1:gap_center_set2:gap_width_set2...]
                                           # gap_center_set1: gap center for 1st dataset (+n*year will also be included)
                                           # gap_width_set1:  gap width for 1st dataset
                                           # default: None
#==================================================================
# options for cdnest sampling
# use the default values or do not turn them on IF NOT familiar with them

# PTol                     0.1             # likelihood tolerance in loge
# NumberParticles          1               # number of particles
# NewLevelIntervalFactor   2               # new level interval
# SaveIntervalFactor       2               # particular saving interval
# ThreadStepsFactor        10              # thread steps before communications between cores
# MaxNumberLevels          0               # maximum number of levels; unlimited for 0
# BacktrackingLength       10.0            # backforward tracking length (lambda)
# StrengthEqualPush        100.0           # strength to force equal push (beta)

#===================================================================

see Diffusive Nested Sampling for the detail of CDNest options.

Running with Python Version#

In Python environment, import mica and other necessary packages as,

from mpi4py import MPI
import numpy as np
import matplotlib.pyplot as plt
import pymica

Then initialize MPI environment as

# initiate MPI
comm = MPI.COMM_WORLD
rank = comm.Get_rank()

If one did not create the formated data file (see below), one could directly load the light curves and feed them to MICA as

if rank == 0:
  con = np.loadtxt("cont.txt")
  line= np.loadtxt("line.txt")

  # make a data dict
  data_input = {"set1":[con, line]}

  # if multiple datasets, e.g.,
  #data_input = {"set1":[con1, line1], "set2":[con2, line2]}

  # if a dataset has multiple lines, e.g.,
  #data_input = {"set1":[con, line1, line2]}
else:
  data_input = None

data_input = comm.bcast(data_input, root=0)

#create a model
#there are two ways
#1) one way from the param file

#model = pymica.gmodel(param_file="param/param_input")

#2) the ohter way is through the setup function

# type: gmodel(), pmap(), vmap()
model = pymica.gmodel()

# type： gaussian, tophat, gamma, exp
model.setup(data=data_input, type_tf='gaussian', lag_limit=[0, 100], number_component=[1, 1], max_num_saves=2000)

If one already has created the formatted data file (see blow), one can directly input the file name as

model.setup(data_file="file_name", type_tf='gaussian', lag_limit=[0, 100], number_component=[1, 1], max_num_saves=2000)

After the above initialization, run the code as

#run mica
model.run()

#posterior run, only re-generate posterior samples, do not run MCMC
# model.post_run()

#do decomposition for the cases of multiple components
# model.decompose()

# plot results
if rank == 0:

  # plot results, doshow controls whether showing the results on screen
  #
  model.plot_results(doshow=True, tf_lag_range=None, hist_lag_range=None, show_pmax=True)
  model.post_process()  # generate plots for the properties of MCMC sampling

  # get the full sample
  # sample is a list, each element contains an array of posterior samples
  # sample[0] is for the case of number_component[0]
  # sample[1] is for the case of number_component[1]
  # ...
  sample = model.get_posterior_sample()

See Running with Python Version for a detailed guideline.

Data Format#

mica2 reads data files with a format as:

# 1
# 171:269
56690.6100           3.4270          0.0640     % continuum, 171 lines
56691.5400           3.5450          0.0650
...
56864.8600           4.3310          0.0740
56865.9200           4.7080          0.0780

56698.3570           2.1900          0.0560     % line, 269 lines
56699.5590           2.2000          0.0580
...
56830.1490           2.3000          0.0650
56830.4200           2.2900          0.0660

The first line starting with “#” specifies the number of datasets. Here one dataset contains one continuum light curve (the driving source) and several line light curves (at least one). The second line starting with “#” specifies the numbers of points in light curves of continuum and lines, which are separated by “:”.

The next follows data of light curves, going by datasets. For each dataset, the first block is continuum light curve and then line light curves successively. In each data block, the three columns are time, flux, and error, respectively. Blocks/datasets are separated by a blank line.

In the above example, there is one dataset and it contains 171 points in continuum light curve and 269 point in one line light curve. If your data have 2 datasets, the first dataset has 2 line light curves while the second dataset has one light curve, the data file should be formated as:

# 2
# 171:130:90
# 150:122
6100           3.4270          0.0640     % continuum of 1st dataset, 171 lines
5400           3.5450          0.0650
...
8600           4.3310          0.0740
9200           4.7080          0.0780

3570           2.1900          0.0560     % 1st line of 1st dataset, 130 lines
5590           2.2000          0.0580
...
1490           2.3000          0.0650
4200           2.2900          0.0660

3570           2.1900          0.0560     % 2nd line of 1st dataset, 90 lines
5590           2.2000          0.0580
...
1490           2.3000          0.0650
4200           2.2900          0.0660

6100           3.4270          0.0640     % continuum of 2nd dataset, 150 lines
5400           3.5450          0.0650
...
8600           4.3310          0.0740
9200           4.7080          0.0780

3570           2.1900          0.0560     % line of 2nd dataset, 122 lines
5590           2.2000          0.0580
...
1490           2.3000          0.0650
4200           2.2900          0.0660

As you can see, the numbers of lines in each datasets do not needs to be the same.

Output#

mica2 outputs the following main files in the folder data/:

posterior_sample1d.txt_xx

posterior sample for parameters. The postfix “_xx” means the number of Gaussians. The order of parameters in posterior sample file is arranged as:

(systematic error of continuum, sigmad, taud) * number of datasets

(systematic error of line, (gaussian amplitude, center, sigma) * number of gaussians * number of line datasets) * number of datasets

sigmad, taud, gaussian amplitude and sigma are in logarithm scale; systematic errors (x) are dimensionless, defined as x = log(1+err/err_data), where err is the real systematic error and err_data is the mean measurement error of the data.

pall.txt_xx

reconstruction of datasets, with the same format as the input data.

pline.txt_xx_compyy (applicable with -d option)

decomposed light curves for each Gaussian component, with the same format as the input data. yy (a number) indicates which Gaussian component. Note that the continuum light curve is not decomposed and only line light curves are decomposed.

para_names_line.txt_xx

parameters and their priors.

evidence.txt

Bayesian evidence for each number of Gaussians explored.

In the end of running, mica2 prints the obtained Bayesian evidence for each number of Gausssians explored.

Posterior sample#

The posterior sample of parameters is output to posterior_sample1d.txt_xx in the data folder. One can mannually read in the file and then compute statistics to obtain the time lags and other properties of the transfer functions. Alternatively, the Python version pymica provides a function to load the posterior sample

from pymica import utility
# load the posterior sample for number of components = 1
sample = utility.get_posterior_sample(num_comp=1, fdir="./")
# this returns a Pandas table.
print(sample.columns)

Plotting#

There is a Python script plotfig.py provided in the package that can be used to plot the results. Run it with

ptyhon plotfig.py --param param/param

This will generate a PDF file fig_xx.pdf in the subdirectory data/. Use the following command to print help information about this script.

python plotfig.py --help

The output looks like,

usage: python plotfig.py [options]

options:
  -h, --help            show this help message and exit
  --param PARAM         parameter file
  --resp_input RESP_INPUT
                        str, a file storing input response function
  --tf_lag_range TF_LAG_RANGE [TF_LAG_RANGE ...]
                        time lag range for the transfer function, e.g., --tf_lag_range 0 100
  --hist_lag_range HIST_LAG_RANGE [HIST_LAG_RANGE ...]
                        time lag range for the histograms, e.g., --hist_lag_range 0 100
  --hist_bins HIST_BINS [HIST_BINS ...]
                        number of bins for the histograms, e.g., --hist_bins 20
  --show_gap            whether show seasonal gaps, e.g., --show_gap
  --show_pmax           whether show the results of the maximum posterior ppint, e.g., --show_pmax
  --time_shift TIME_SHIFT
                        time shift applied to all light curves, e.g., --time_shift 5000.0

For example, if one wants to adjust the starting time of the light curves, use the option as

python plotfig.py --param param/param --time_shift -50000  # this will subtract 50000 from the time
                                                           # of all light curves for plotting

Testing#

To test mica2, change to the subdirectory tests/ and there are several suites of tests to guide the useage of mica2. See Tests for more details.

Getting Started

Contents