Core –- Parts and Usage Overview

The code is divided up into files to contain pieces with similar purposes or concepts in the algorithm. Each file has its own single module for defining a namespace used when importing its names into other files. Each module exports members intended for public access, but the code in this project explicitly names its imports to maintain clarity in what is used and where it comes from.

Missions.jl

InformativeSampling.Missions — Module

This module contains functions for initializing mission data and the function for running the entire search mission. The entry-point to the actual informative sampling. This contains the main loop and most of the usage of Samples and belief models.

Main public types and functions:

InformativeSampling.Missions.Mission — Type

The main function that runs the informative sampling routine. For each iteration, a sample location is selected, a sample is collected, the belief model is updated, and visuals are possibly shown. The run finishes when the designated number of samples is collected.

Inputs:

func: any function to be run at the end of the update loop, useful for visualization or saving data (default does nothing)
samples: a vector of samples, this can be used to jump-start a mission or resume a previous mission (default empty)
beliefs: a vector of beliefs, this pairs with the previous argument (default empty)
seed_val: the seed for the random number generator, an integer (default 0)
sleep_time: the amount of time to wait after each iteration, useful for visualizations (default 0)

Outputs:

samples: a vector of new samples collected
beliefs: a vector of probabilistic representations of the quantities being searched for, one for each sample collection

Examples

using Missions: synMission

mission = synMission(num_samples=10) # create the specific mission
samples, beliefs = mission(visuals=true, sleep_time=0.5) # run the mission

InformativeSampling.Missions.Mission — Type

Fields:

occupancy::Any: an occupancy map, true in cells that are occupied
sampler::Any: a function that returns a measurement value for any input
num_samples::Any: the number of samples to collect in one run
sampleCostType::Any: a constructor for the function that returns the (negated) value of taking a sample (default DistScaledEIGF)
weights::Any: weights for picking the next sample location
start_locs::Any: the locations that should be sampled first (default [])
prior_samples::Any: any samples taken previously (default empty)
kernel::Any: the kernel to be used in the belief model (default multiKernel)
means_use::Any: whether or not to use a non-zero mean for each quantity (default true)
means_learn::Any: whether or not to learn means (default false)
noise_value::Any: a named tuple of noise value(s) (default [0.0, 0.0, ...])
noise_learn::Any: whether or not to learn noise further (default false)
use_cond_pdf::Any: whether or not to use the conditional distribution of the data to train the belief model (default false)
hyp_drop::Any: whether or not to drop hypotheses and settings for it (default (false, 10, 5, 0.4))

Defined as a keyword struct, so all arguments are passed in as keywords:

mission = Mission(; occupancy,
                  sampler,
                  num_samples,
                  sampleCostType,
                  weights,
                  start_locs,
                  prior_samples,
                  noise,
                  kernel)

InformativeSampling.Missions.replay — Method

replay(
    func,
    M::InformativeSampling.Missions.Mission,
    full_samples,
    beliefs;
    sleep_time
)

Replays a mission that has already taken place. Mainly for visualization purposes.

Inputs:

func: any function to be run at the end of the update loop, useful for

visualization or saving data (default does nothing)

full_samples: a vector of samples
beliefs: a vector of beliefs
sleep_time: the amount of time to wait after each iteration, useful for

visualizations (default 0)

Samples.jl

InformativeSampling.Samples — Module

This module contains everything to do with sampling values in the environment. Its alias types Location and SampleInput are fundamental pieces for MultiQuantityGPs as well.

Main public types and functions:

InformativeSampling.Samples.GridMapsSampler — Type

Handles samples of the form (location, quantity) to give the value from the right map. Internally a tuple of GridMaps.

Constructor can take in a tuple or vector of GridMaps or each GridMap as a separate argument.

Examples

ss = GridMapsSampler(GridMap(zeros(5, 5)), GridMap(ones(5, 5)))

loc = [.2, .75]
ss(loc) # result: [0, 1]
ss((loc, 2)) # result: 1

InformativeSampling.Samples.Location — Type

mutable struct Array{Float64, 1} <: DenseVector{Float64}

Location of sample

InformativeSampling.Samples.Sample — Type

Struct to hold the input and output of a sample.

Fields:

x::Tuple{Vector{Float64}, Int64}: the sample input, usually a location and sensor id
y::Any: the sample output or observation, a scalar

InformativeSampling.Samples.SampleInput — Type

struct Tuple{Vector{Float64}, Int64}

Sample input, the combination of: (Location, sensor index)

InformativeSampling.Samples.SampleOutput — Type

struct Tuple{Float64, Float64}

Value of sample measurement, the measurement mean and standard deviation

InformativeSampling.Samples.UserSampler — Type

A sampler that asks the user to input measurement values, one for each quantity at the given location.

InformativeSampling.Samples.selectSampleLocation — Method

selectSampleLocation(sampleCost, bounds) -> AbstractArray

The optimization of choosing a best single sample location.

Inputs:

sampleCost: a function from sample location to cost (x->cost(x))
bounds: map lower and upper bounds

Returns the sample location, a vector

InformativeSampling.Samples.takeSamples — Method

takeSamples(loc, sampler) -> Any

Pulls a ground truth value from a given location and constructs a Sample object to hold them both.

Inputs:

loc: the location to sample
sampler: a function that returns ground truth values
quantities: (optional) a vector of integers which represent which quantities to sample, defaults to all of them

Outputs a vector of Samples containing input x and measurement y

SampleCosts.jl

InformativeSampling.SampleCosts — Module

This module holds a variety of SampleCost functions used by Samples.jl in selecting a new sample location. The purpose is to pick the location that will minimize the given function.

Each sample cost function in this module is a subtype of the abstract SampleCost type. Their common interface consists of two functions:

values(sampleCost, location): returns the values of the terms (μ, σ, τ, P);
this is typically what each subtype will override;
in all of these, μ = belief model mean, σ = belief model std, τ = travel distance, P = proximity value;
all of these are for the specific location
sampleCost(location): the actual sample cost at the location; it has a default method explained in SampleCost;
for the equations the location is denoted $x$

Many of these were experiment cost functions and aren't recommended. The main ones recommended for use, in order, are

Others can be useful if one wants to do some more experimentation. Note that all these functions are currently hardcoded to use the first quantity as the objective quantity unless otherwise stated. Unless explicitly used, the distance value is generally used to make sure unreachable locations are forbidden (their value will be Inf).

Main public types and functions:

InformativeSampling.SampleCosts.DerivVar — Type

Uses the norm of the derivative of the belief model mean and the belief model variance:

\[C(x) = - w_1 \, {\left\lVert \frac{\partial μ}{\partial x}(x) \right\rVert}^2 - w_2 \, σ^2(x)\]

InformativeSampling.SampleCosts.DistLogEIGF — Type

A variation on EIGF that takes the logarithm of the variance and adds a distance cost term that is normalized by the average of the region dimensions:

\[C(x) = - w_1 \, (μ(x) - y(x_c))^2 - w_2 \, \log(σ^2(x)) + w_3 \, β \, \frac{τ(x)}{\left\lVert \boldsymbol{\ell}_d \right\rVert_1}\]

where $β$ is a parameter to delay the distance effect until a few samples have been taken.

InformativeSampling.SampleCosts.DistProx — Type

Combines the average mean value, average standard deviation, travel distance, and proximity as terms:

\[C(x) = - w_1 \, μ_{\mathrm{ave}}(x) - w_2 \, σ_{\mathrm{ave}}(x) + w_3 \, τ(x) + w_4 \, P(x)\]

where $P(x) = \sum_i(\frac{\min(\boldsymbol{\ell}_d)}{4 \, \mathrm{dist}_i})^3$. Averages are performed over all quantities.

InformativeSampling.SampleCosts.DistScaledDerivVar — Type

Uses the norm of the derivative of the belief model mean and the belief model variance, then scales it all by a normalized travel distance:

\[C(x) = \frac{- w_1 \, {\left\lVert \frac{\partial μ}{\partial x}(x) \right\rVert}^2 - w_2 \, σ^2(x)} {1 + β \, \frac{τ(x)}{\left\lVert \boldsymbol{\ell}_d \right\rVert_1}}\]

where $β$ is a parameter to delay the distance effect until a few samples have been taken.

InformativeSampling.SampleCosts.DistScaledEIGF — Type

Augments EIGF with a factor to scale by a normalized travel distance:

\[C(x) = \frac{- w_1 \, (μ(x) - y(x_c))^2 - w_2 \, σ^2(x)} {1 + β \, \frac{τ(x)}{\left\lVert \boldsymbol{\ell}_d \right\rVert_1}}\]

where $β$ is a parameter to delay the distance effect until a few samples have been taken.

InformativeSampling.SampleCosts.EIGF — Type

Expected Informativeness for Global Fit (EIGF). This function is adapted from ^[Lam] through added weights to choose the balance between exploration and exploitation. It has the form:

\[C(x) = - w_1 \, (μ(x) - y(x_c))^2 - w_2 \, σ^2(x)\]

where $x_c$ is the nearest collected sample location.

InformativeSampling.SampleCosts.InfoGain — Type

Derived from the idea of information gain across the region. Returns the entropy of a set of points (10x10 grid) given the new sample location. Minimizing this entropy is equivalent to maximizing information gain since the entropy before the sample is always the same.

This function is very computationally expensive, which is why the test grid is set at 10x10.

It has the form:

\[C(x) = \log |Σ|\]

InformativeSampling.SampleCosts.LogLikelihood — Type

Idea derived from the log likelihood of query location. Similar to EIGF, the measured value from the nearest sample is used:

\[C(x) = - w_1 \, \left( \frac{μ(x) - y(x_c)}{σ_n} \right)^2 - w_2 \, \log (σ^2(x))\]

where $x_c$ is the nearest collected sample location, and $σ_n$ is the noise.

This function was seen to work well but hasn't undergone extensive tests. There is still some question about the theory and the use of noise parameter vs signal amplitude parameter.

InformativeSampling.SampleCosts.LogLikelihoodFull — Type

A test of the log likelihood idea but using a weighted sum of all measured sample values, not just the nearest one:

\[C(x) = - w_1 \, \frac{1}{\sum_i k(x, x_i)} \sum_i k(x, x_i) * \left( \frac{μ(x) - y(x_i)}{σ_n} \right)^2 - w_2 \, \log (σ^2(x))\]

where $x_i$ is each collected sample location, and $σ_n$ is the noise.

This function's performance wasn't satisfactory.

InformativeSampling.SampleCosts.LogNormed — Type

Combines the average belief value and the log of the average uncertainty value of all quantities. All belief and uncertainty values are first normalized by the maximum belief value of that quantity. It has the form:

\[C(x) = - w_1 \, μ_{\textrm{norm-ave}}(x) - w_2 \, \log (σ_{\textrm{norm-ave}}(x))\]

InformativeSampling.SampleCosts.MEPE — Type

This was a cost function being considered but wasn't finished. Taken from ^[Liu].

InformativeSampling.SampleCosts.MIPT — Type

A simple cost function that doesn't use a belief model but works purely on distances. It returns the negated distance to the nearest sample:

\[C(x) = - \min_i \left\lVert x - x_i \right\rVert\]

Useful when maximizing distance between samples.

InformativeSampling.SampleCosts.OnlyVar — Type

Returns the negated variance of the belief model at the query location:

\[C(x) = - σ^2\]

InformativeSampling.SampleCosts.SampleCost — Type

Typically construct a SampleCost through SampleCostType(occupancy, samples, beliefModel, quantities, weights)

A pathCost is constructed automatically from the other arguments.

This object can then be called to get the cost of sampling at a location: sampleCost(x)

InformativeSampling.SampleCosts.SampleCost — Method

Cost to take a new sample at a location $x$. This is a fallback method that calculates a simple linear combination of all the values of a SampleCost.

Has the form:

\[C(x) = - w_1 \, μ(x) - w_2 \, σ(x) + w_3 \, τ(x) + w_4 \, P(x)\]

InformativeSampling.SampleCosts.VarTrace — Type

Similar to InfoGain but uses only variances rather than the full covariance matrix, which is the same as the trace instead of the log determinant. This reduces computation but is still more costly than a single point estimate. This uses a 20x20 grid of test points.

It has the form:

\[C(x) = \textrm{tr}(Σ)\]

InformativeSampling.SampleCosts.values — Method

values(
    sc::InformativeSampling.SampleCosts.SampleCost,
    loc
) -> Tuple{Float64, Any, Float64, Float64}

Returns the values to be used to calculate the sample cost (belief mean, standard deviation, travel distance, sample proximity).

Each concrete subtype of SampleCost needs to implement.

This can be a useful function to inspect values during optimization.

LamLam, C Q (2008) Sequential adaptive designs in computer experiments for response surface model fit (Doctoral dissertation). The Ohio State University.
LiuLiu H, Cai J, Ong Y (2017) An adaptive sampling approach for kriging metamodeling by maximizing expected prediction error. Comput Chem Eng 106:171–182