Constraints

One of the features of our benchmark is its support for feature constraints, both in the dataset definition and in the attacks.

There are three types of constraints:

  1. Boundary constraints: These constraints are defined in the metadata of the dataset (for example, the CSV file) and define the maximum and minimum allowed values for each feature (see the illustrative excerpt after this list).

  2. Mutability constraints: These constraints are also defined in the metadata of the dataset (for example, the CSV file) and indicate which features can be modified by the attacker.

  3. Feature relation constraints: These constraints express relations between the values of several features and are defined programmatically, as described in the remainder of this section.
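
For illustration, a metadata file might list, per feature, its type, its bounds, and a mutability flag. The excerpt below is a hypothetical sketch: the column names and values are assumptions, not the actual dataset definition.

feature,type,min,max,mutable
installment,real,0,1305,True
loan_amnt,int,500,40000,True
term,cat,36,60,False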

Manual definition of feature relations

All classes below are defined in tabularbench.constraints.relation_constraint.

Constraints between features can be expressed with natural Python syntax. For example, the constraint F0 = F1 + F2 is expressed as:

from tabularbench.constraints.relation_constraint import Feature
constraint1 = Feature(0) == Feature(1) + Feature(2)

Accessing a feature

A feature can be accessed by its index (0, 1, …) or by its name (installment, loan_amnt, …).

from tabularbench.constraints.relation_constraint import Feature
constraint1 = Feature(0) == Feature(1) + Feature(2)
constraint2 = Feature("open_acc") <= Feature("total_acc")

Pre-built operators

  • Base operators: Pre-built operators include equalities, inequalities, and math operations (+, -, *, /, …). Custom operators can be built by extending the class MathOperation.

  • Safe operators: SafeDivision and Log accept a fallback value for invalid inputs (for example, a zero denominator in SafeDivision).

  • Constraint operators: All constraint operators extend BaseRelationConstraint: OrConstraint, AndConstraint, LessConstraint, LessEqualConstraint.

  • Tolerance-aware constraint operators: EqualConstraint allows a tolerance value when assessing equality; == can also be used for exact (zero-tolerance) equalities.
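
As a minimal sketch of how these operators combine (the constructor signatures of SafeDivision and EqualConstraint are assumptions based on the descriptions above; check the class definitions for the exact parameter names):

from tabularbench.constraints.relation_constraint import (
    Constant,
    EqualConstraint,
    Feature,
    SafeDivision,
)

# Safe ratio: open_acc / total_acc, with a fallback value when the
# denominator is 0 (the argument order and fallback parameter are
# assumptions based on the description above).
ratio = SafeDivision(Feature("open_acc"), Feature("total_acc"), Constant(0.0))

# Tolerance-aware equality (the `tolerance` keyword is an assumption).
constraint_a = EqualConstraint(ratio, Constant(1.0), tolerance=0.01)

# Constraint operators compose; | builds an OrConstraint, as in the
# LCLD example below.
constraint_b = constraint_a | (Feature("open_acc") == Feature("total_acc"))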

Loading existing definitions

You can build your own constrained dataset by removing or adding constraints to an existing one. In the LCLD dataset, the term can only be 36 or 60 months; this is the constraint at index 3. We can replace it with a new set of constraints as follows (for a genuinely different set of constraints, it is recommended to extend the whole dataset class):

from tabularbench.constraints.relation_constraint import Feature, Constant
from tabularbench.datasets.samples.lcld import get_relation_constraints

lcld_constraints = get_relation_constraints()

new_constraint = (
    (Feature("term") == Constant(36))
    | (Feature("term") == Constant(48))
    | (Feature("term") == Constant(60))
)
lcld_constraints[3] = new_constraint
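
The | operator combines the three equality terms into a single OrConstraint, so the replaced entry remains one constraint object in the list.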

Constraint evaluation

Given a dataset, one can check constraint satisfaction over all constraints, up to a tolerance.

from tabularbench.constraints.constraints_checker import ConstraintChecker
from tabularbench.datasets import dataset_factory

tolerance = 0.001
dataset = dataset_factory.get_dataset("url")
x, _ = dataset.get_x_y()

constraints_checker = ConstraintChecker(
    dataset.get_constraints(), tolerance
)
out = constraints_checker.check_constraints(x.to_numpy())
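
The returned array is expected to hold one satisfaction indicator per sample (an assumption based on the per-sample input), so its mean gives the overall satisfaction rate:

# Fraction of samples satisfying all constraints (assumes one
# indicator per sample).
print(f"Satisfaction rate: {out.mean():.4f}")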

Constraint repair

In the provided datasets, all constraints are satisfied. During an attack, violated constraints can be repaired as follows:

import numpy as np
from tabularbench.constraints.constraints_fixer import ConstraintsFixer
from tabularbench.constraints.relation_constraint import Feature

x = np.arange(9).reshape(3, 3)
constraint = Feature(0) == Feature(1) + Feature(2)

constraints_fixer = ConstraintsFixer(
    guard_constraints=[constraint],
    fix_constraints=[constraint],
)

x_fixed = constraints_fixer.fix(x)

x_expected = np.array([[3, 1, 2], [9, 4, 5], [15, 7, 8]])

assert np.equal(x_fixed, x_expected).all()
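
Here the same constraint serves as both guard and fix: judging by the parameter names, samples violating a guard constraint are rewritten by the fix constraints, which here assign feature 0 the sum of features 1 and 2, exactly as x_expected reflects.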

Constraint violations can also be translated into losses, and you can compute the gradient of these losses to repair the violated constraints, as follows:

import torch

from tabularbench.constraints.constraints_backend_executor import (
    ConstraintsExecutor,
)

from tabularbench.constraints.pytorch_backend import PytorchBackend
from tabularbench.datasets.dataset_factory import get_dataset

ds = get_dataset("url")
constraints = ds.get_constraints()
constraint1 = constraints.relation_constraints[0]

x, y = ds.get_x_y()
x_metadata = ds.get_metadata(only_x=True)
x = torch.tensor(x.values, dtype=torch.float32)

constraints_executor = ConstraintsExecutor(
    constraint1,
    PytorchBackend(),
    feature_names=x_metadata["feature"].to_list(),
)

x.requires_grad = True
loss = constraints_executor.execute(x)
grad = torch.autograd.grad(
    loss.sum(),
    x,
)[0]
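
As a follow-up sketch, a single gradient step toward lower violation loss could look like this (the step size 0.1 is an arbitrary illustration):

# Move x against the violation gradient and re-evaluate the loss.
with torch.no_grad():
    x_repaired = x - 0.1 * grad
    loss_after = constraints_executor.execute(x_repaired)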