Bayesian optimization of a Y branch#

Note: the cost of running the entire notebook is larger than 10 FlexCredits.

Bayesian optimization is a popular technique for searching and optimizing a design space. It works by building a probabilistic model, such as a Gaussian process, to predict the outcome of a more complex objective function. Unlike other optimizers like genetic algorithms or particle swarm optimization, Bayesian optimization uses an acquisition function to select new potential solutions, focusing on areas of the design space that the Gaussian process predicts will have high objective values and high uncertainty. Over the course of the optimization, the model’s uncertainty reduces, improving its accuracy as a surrogate for the true objective function. This makes Bayesian optimization a direct approach to optimization, particularly well-suited to problems with small (< 30 dimensions) and expensive-to-evaluate objective functions.

However, as Bayesian optimization is an iterative process with many individual components, it can be challenging to design and parallelize, leading to long run times and high computational costs. Fortunately, the Design plugin in Tidy3D includes the Bayesian optimization method MethodBayOpt which eliminates much of this complexity. Users can quickly and easily run a Bayesian optimization for complex FDTD simulations and analyze the results. Further details on Bayesian optimization and the methods used in Tidy3D are available in the open-source Python library bayesian_optimization.

This notebook details how to perform a Bayesian optimization with the Tidy3D Design plugin for the development of a Y-junction. It is based on the work of Zhengqi Gao, Zhengxing Zhang, and Duane S. Boning, "Automatic Synthesis of Broadband Silicon Photonic Devices via Bayesian Optimization" J. Light. Technol. 40, 7879-7892 (2022) DOI:10.1109/JLT.2022.3207052. A detailed description of how to build a Y-junction can be found in the Waveguide Y junction.

Schematic of the waveguide Y junction

If you are curious about other Design plugin features, see these notebooks:

[1]:

# The Bayesian optimizer uses the bayesian-optimization external package version 1.5.1.
# Uncomment the following line to install the package
# pip install bayesian-optimization==1.5.1

import gdstk
import matplotlib.pyplot as plt
import numpy as np
import tidy3d as td
import tidy3d.plugins.design as tdd
import tidy3d.web as web
from scipy.interpolate import make_interp_spline

Simulation Setup#

The simulation is defined across the 1.5 $μ m$ to 1.6 $μ m$ wavelength range with 100 sampling points. The base of the model is silicon, the top of the model is silicon dioxide; we use the material constants included in the Tidy3D Tidy3D’s material library.

[3]:

lda0 = 1.55  # central wavelength
freq0 = td.C_0 / lda0  # central frequency
n_wav = 100  # Number of wavelengths to sample in the range
ldas = np.linspace(1.5, 1.6, n_wav)  # wavelength range
freqs = td.C_0 / ldas  # frequency range
fwidth = 0.5 * (np.max(freqs) - np.min(freqs))  # width of the source frequency range

si = td.material_library["cSi"]["Palik_Lossless"]
sio2 = td.material_library["SiO2"]["Palik_Lossless"]

The following parameters describe the geometry of the waveguide. The optimizable design space is the junction which is split into 13 discrete segments.

[4]:

t = 0.22  # thickness of the silicon layer
num_d = 13  # dimensional space of the design region

l_in = 1  # input waveguide length
l_junction = 2  # length of the junction
l_bend = 6  # horizontal length of the waveguide bend
h_bend = 2  # vertical offset of the waveguide bend
l_out = 1  # output waveguide length
branch_width = 0.5  # width of one Y branch
branch_sep = 0.2  # distance between y branches at the junction
inf_eff = 100  # effective infinity

The most effective way to use the Design plugin is to split the workflow into “pre” and “post” functions which can surround a call to the Tidy3D cloud that carries out the computation. This takes advantage of automated simulation batching, allowing for parallelization which saves a considerable amount of time. The pre-function returns a Simulation; the post-function then analyzes the corresponding SimulationData. Together they can be considered the fitness function (or objective function / figure of merit) which the Bayesian optimization process is working to predict.

The function fn_pre needs to take the parameters that are being optimized: these are the 13 width segments of the junction. They always input as a dictionary and can be unpacked within the function (as below) or included as keyword arguments. These parameters are used to build a Y-junction Simulation object.

The function fn_post then takes the SimulationData object output by the simulation and computes the objective function. The value of objective is then fed back into the Bayesian optimization to inform the probabilistic model. In this case, we extract the power passing through the Y-junction and the power reflected back to the source. These values are evaluated in a custom loss function described by Gao et al. to determine the effectiveness of the junction design:

:nbsphinx-math:`begin{equation}: -frac{N_lambda}{3} sum_lambda left( R^2 + 2(T-0.5)^2 right)

end{equation}` where the summation is performed over the simulated wavelengths, $N_{λ}$ is the number of wavelength points, $R$ is the reflected power and $T$ is the transmitted power in one branch.

The aim of this function is to achieve a transmitted power of 0.5 through each branch, whilst driving the power reflected towards the source to zero. Note there is a minus sign in front of the output value as this loss function is a minimizing function, whilst all the optimizers in the Design plugin are built to maximize the objective function.

[5]:

def fn_pre(**w_params: dict) -> td.Simulation:
    """Create a Simulation of a Y splitter from a series of junction widths.

    Includes mode monitors to measure the power transmitted and reflected to source.
    """
    w_start = 0.5
    w_end = branch_width * 2 + branch_sep

    widths = [w_start]  # Ensures input waveguide is included in spline for first point of junction
    widths.extend(list(w_params.values()))
    widths.append(w_end)  # Ensures final point of junction smoothly converts to the branches

    x_junction = np.linspace(
        l_in, l_in + l_junction, num_d + 2
    )  # x coordinates of the top edge vertices
    y_junction = np.array(widths)  # y coordinates of the top edge vertices

    # pass vertices through spline and increase sampling to smooth the geometry
    new_x_junction = np.linspace(
        l_in, l_in + l_junction, 100
    )  # x coordinates of the top edge vertices
    spline = make_interp_spline(x_junction, y_junction, k=2)
    spline_yjunction = spline(new_x_junction)

    # using concatenate to include bottom edge vertices
    x_junction = np.concatenate((new_x_junction, np.flipud(new_x_junction)))
    y_junction = np.concatenate((spline_yjunction / 2, -np.flipud(spline_yjunction / 2)))

    # stacking x and y coordinates to form vertices pairs
    vertices = np.transpose(np.vstack((x_junction, y_junction)))

    junction = td.Structure(
        geometry=td.PolySlab(vertices=vertices, axis=2, slab_bounds=(0, t)), medium=si
    )

    x_start = l_in + l_junction  # x coordinate of the starting point of the waveguide bends

    x_bend = np.linspace(x_start, x_start + l_bend, 100)  # x coordinates of the top edge vertices

    y_bend = (
        (x_bend - x_start) * h_bend / l_bend
        - h_bend * np.sin(2 * np.pi * (x_bend - x_start) / l_bend) / (np.pi * 2)
        + w_end / 2
        - w_start / 2
    )  # y coordinates of the top edge vertices

    # adding the last point to include the straight waveguide at the output
    x_bend = np.append(x_bend, inf_eff)
    y_bend = np.append(y_bend, y_bend[-1])

    # add path to the cell
    cell = gdstk.Cell("bends")
    cell.add(
        gdstk.FlexPath(x_bend + 1j * y_bend, branch_width, layer=1, datatype=0)
    )  # top waveguide bend
    cell.add(
        gdstk.FlexPath(x_bend - 1j * y_bend, branch_width, layer=1, datatype=0)
    )  # bottom waveguide bend

    # define top waveguide bend structure
    wg_bend_1 = td.Structure(
        geometry=td.PolySlab.from_gds(
            cell,
            gds_layer=1,
            axis=2,
            slab_bounds=(0, t),
        )[0],
        medium=si,
    )

    # define bottom waveguide bend structure
    wg_bend_2 = td.Structure(
        geometry=td.PolySlab.from_gds(
            cell,
            gds_layer=1,
            axis=2,
            slab_bounds=(0, t),
        )[1],
        medium=si,
    )

    # straight input waveguide
    wg_in = td.Structure(
        geometry=td.Box.from_bounds(rmin=(-inf_eff, -w_start / 2, 0), rmax=(l_in, w_start / 2, t)),
        medium=si,
    )

    # the entire model is the collection of all structures defined so far
    model_structure = [wg_in, junction, wg_bend_1, wg_bend_2]

    Lx = l_in + l_junction + l_out + l_bend  # simulation domain size in x direction
    Ly = w_end + 2 * h_bend + 1.5 * lda0  # simulation domain size in y direction
    Lz = 10 * t  # simulation domain size in z direction
    sim_size = (Lx, Ly, Lz)

    # add a mode source as excitation
    mode_spec = td.ModeSpec(num_modes=1, target_neff=3.5)
    mode_source = td.ModeSource(
        center=(l_in / 2, 0, t / 2),
        size=(0, 4 * w_start, 6 * t),
        source_time=td.GaussianPulse(freq0=freq0, fwidth=fwidth),
        direction="+",
        mode_spec=mode_spec,
        mode_index=0,
    )

    # add a mode monitor to measure transmission at the output waveguide
    mode_monitor_11 = td.ModeMonitor(
        center=(l_in / 3, 0, t / 2),
        size=(0, 4 * w_start, 6 * t),
        freqs=freqs,
        mode_spec=mode_spec,
        name="mode_11",
    )

    mode_monitor_12 = td.ModeMonitor(
        center=(l_in + l_junction + l_bend + l_out / 2, w_end / 2 - w_start / 2 + h_bend, t / 2),
        size=(0, 4 * w_start, 6 * t),
        freqs=freqs,
        mode_spec=mode_spec,
        name="mode_12",
    )

    # add a field monitor to visualize field distribution at z=t/2
    field_monitor = td.FieldMonitor(
        center=(0, 0, t / 2), size=(td.inf, td.inf, 0), freqs=[freq0], name="field"
    )

    run_time = 5e-13  # simulation run time

    # construct simulation
    sim = td.Simulation(
        center=(Lx / 2, 0, 0),
        size=sim_size,
        grid_spec=td.GridSpec.auto(min_steps_per_wvl=20, wavelength=lda0),
        structures=model_structure,
        sources=[mode_source],
        monitors=[mode_monitor_11, mode_monitor_12, field_monitor],
        run_time=run_time,
        boundary_spec=td.BoundarySpec.all_sides(boundary=td.PML()),
        medium=sio2,
    )

    return sim


def fn_post(sim_data: td.SimulationData) -> float:
    """Calculate the loss function from the power at the mode monitors in the SimulationData."""
    # Calculate the power reflected back to source and transmitted to one branch
    power_reflected = np.squeeze(
        np.abs(sim_data["mode_11"].amps.sel(direction="-", mode_index=0)) ** 2
    )
    power_transmitted = np.squeeze(
        np.abs(sim_data["mode_12"].amps.sel(direction="+", mode_index=0)) ** 2
    )

    # Loss function proposed by Gao et al. which takes advantage of branch symmetry
    loss_fn = 1 / 3 * n_wav * np.sum(power_reflected**2 + 2 * (power_transmitted - 0.5) ** 2)
    output = -float(loss_fn.values)  # Negative value as this is a minimizing loss function

    return output

We can quickly check that fn_pre is working correctly by passing a set of potential test_params and plotting the result. Note that the gap between the Polyslab junction and the branches is a plotting artifact and doesn’t exist in the Simulation.

[6]:

test_params = {
    "w1": 0.5,
    "w2": 0.5,
    "w3": 0.6,
    "w4": 0.7,
    "w5": 0.9,
    "w6": 1.26,
    "w7": 1.4,
    "w8": 1.4,
    "w9": 1.4,
    "w10": 1.4,
    "w11": 1.31,
    "w12": 0.5,
    "w13": 0.5,
}
sim = fn_pre(**test_params)
sim.plot(z=0)
plt.show()

../_images/notebooks_BayesianOptimizationYJunction_11_0.png

Next, we setup the Bayesian optimization method. This is done with the MethodBayOpt object. We don’t have the lcb (lower confidence bound) acquisition function used in the paper available to us, but by making our loss function negative and using the ucb (upper confidence bound) acquisition we achieve the same optimizer design. The initial_iter and n_iter options control the number of initial random samples and subsequent optimization iterations respectively. We can also set the random seed to ensure reproducible results (optional).

We also create a list of ParameterFloat objects corresponding to the 13 segment widths. The span option defines the bounds of each parameter.

The method and parameters are then passed to a DesignSpace object which contains all we need to run the Bayesian optimization.

[12]:

method = tdd.MethodBayOpt(
    initial_iter=30,
    n_iter=70,
    acq_func="ucb",
    kappa=0.3,
    seed=1,
)

parameters = [tdd.ParameterFloat(name=f"w_{i}", span=(0.5, 1.6)) for i in range(num_d)]

design_space = tdd.DesignSpace(
    method=method, parameters=parameters, task_name="bay_opt_notebook", path_dir="./data"
)

It is then very easy to the launch this optimization with design_space.run(). This launches an initial random batch of 30 simulations, as specified by initial_iter, followed by sequential computation of 70 simulations, as specified by n_iter. In the latter phase, the Bayesian optimizer chooses potential candidates based on simultaneously maximizing the objective value and efficiently exploring the design space. The total of 100 simulations takes around 90 minutes to compute. Once complete, the results are returned in a pandas dataframe for analysis.

[14]:

results = design_space.run(fn_pre, fn_post, verbose=True)
df = results.to_dataframe()

15:18:55 EDT Running 30 Simulations

15:19:35 EDT Best Fit from Initial Solutions: -181.226

             Running 1 Simulations

15:21:22 EDT Running 1 Simulations

15:23:12 EDT Running 1 Simulations

15:24:57 EDT Latest Best Fit on Iter 2: -150.373

             Running 1 Simulations

15:26:37 EDT Latest Best Fit on Iter 3: -134.787

             Running 1 Simulations

15:28:24 EDT Latest Best Fit on Iter 4: -122.725

             Running 1 Simulations

15:30:06 EDT Running 1 Simulations

15:31:47 EDT Latest Best Fit on Iter 6: -81.428

15:31:48 EDT Running 1 Simulations

15:33:30 EDT Running 1 Simulations

15:35:15 EDT Running 1 Simulations

15:36:51 EDT Running 1 Simulations

15:39:12 EDT Running 1 Simulations

15:40:49 EDT Latest Best Fit on Iter 11: -68.332

             Running 1 Simulations

15:42:25 EDT Running 1 Simulations

15:43:55 EDT Running 1 Simulations

15:45:34 EDT Running 1 Simulations

15:47:12 EDT Latest Best Fit on Iter 15: -64.785

             Running 1 Simulations

15:49:29 EDT Latest Best Fit on Iter 16: -49.75

             Running 1 Simulations

15:51:33 EDT Running 1 Simulations

15:53:09 EDT Running 1 Simulations

15:54:08 EDT Latest Best Fit on Iter 19: -49.496

15:54:09 EDT Running 1 Simulations

15:55:41 EDT Running 1 Simulations

15:57:19 EDT Latest Best Fit on Iter 21: -35.894

             Running 1 Simulations

15:58:57 EDT Latest Best Fit on Iter 22: -30.284

15:58:58 EDT Running 1 Simulations

15:59:58 EDT Running 1 Simulations

16:01:42 EDT Latest Best Fit on Iter 24: -27.277

             Running 1 Simulations

16:02:55 EDT Latest Best Fit on Iter 25: -23.051

             Running 1 Simulations

16:03:55 EDT Running 1 Simulations

16:04:57 EDT Running 1 Simulations

16:06:52 EDT Running 1 Simulations

16:10:43 EDT Latest Best Fit on Iter 29: -22.04

             Running 1 Simulations

16:11:43 EDT Latest Best Fit on Iter 30: -18.834

             Running 1 Simulations

16:12:43 EDT Latest Best Fit on Iter 31: -17.434

             Running 1 Simulations

16:13:45 EDT Latest Best Fit on Iter 32: -13.067

16:13:46 EDT Running 1 Simulations

16:17:40 EDT Latest Best Fit on Iter 33: -10.403

             Running 1 Simulations

16:18:34 EDT Latest Best Fit on Iter 34: -7.84

             Running 1 Simulations

16:19:28 EDT Latest Best Fit on Iter 35: -5.766

             Running 1 Simulations

16:21:19 EDT Latest Best Fit on Iter 36: -5.517

             Running 1 Simulations

16:22:19 EDT Running 1 Simulations

16:23:22 EDT Running 1 Simulations

16:25:03 EDT Latest Best Fit on Iter 39: -5.402

             Running 1 Simulations

16:29:06 EDT Latest Best Fit on Iter 40: -5.051

             Running 1 Simulations

16:30:06 EDT Latest Best Fit on Iter 41: -4.96

             Running 1 Simulations

16:32:14 EDT Latest Best Fit on Iter 42: -4.307

             Running 1 Simulations

16:36:19 EDT Latest Best Fit on Iter 43: -3.882

             Running 1 Simulations

16:37:51 EDT Running 1 Simulations

16:38:55 EDT Latest Best Fit on Iter 45: -3.83

16:38:56 EDT Running 1 Simulations

16:39:55 EDT Latest Best Fit on Iter 46: -3.694

16:39:56 EDT Running 1 Simulations

16:40:55 EDT Latest Best Fit on Iter 47: -3.511

16:40:56 EDT Running 1 Simulations

16:45:17 EDT Latest Best Fit on Iter 48: -3.389

             Running 1 Simulations

16:46:59 EDT Latest Best Fit on Iter 49: -3.355

16:47:00 EDT Running 1 Simulations

16:48:13 EDT Latest Best Fit on Iter 50: -3.19

             Running 1 Simulations

16:49:45 EDT Latest Best Fit on Iter 51: -3.061

             Running 1 Simulations

16:50:47 EDT Running 1 Simulations

16:52:21 EDT Latest Best Fit on Iter 53: -2.884

16:52:22 EDT Running 1 Simulations

16:53:54 EDT Running 1 Simulations

16:54:49 EDT Latest Best Fit on Iter 55: -2.881

             Running 1 Simulations

16:55:50 EDT Running 1 Simulations

16:56:50 EDT Running 1 Simulations

16:57:50 EDT Running 1 Simulations

16:59:37 EDT Latest Best Fit on Iter 59: -2.807

             Running 1 Simulations

17:00:37 EDT Running 1 Simulations

17:02:19 EDT Latest Best Fit on Iter 61: -2.787

             Running 1 Simulations

17:03:49 EDT Running 1 Simulations

17:05:29 EDT Running 1 Simulations

17:07:14 EDT Running 1 Simulations

17:09:02 EDT Running 1 Simulations

17:10:04 EDT Running 1 Simulations

17:11:45 EDT Running 1 Simulations

17:13:26 EDT Running 1 Simulations

17:14:35 EDT Latest Best Fit on Iter 69: -2.77

             Best Result: -2.770382176989957
             Best Parameters: w_0: 0.5309302341102854 w_1: 0.5 w_10:
             1.036137990749239 w_11: 0.8488755244945626 w_12: 1.6 w_2: 1.6 w_3:
             1.6 w_4: 1.6 w_5: 1.4349135306156366 w_6: 1.471742406951237 w_7:
             1.2941922008374072 w_8: 1.6 w_9: 1.130824837197553

Results#

The best result can be extracted directly from the optimizer object. Plotting this, we see what the optimizer has returned as the optimized structure for this design of Y-junction.

[15]:

best_params = results.optimizer.max["params"]
print(f"Best fitness: {results.optimizer.max['target']}")
sim = fn_pre(**best_params)
sim.plot(z=0)
plt.show()

Best fitness: -2.770382176989957

../_images/notebooks_BayesianOptimizationYJunction_18_1.png

We can then create a plot to evaluate if the Bayesian optimization has converged on a fitness value. The first 30 iterations are from the random initialization, so the fitness values are expected to be more varied. The fitness has converged before the end of the remaining 70 iterations; an early stop criteria could have been included to finish the optimization sooner.

[17]:

ax = df["output"].plot(xlabel="Simulation Number", ylabel="Fitness")

../_images/notebooks_BayesianOptimizationYJunction_20_0.png

The mean power at the reflected and transmitted monitors can be calculated and compared to the paper from Gao et al.

[18]:

idx_best_result = df["output"].idxmax()
best_sim_filename = results.task_paths[idx_best_result]
best_sim = td.SimulationData.from_file(best_sim_filename)

power_reflected = np.array(
    np.squeeze(np.abs(best_sim["mode_11"].amps.sel(direction="-", mode_index=0)) ** 2)
).mean()
power_transmitted = np.array(
    np.squeeze(np.abs(best_sim["mode_12"].amps.sel(direction="+", mode_index=0)) ** 2)
).mean()

print(f"Mean Reflected Power: {round(power_reflected, 3)} (Paper: 0.004)")
print(f"Mean Transmitted Power: {round(power_transmitted, 3)} (Paper: 0.460)")

Mean Reflected Power: 0.002 (Paper: 0.004)
Mean Transmitted Power: 0.48 (Paper: 0.460)

Final Optimized Design#

Finally, we can simulate the optimal solution with an additional FieldMonitor to visualise the field distribution throughout the splitter.

[19]:

final_sim = fn_pre(**best_params)

# Define a field monitor to help visualize the field distribution
field_monitor = td.FieldMonitor(
    center=(0, 0, 0), size=(td.inf, td.inf, 0), freqs=[freq0], name="field"
)

mode_12 = final_sim.monitors[1]

final_sim = final_sim.copy(update={"monitors": (field_monitor, mode_12)})

final_sim_data = web.run(final_sim, task_name="BO_notebook_final_sim")

17:45:45 EDT Created task 'BO_notebook_final_sim' with task_id
             'fdve-0e2fa139-898e-4bcd-9471-943d668ba333' and task_type 'FDTD'.

             View task using web UI at
             'https://tidy3d.simulation.cloud/workbench?taskId=fdve-0e2fa139-898
             e-4bcd-9471-943d668ba333'.

             Task folder: 'default'.

17:45:47 EDT Maximum FlexCredit cost: 0.111. Minimum cost depends on task
             execution details. Use 'web.real_cost(task_id)' to get the billed
             FlexCredit cost after a simulation run.

             status = queued

             To cancel the simulation, use 'web.abort(task_id)' or
             'web.delete(task_id)' or abort/delete the task in the web UI.
             Terminating the Python script will not stop the job running on the
             cloud.

17:46:02 EDT starting up solver

17:46:03 EDT running solver

17:46:21 EDT early shutoff detected at 76%, exiting.

             status = postprocess

17:46:25 EDT status = success

17:46:27 EDT View simulation result at
             'https://tidy3d.simulation.cloud/workbench?taskId=fdve-0e2fa139-898
             e-4bcd-9471-943d668ba333'.

17:46:29 EDT loading simulation from simulation_data.hdf5

Plotting the field shows how it splits evenly between each branch.

[20]:

final_sim_data.plot_field("field", "E", "abs^2")
plt.show()

../_images/notebooks_BayesianOptimizationYJunction_27_0.png

And we can visualise how the power transmitted varies over the frequency range.

[21]:

power_transmitted = np.squeeze(
    np.abs(final_sim_data["mode_12"].amps.sel(direction="+", mode_index=0)) ** 2
)
plt.plot(freqs, power_transmitted)
plt.title("Power transmitted across frequency range")
plt.xlabel("Frequency / Hz")
plt.ylabel("Power")
plt.show()

../_images/notebooks_BayesianOptimizationYJunction_29_0.png

Conclusion#

Through comparison of the junction geometry and the transmitted and reflected power, we can show that these results closely follow the results published by Gao et al. for the design of a Y-junction. This notebook demonstrates how to carry out Bayesian optimization with the Design plugin, and can be readily adapted to other use cases.

Learn more about the Design plugin:

Bayesian optimization of a Y branch

Contents

Bayesian optimization of a Y branch#

Simulation Setup#

Results#

Final Optimized Design#

Conclusion#