Optimization of an S-bend with CMA-ES

Optimization of an S-bend with CMA-ES#

Note: the cost of running the entire notebook is larger than 1 FlexCredit.

Covariance Matrix Adaptation Evolution Strategy (CMA-ES) is a robust evolutionary algorithm used for solving optimization problems in continuous domains. It stands out for its ability to adaptively adjust the covariance matrix, which guides the generation of new solution candidates, enabling it to efficiently navigate complex, multimodal landscapes. Its self-adaptive mechanism and population-based approach make it a powerful tool for a wide range of applications.

The steps in a CMA-ES are typically as follows:

  1. Initialization: Generate an initial population of candidate solutions randomly within the search space.

  2. Initialize the parameters: mean of the search distribution, covariance matrix, step size, etc.

  3. Evaluation: Assess each candidate in the current population using the objective function to determine its fitness.

  4. Selection: Select a subset of the best-performing candidates based on their fitness scores.

  5. Update Mean: Calculate the new mean of the selected candidates, which will be used to sample the next generation.

  6. Adapt Covariance Matrix: Update the covariance matrix based on the distribution of the selected candidates. This step is crucial as it adapts the sampling strategy to the shape of the objective function’s landscape.

  7. Adapt Step Size: Adjust the step size, which controls the global scale of the search. This is typically done using a mechanism like path length control, which helps in maintaining an appropriate exploration-exploitation balance.

  8. Generate New Population: Sample a new population of candidates based on the updated mean, covariance matrix, and step size.

  9. Termination Check: Check for termination criteria (e.g., maximum number of iterations, satisfactory solution quality, or stagnation in improvement).

  10. Repeat: If termination criteria are not met, return to step 2 with the new population and continue the process.

  11. Output: Once termination criteria are met, output the best solution found or the final population depending on the specific requirements of the problem.

CMA-ES can be effectively applied to the design optimization of photonic components. Yuto Miyatake, Kasidit Toprasertpong, Shinichi Takagi, and Mitsuru Takenaka, "Design of compact and low-loss S-bends by CMA-ES," Opt. Express 31, 43850-43863 (2023) DOI:10.1364/OE.504866 demonstrates the optimization of a compact, low-loss waveguide S-bend using CMA-ES. In this notebook, we follow this work and optimize a silicon waveguide S-bend. We use the CMA-ES optimizer from the open-source package pycma so we can set up the optimization very quickly.

Schematic of the S-bend

Tidy3D is a powerful tool for photonic design optimization due to its fast speed and high throughput. Besides CMA-ES, we have demonstrated particle swarm optimizations of a polarization beam splitter and a bullseye cavity, genetic algorithm optimization of an on-chip reflector, and direct binary search optimization of an optical switch. Furthermore, we also have a growing list of gradient-based adjoint optimization examples including

  1. Mode converter,

  2. Waveguide taper,

  3. Metalens,

  4. Waveguide bend,

  5. Multiplexer,

  6. Y-branch,

  7. MZI circuit,

  8. Light extractor.

[1]:
import numpy as np
import matplotlib.pyplot as plt
import math
import cma

import tidy3d as td
import tidy3d.web as web

Simulation Setup#

The simulation wavelength range is 1500 nm to 1600 nm.

[2]:
lda0 = 1.55  # central wavelength
freq0 = td.C_0 / lda0  # central frequency

n_freqs = 11
ldas = np.linspace(1.5, 1.6, n_freqs)  # wavelength range
freqs = td.C_0 / ldas  # frequency range
fwidth = 0.5 * (np.max(freqs) - np.min(freqs))  # width of the source frequency range

For simplicity, we will use the silicon and oxide media directly from the material library.

[3]:
si = td.material_library["cSi"]["Palik_Lossless"]

sio2 = td.material_library["SiO2"]["Palik_Lossless"]

The input and output straight waveguides have a 430 nm width and 220 nm thickness. The S-bend has a size of 3.5 µm and its boundaries are characterized by a Bezier curve with 7 control points. Therefore, the 14 parameters corresponding to the \(xy\) coordinates of the control points tune the shape and width of the bend. The first and last points are fixed to ensure a smooth connection to the straight waveguides. The y coordinate of the middle control point is also fixed. Therefore, we have a total of 9 tunable parameters that can be optimized.

[4]:
s = 3.5  # s-bend size
w = 0.43  # straight waveguide width
t = 0.22  # waveguide thickness
buffer = 1.5  # buffer spacing
inf_eff = 1e2  # effective infinity

# initial values of the tunable parameters corresponding to the coordinates of the control points
x1 = 0.7675
x2 = 0.215
x3 = 0.215
x4 = 0.215
x5 = -0.9825
y1 = 1.535
y2 = 0.7675
y4 = -0.9825
y5 = -1.965

# initial parameter vector
x0 = [x1, x2, x3, x4, x5, y1, y2, y4, y5]

We define a function bezier_curve that takes the coordinates of the control points and returns the Bezier curve. To test it, we generate the Bezier curve that describes the bottom curve of the S-bend with the initial control points defined earlier and plot them.

[5]:
def bezier_curve(control_points, num_points=100):
    t = np.linspace(0, 1, num_points)
    curve = np.zeros((num_points, 2))
    n = len(control_points) - 1  # degree of the Bézier curve

    for i in range(n + 1):
        binomial_coeff = math.factorial(n) / (math.factorial(i) * math.factorial(n - i))
        term = binomial_coeff * (1 - t) ** (n - i) * t**i
        curve += np.outer(term, control_points[i])

    return curve


# define 7 control points
control_points = np.array(
    [
        [-s / 2, -s / 2 - w / 2],
        [x5, y5],
        [x4, y4],
        [x3, 0],
        [x2, y2],
        [x1, y1],
        [s / 2, s / 2 - w / 2],
    ]
)

# generate Bézier curve
curve_bottom = bezier_curve(control_points)

plt.plot(curve_bottom[:, 0], curve_bottom[:, 1], label="Bezier Curve")
plt.scatter(control_points[:, 0], control_points[:, 1], color="red", label="Control Points")
plt.legend()
plt.show()
../_images/notebooks_SbendCMAES_11_0.png

The top curve of the S-bend is simply a 180 degree rotation of the bottom curve. With both curves, the bend can be simply defined as a PolySlab. The input and output straight waveguides can be defined using Box.

[6]:
vertices = np.vstack((curve_bottom, -curve_bottom))

sbend = td.Structure(
    geometry=td.PolySlab(vertices=vertices, axis=2, slab_bounds=(-t / 2, t / 2)), medium=si
)

waveguide_in = td.Structure(
    geometry=td.Box.from_bounds(
        rmin=(-inf_eff, -s / 2 - w / 2, -t / 2), rmax=(-s / 2, -s / 2 + w / 2, t / 2)
    ),
    medium=si,
)

waveguide_out = td.Structure(
    geometry=td.Box.from_bounds(
        rmin=(s / 2, s / 2 - w / 2, -t / 2), rmax=(inf_eff, s / 2 + w / 2, t / 2)
    ),
    medium=si,
)

sbend.plot(z=0)
plt.show()
../_images/notebooks_SbendCMAES_13_0.png

We will add a ModeSource to launch the TE0 mode at the input waveguide and add a ModeMonitor at the output waveguide to measure transmission efficiency.

[7]:
# add a mode source as excitation
mode_spec = td.ModeSpec(num_modes=1, target_neff=3.5)
mode_source = td.ModeSource(
    center=(-s / 2 - buffer / 2, -s / 2, 0),
    size=(0, 4 * w, 6 * t),
    source_time=td.GaussianPulse(freq0=freq0, fwidth=fwidth),
    direction="+",
    mode_spec=mode_spec,
    mode_index=0,
)

# add a mode monitor to measure transmission at the output waveguide
mode_monitor = td.ModeMonitor(
    center=(s / 2 + buffer / 2, s / 2, 0),
    size=mode_source.size,
    freqs=freqs,
    mode_spec=mode_spec,
    name="mode",
)

To facilitate the optimization, we define a function make_sim that takes the design parameter vector and returns a Simulation.

[8]:
def make_sim(x):
    control_points = np.array(
        [
            [-s / 2, -s / 2 - w / 2],
            [x[4], x[8]],
            [x[3], x[7]],
            [x[2], 0],
            [x[1], x[6]],
            [x[0], x[5]],
            [s / 2, s / 2 - w / 2],
        ]
    )

    curve_bottom = bezier_curve(control_points)

    vertices = np.vstack((curve_bottom, -curve_bottom))

    sbend = td.Structure(
        geometry=td.PolySlab(vertices=vertices, axis=2, slab_bounds=(-t / 2, t / 2)), medium=si
    )

    run_time = 5e-13  # simulation run time

    return td.Simulation(
        size=(s + 2 * buffer, s + 2 * buffer, 10 * t),
        grid_spec=td.GridSpec.auto(min_steps_per_wvl=15, wavelength=lda0),
        structures=[waveguide_in, waveguide_out, sbend],
        sources=[mode_source],
        monitors=[mode_monitor],
        run_time=run_time,
        boundary_spec=td.BoundarySpec.all_sides(boundary=td.PML()),
        medium=sio2,
        symmetry=(0, 0, 1),
    )

Optimization Setup#

The objective of the optimization is to reduce the loss at the S-bend. Therefore, the objective function is simply the transmission to the output waveguide. By default, the objective evaluation is sequential, i.e. in each generation the objective function is calculated for each population one at a time. To leverage the parallel run of Tidy3D, we want to evaluate the objective functions for all populations in the generation at once. To achieve this, we need to do some customization for the optimizer, which we will demonstrate later. For now, let’s assume that the objective function takes a 2D array of parameter vectors instead of a single parameter vector and returns a 1D array of objective function values. Each row of the 2D array corresponds to one parameter vector in the generation and the number of rows equals the number of populations.

[9]:
def compute_transmission(sim_data):
    amps = sim_data["mode"].amps.sel(mode_index=0, direction="+").values
    return np.abs(amps) ** 2


def objective_function(xs):
    sims = {}
    for i, x in enumerate(xs):
        sim = make_sim(x)
        sims[f"population_{i}"] = sim

    batch = web.Batch(simulations=sims, verbose=False)
    batch_results = batch.run(path_dir="data")

    T = []
    for i, _ in enumerate(xs):
        sim_data = batch_results[f"population_{i}"]
        T.append(-10 * np.log10(np.mean(compute_transmission(sim_data))))
    return T

Define the hyperparameters for the optimization. We use a population size of 10 and run the optimization for a total of 12 generations. The initial standard deviation is set to 0.05, which is empirically determined. This parameter determines the size of the explored parameter space initially. Too large of an initial standard deviation could lead to a self-intersecting PolySlab. For reproducibility, we also use a fixed random seed.

With the hyperparameters, we define the CMA-ES optimizer.

[10]:
dim = len(x0)
population_size = 10
max_generations = 12
sigma0 = 0.05
random_seed = 12345

es = cma.CMAEvolutionStrategy(
    x0, sigma0, {"popsize": population_size, "maxiter": max_generations, "seed": random_seed}
)
(5_w,10)-aCMA-ES (mu_w=3.2,w_1=45%) in dimension 9 (seed=12345, Wed Dec 20 17:40:44 2023)

Run Optimization#

With the objective function and optimizer set up, we can start the optimization. As we mentioned previously, we need to customize the optimization loop slightly such that the objective functions for the entire population can be evaluated in parallel.

We also keep track of the best objective function value and display it at the end of each generation. We can see that the transmission improved from 0.271 dB in the first generation to 0.045 dB in the last generation, which is a significant improvement.

[11]:
best_values = []

generation = 0
while not es.stop():
    solutions = es.ask()
    function_values = objective_function(solutions)
    es.tell(solutions, function_values)

    best_values.append(es.result[1])

    # Print message for each generation
    print(f"Generation {generation}: Best function value = {es.result[1]:.3f}")

    generation += 1
Generation 0: Best function value = 0.271
Generation 1: Best function value = 0.235
Generation 2: Best function value = 0.201
Generation 3: Best function value = 0.164
Generation 4: Best function value = 0.079
Generation 5: Best function value = 0.059
Generation 6: Best function value = 0.059
Generation 7: Best function value = 0.059
Generation 8: Best function value = 0.058
Generation 9: Best function value = 0.045
Generation 10: Best function value = 0.045
Generation 11: Best function value = 0.045

Plot the best objective value as a function of generation.

[12]:
plt.plot(best_values)
plt.xlabel("Generation")
plt.ylabel("Best objective function")
plt.show()
../_images/notebooks_SbendCMAES_27_0.png

Final Design#

After the optimization, we can inspect the optimized design more closely. First, extract the optimal parameter vector from the optimizer. Then define a Simulation and visualize it.

[13]:
x_opt = es.result[0]
sim_opt = make_sim(x_opt)
sim_opt.plot_3d()

To help visualize the field propagation, we add a FieldMonitor to the simulation and run it again.

[14]:
# add a field monitor to visualize field distribution at z=0
field_monitor = td.FieldMonitor(
    center=(0, 0, 0), size=(td.inf, td.inf, 0), freqs=[freq0], name="field"
)

sim_opt = sim_opt.copy(update={"monitors": [mode_monitor, field_monitor]})
sim_data_opt = web.run(simulation=sim_opt, task_name="final_design")
17:52:26 Eastern Standard Time Created task 'final_design' with task_id
                               'fdve-92d29705-e9f2-485b-9d0a-2d7c06424d42' and
                               task_type 'FDTD'.
17:52:27 Eastern Standard Time status = queued
17:52:31 Eastern Standard Time status = preprocess
17:52:35 Eastern Standard Time Maximum FlexCredit cost: 0.025. Use
                               'web.real_cost(task_id)' to get the billed
                               FlexCredit cost after a simulation run.
                               starting up solver
                               running solver
                               To cancel the simulation, use
                               'web.abort(task_id)' or 'web.delete(task_id)' or
                               abort/delete the task in the web UI. Terminating
                               the Python script will not stop the job running
                               on the cloud.
17:52:46 Eastern Standard Time early shutoff detected at 64%, exiting.
                               status = postprocess
17:52:49 Eastern Standard Time status = success
17:52:51 Eastern Standard Time loading simulation from simulation_data.hdf5

After the simulation is complete, plot the transmission spectrum, which shows a very low loss.

[15]:
T_opt = 10 * np.log10(compute_transmission(sim_data_opt))
plt.plot(ldas * 1e3, T_opt, "red", linewidth=2)
plt.ylim(-0.5, 0)
plt.ylabel("Transmission (dB)")
plt.xlabel("Wavelength (nm)")
plt.show()
../_images/notebooks_SbendCMAES_34_0.png

Lastly, plot the field intensity distribution to visualize the energy flow. The optimized design allows a smooth mode transition at the two bends such that the loss is minimal.

[16]:
sim_data_opt.plot_field(field_monitor_name="field", field_name="E", val="abs^2")
plt.show()
../_images/notebooks_SbendCMAES_36_0.png

With the optimized design, we can directly export a GDS file of the S-bend for fabrication.

[17]:
sim_opt.to_gds_file(fname="misc/optimized_sbend.gds", z=0)
[ ]: