Resolving High CPU Consumption Due to Database Queries in Optimizely CMS

Have you ever encountered a scenario where your database server’s CPU consumption spiked to 100%, causing major disruptions? In this blog post, we’ll delve into a real-life case where a client experienced precisely that, and how we identified and resolved the root cause.

The Problem Unveiled

Our client, let’s call them ClientX, was facing intermittent spikes in CPU usage on their database server. These spikes occurred primarily during certain days of the week, lasting up to two hours each time. Upon investigating the logs, a particular SQL script caught our attention. This script, which should have been running infrequently, was unexpectedly executed around 5 million times during these CPU-intensive periods. Here’s the culprit:

SELECT MI.pkID AS ContentId, MI.Provider, MI.ProviderUniqueId, MI.ContentGuid, MI.ExistingContentId, MI.ExistingCustomProvider, MI.Metadata, MI.Saved
	FROM tblMappedIdentity AS MI 
	INNER JOIN @InternalIds AS EI ON (MI.pkID = EI.ID AND MI.Provider = EI.Provider)
	UNION (SELECT MI2.pkID AS ContentId, MI2.Provider, MI2.ProviderUniqueId, MI2.ContentGuid, MI2.ExistingContentId, MI2.ExistingCustomProvider, MI2.Metadata, MI2.Saved
		FROM tblMappedIdentity AS MI2
		INNER JOIN @InternalIds AS EI2 ON (MI2.ExistingContentId = EI2.ID)
		WHERE ((MI2.ExistingCustomProvider = 1 AND MI2.Provider = EI2.Provider) OR (MI2.ExistingCustomProvider IS NULL AND EI2.Provider IS NULL)))

This script was being invoked by the stored procedure netMappedIdentityGetById, a component utilized by external content providers to integrate external data sources into the Optimizely CMS Site.

Identifying the Cause

Upon closer inspection, we identified a specific line of code that triggered the execution of the mentioned SQL script:

var mappedItem = _identityMappingService.Get(contentLink);

This line was fetching data from the database using the netMappedIdentityGetById stored procedure. However, if an exception occurred during the execution of the LoadContent method in the Content Provider implementation, it invalidated the corresponding cache entry. Consequently, subsequent mapping requests necessitated fresh database queries, leading to excessive CPU consumption.

Resolving the Issue

To address this issue effectively, we implemented the following resolutions:

  1. Review Logs: Thoroughly examining the logs helped us identify errors in the Content Provider instances, particularly messages like “Failed to Read cacheKey.” This enabled us to understand why certain items were not being cached properly.
  2. Introduce Caching: We integrated a custom cache mechanism before each invocation of _identityMappingService.Get(contentLink). This approach significantly reduced unnecessary database retrievals, especially in scenarios where errors occurred in Content Provider implementations.
  3. Bulk Mapping: Whenever feasible, we opted for IdentityMappingService.List instead of IdentityMappingService.Get. By fetching mappings in bulk, we minimized database roundtrips, thus alleviating the strain on the SQL server.

One thought on “Resolving High CPU Consumption Due to Database Queries in Optimizely CMS

  1. Johan Book

    Thanks for talking about this topic! We see this error frequently in many projects, but we don’t have any own calls to IdentityMappingService. I wonder if there are perhaps issues internally in Optimizely with this?

    We see it happening in a Commerce scenario, for instance when you have a link (content reference prop in a menu) or a variation that is referenced from a promotion and you then remove the variation. Every time the promotion is evaluated, we can see database calls where it tries to resolve this orphan reference. And it’s not like it’s gonna self-heal and come back, so it seems something is broken in the internal logic of Optimizely.

    Can’t help but wondering what is the expected behavior for content that is not found? Should it be cached as a null value to avoid subsequent database lookups?

    Like

Leave a comment