
The Too Many Parts Problem

This guide is part of a collection of findings gained from community meetups. For more real-world solutions and insights, you can browse by specific problem. Need more performance optimization tips? Check out the Performance Optimization community insights guide.

Universal pain point: small, frequent inserts degrade performance through part explosion.

Clear warnings from ClickHouse engineers and users:

  • "If you are doing 100,000 inserts it is going to create 100,000 parts, and behind the scenes, over time, ClickHouse is going to merge those parts... the more parts you create, the more meta work that you create for ClickHouse to merge"
  • "If the inserts are too small then you have another problem in that these small parts will not get merged together in time in the background. When too many parts accumulate, you'll get the "too many parts" error which will slow down your writing throughput"
  • "We encourage our users to insert data in batches. For example, 20,000 rows at once"

Recognize the Problem Early
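
The number of active parts per table is exposed in the system.parts system table, so the problem can be caught before the "too many parts" error ever appears. A minimal monitoring sketch, assuming the clickhouse-connect Python client:

```python
import clickhouse_connect

client = clickhouse_connect.get_client(host='localhost')

# Count active (not yet merged away) parts per table; a steadily
# climbing count for a hot table is the early warning sign.
result = client.query(
    """
    SELECT database, table, count() AS active_parts
    FROM system.parts
    WHERE active
    GROUP BY database, table
    ORDER BY active_parts DESC
    LIMIT 10
    """
)
for database, table, active_parts in result.result_rows:
    print(f"{database}.{table}: {active_parts} active parts")
```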

Proper Insert Batching

Community-proven batching strategy from production deployments:

"All these batching services they buffer the traffic... either for 30 seconds or till the data size hits 200 MB... there is a fine balance that needs to be struck here for how much to buffer... we don't want to insert again too frequently because then it unnecessarily consumes click house cycles" - Production observability platform processing TB/day

Alternative: Async Inserts (ClickHouse 21.11+)

"We developed a function called async insert... this mechanism is straightforward. It's similar to buffer table. We insert to the server side and use a buffer to collect these inserts. By default we have 16 threads to collect this buffer and if the buffer is large enough, or we reach a timeout, we will flush the buffer to the storage so a part will contain multiple inserts" - ClickHouse team explaining the built-in solution