Commit 6ec49722 authored by craig[bot]

Merge #42274

42274: backup: randomize order that spans are backed up r=dt a=dt

Prior to this change, ExportRequests were sent for sequential ranges in order during BACKUP. If adjacent ranges had co-located leaseholders (which could easily happen in some cases), the entire quota of concurrent outstanding requests could be filled by requests to one node or, more generally, to some small subset of nodes (e.g. a geo-partitioned table would send all of its backup requests to one region, then to another, and so on), leaving the rest of the cluster under-utilized.

This change randomizes the order in which ExportRequests are sent out, ideally evening out utilization across all nodes that host ranges for the table(s) being backed up, even when leaseholders for adjacent ranges are clustered.

Release note (performance improvement): Spread BACKUP work more evenly across clusters that have non-uniform leaseholder distributions.
Co-authored-by: David Taylor <[email protected]>
parents df74d2a1 e46b76cc
@@ -13,6 +13,7 @@ import (
 	"context"
 	"fmt"
 	"io/ioutil"
+	"math/rand"
 	"net/url"
 	"sort"
 	"time"
@@ -735,6 +736,18 @@ func backup(
 		allSpans = append(allSpans, spanAndTime{span: s, start: backupDesc.StartTime, end: backupDesc.EndTime})
 	}
+	// Sequential ranges may have clustered leaseholders, for example a
+	// geo-partitioned table likely has all the leaseholders for some contiguous
+	// span of the table (i.e. a partition) pinned to just the nodes in a region.
+	// In such cases, sending spans sequentially may under-utilize the rest of the
+	// cluster given that we have a limit on the number of spans we send out at
+	// a given time. Randomizing the order of spans should help ensure a more even
+	// distribution of work across the cluster regardless of how leaseholders may
+	// or may not be clustered.
+	rand.Shuffle(len(allSpans), func(i, j int) {
+		allSpans[i], allSpans[j] = allSpans[j], allSpans[i]
+	})
 	progressLogger := jobs.NewChunkProgressLogger(job, len(spans), job.FractionCompleted(), jobs.ProgressUpdateOnly)
 	// We're already limiting these on the server-side, but sending all the
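As a rough, self-contained illustration of the approach (not the CockroachDB code): the sketch below uses a simplified spanAndTime struct, made-up key strings, an invented leaseholder field, and a small semaphore standing in for the ExportRequest concurrency quota, to show how rand.Shuffle (an in-place Fisher-Yates shuffle driven by the caller-supplied swap function) interleaves contiguous spans before they reach the bounded pool of in-flight requests.

package main

import (
	"fmt"
	"math/rand"
	"sync"
)

// spanAndTime is a simplified stand-in for the backup work item in
// pkg/ccl/backupccl; the real struct carries a roachpb.Span and hlc timestamps.
type spanAndTime struct {
	span        string // illustrative key range, e.g. "/Table/53/1/{0-100}"
	leaseholder int    // node that would serve the ExportRequest (invented for this sketch)
}

func main() {
	// Build contiguous spans whose leaseholders are clustered: the first half
	// of the table lives on node 1, the second half on node 2, mimicking a
	// geo-partitioned table.
	var allSpans []spanAndTime
	for i := 0; i < 10; i++ {
		allSpans = append(allSpans, spanAndTime{
			span:        fmt.Sprintf("/Table/53/1/{%d-%d}", i*100, (i+1)*100),
			leaseholder: 1 + i/5,
		})
	}

	// The fix: shuffle the work queue in place so adjacent spans (whose
	// leaseholders may be co-located) are no longer dispatched back to back.
	rand.Shuffle(len(allSpans), func(i, j int) {
		allSpans[i], allSpans[j] = allSpans[j], allSpans[i]
	})

	// A bounded set of in-flight requests, standing in for the BACKUP quota of
	// concurrent ExportRequests.
	const maxInFlight = 3
	sem := make(chan struct{}, maxInFlight)
	var wg sync.WaitGroup
	for _, s := range allSpans {
		s := s
		sem <- struct{}{} // acquire a slot before dispatching
		wg.Add(1)
		go func() {
			defer wg.Done()
			defer func() { <-sem }() // release the slot when "done"
			fmt.Printf("exporting %s from node %d\n", s.span, s.leaseholder)
		}()
	}
	wg.Wait()
}

With the shuffle removed, the first maxInFlight items in this sketch would all target node 1 while node 2 sat idle; with it, the in-flight set usually mixes both nodes, which is the utilization improvement the commit message describes.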