Benchmark Run Summary

Run Type¹	Commit SHA	Time of Commit	Hardware	Languages	Benchmark Type	Number of Benchmarks
baseline	2dcee3f	2023-10-19 09:12:19	Intel(R) Core(TM) i9-9960X CPU @ 3.10GHz	Python, R	macrobenchmarks	194
contender	a61f4af	2024-01-16 14:38:51	Intel(R) Core(TM) i9-9960X CPU @ 3.10GHz	Python, R	macrobenchmarks	193
baseline	2dcee3f	2023-10-19 09:12:19	AMD Ryzen 5 PRO 4650GE with Radeon Graphics	C++, Java	microbenchmarks	3368
contender	a61f4af	2024-01-16 14:38:51	AMD Ryzen 5 PRO 4650GE with Radeon Graphics	C++, Java	microbenchmarks	3386
¹ When we compare benchmark results, we always have a contender (the new code under consideration) and a baseline (the old code we are comparing against). The historic distribution is drawn from all benchmark results on commits in the baseline commit's git ancestry, up to and including all runs on the baseline commit itself. In this context, the baseline is typically the last Arrow release and the contender is the current release candidate.
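As a rough illustration of the footnote's rule, the sketch below selects the historic distribution for one benchmark case. It assumes a hypothetical data shape, not Conbench's actual schema: each result is an object with a `commit` SHA and a numeric `value`, and `baselineAncestry` is an array holding the baseline SHA and all of its ancestors (for example, the output of `git rev-list 2dcee3f`, which lists a commit and every commit reachable from it). The values and the SHA `abc1234` are made up.

```javascript
// Hypothetical data shape (not Conbench's schema): results are {commit, value} objects,
// and baselineAncestry holds the baseline SHA plus every ancestor SHA.
function historicDistribution(results, baselineAncestry) {
  const allowed = new Set(baselineAncestry);
  // Keep runs on any ancestor commit, including every run on the baseline commit itself.
  return results.filter((r) => allowed.has(r.commit)).map((r) => r.value);
}

// Made-up values; "abc1234" stands in for an older ancestor commit.
const results = [
  { commit: "abc1234", value: 100 },
  { commit: "2dcee3f", value: 102 },
  { commit: "2dcee3f", value: 98 },
  { commit: "a61f4af", value: 80 }, // contender commit: not in the baseline's ancestry
];
const baselineHistory = historicDistribution(results, ["2dcee3f", "abc1234"]);
console.log(baselineHistory); // [ 100, 102, 98 ]
```

Using a `Set` keeps each membership check constant-time, which matters when the ancestry spans many commits.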

Macrobenchmarks

Live Conbench UI views for the macrobenchmarks are available at this URL. Conbench offers an additional way to explore the benchmark results, particularly if you want to see more of the history or more metadata.

Benchmark Percent Changes

Benchmarks are plotted using the percent change from baseline to contender.
Additional information on each benchmark is available by hovering over the relevant bar.
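The percent change behind each bar can be sketched as below. This is a minimal sketch, not Conbench's implementation: it assumes, as the signs in this report's microbenchmark tables suggest, that the sign is flipped for lower-is-better units such as `ns`, so that negative values always mean a regression. The `percentChange` function and the `LESS_IS_BETTER` unit list are hypothetical; the inputs come from the VarianceKernelInt32 and ExecuteScalarExpressionBaseline rows of the regressions table.

```javascript
// Units where a smaller result is better (assumed list, for illustration only).
const LESS_IS_BETTER = new Set(["ns", "s"]);

// Percent change from baseline to contender, signed so that
// positive = improvement and negative = regression (assumed convention).
function percentChange(baseline, contender, unit) {
  const raw = ((contender - baseline) / baseline) * 100;
  return LESS_IS_BETTER.has(unit) ? -raw : raw;
}

// Throughput dropped: a regression.
console.log(percentChange(287200, 188900, "MB/s").toFixed(2)); // -34.23
// Run time grew: also a regression once the sign is flipped.
console.log(percentChange(1419000, 1537000, "ns").toFixed(2)); // -8.32
```

Because the table values are rounded, recomputing from them can differ slightly in the last digit from the reported percent change.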

Python

dataframe-to-table

dataset-filter

dataset-read

dataset-select

dataset-selectivity

file-read

file-write

recursive-get-file-info

wide-dataframe

R

dataframe-to-table

file-read

file-write

partitioned-dataset-filter

tpch

Microbenchmarks

There are currently 3352 microbenchmarks in the Arrow benchmarks. The following comparisons can also be viewed in the Conbench UI.

Language	z-score threshold	Stable	Improvements	Regressions	Total
C++	5	2733	321	263	3317
Java	5	26	5	4	35
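The Stable / Improvements / Regressions split in the table above can be reproduced from per-result z-scores roughly as follows. This sketch assumes z-scores are already signed so that positive means improvement, and that a result counts as changed when its absolute z-score reaches the threshold of 5; whether Conbench uses a strict or an inclusive comparison is an assumption here, and the function names and z-scores are made up.

```javascript
// Classify one result by its z-score (assumed sign: positive = improvement;
// assumed inclusive threshold).
function classify(zScore, threshold = 5) {
  if (zScore <= -threshold) return "Regression";
  if (zScore >= threshold) return "Improvement";
  return "Stable";
}

// Tally z-scores into the columns of the summary table.
function summarize(zScores, threshold = 5) {
  const counts = { Stable: 0, Improvement: 0, Regression: 0, Total: zScores.length };
  for (const z of zScores) counts[classify(z, threshold)] += 1;
  return counts;
}

// Made-up z-scores for illustration only.
console.log(summarize([0.4, -1.2, 6.1, -25.9, 2.0]));
// { Stable: 3, Improvement: 1, Regression: 1, Total: 5 }
```

As in the table, the three category counts always add up to the total.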

Because of the large number of benchmarks, only the 20 results that deviate most from the baseline in each direction (regressions and improvements) are presented below. All microbenchmark results for this comparison can be explored interactively in the microbenchmark explorer.
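Selecting the rows for the two tables that follow amounts to ranking results by how far they deviate and taking the extremes in each direction. A minimal sketch, assuming the ranking is by z-score and that each result is an object with a numeric `zScore` field (a hypothetical name) signed so that negative values are regressions:

```javascript
// Return the n most negative and n most positive results by z-score.
function largestChanges(results, n = 20) {
  const sorted = [...results].sort((a, b) => a.zScore - b.zScore);
  return {
    // Most negative first: the largest regressions.
    regressions: sorted.filter((r) => r.zScore < 0).slice(0, n),
    // Most positive first: the largest improvements.
    improvements: sorted.filter((r) => r.zScore > 0).reverse().slice(0, n),
  };
}

// Usage with made-up results, keeping one per direction.
const { regressions, improvements } = largestChanges(
  [{ name: "a", zScore: -3 }, { name: "b", zScore: 7 }, { name: "c", zScore: -40 }, { name: "d", zScore: 1 }],
  1
);
console.log(JSON.stringify(regressions.map((r) => r.name)), JSON.stringify(improvements.map((r) => r.name)));
// ["c"] ["b"]
```

Copying the input before sorting leaves the caller's array untouched; the report's tables then group the selected rows by suite for display.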

Largest 20 regressions between baseline and contender

Language	Benchmark	Params	z-score	Percent Change	Baseline result	Contender result	Unit
arrow-acero-aggregate-benchmark
C++	VarianceKernelInt32	1048576/1	−25.92	−34.23%	287,200	188,900	MB/s¹
arrow-acero-expression-benchmark
C++	ExecuteScalarExpressionBaseline	<ComplexExpressionBaseline>/rows_per_batch:10000/real_time/threads:1	−29.06	−8.30%	1,419,000	1,537,000	ns¹
C++	ExecuteScalarExpressionBaseline	<ComplexExpressionBaseline>/rows_per_batch:100000/real_time/threads:1	−25.43	−7.88%	1,412,000	1,523,000	ns¹
arrow-compute-function-benchmark
C++	BM_ExecSpanIterator	1024	−35.50	−22.68%	12,930	9,998	i/s¹
C++	BM_ExecSpanIterator	16384	−36.12	−19.64%	172,800	138,900	i/s¹
C++	BM_ExecSpanIterator	4096	−42.24	−22.44%	49,840	38,650	i/s¹
arrow-compute-vector-hash-benchmark
C++	UniqueString10bytes	0	−26.86	−14.74%	837	714	MB/s¹
arrow-compute-vector-selection-benchmark
C++	FilterStringFilterNoNulls	524288/3	−50.81	−37.39%	3,339	2,090	MB/s¹
arrow-compute-vector-sort-benchmark
C++	ArraySortIndicesBool	32768/10	−25.61	−13.03%	42	37	MB/s¹
arrow-io-file-benchmark
C++	FileOutputStreamSmallWritesToNull	real_time	−26.24	−12.75%	245	214	MB/s¹
arrow-small-vector-benchmark
C++	CopyShortVector	<SMALL_VECTOR(std::string)>	−248.30	−16.17%	126,400,000	106,000,000	i/s¹
C++	ShortVectorInsert	<SMALL_VECTOR(int)>	−121.30	−6.15%	875,300,000	821,400,000	i/s¹
arrow-value-parsing-benchmark
C++	IntegerFormatting	<UInt16Type>	−29.58	−4.49%	162,600,000	155,300,000	i/s¹
parquet-encoding-benchmark
C++	BM_DeltaBitPackingEncode_Int32_Wide	4096	−47.59	−15.93%	523	439	MB/s¹
C++	BM_PlainDecodingDouble	65536	−67.53	−36.49%	38,020	24,150	MB/s¹
C++	BM_PlainDecodingInt64	65536	−68.45	−36.52%	38,000	24,130	MB/s¹
C++	BM_PlainEncodingDoubleNaN	65536	−52.23	−34.82%	35,680	23,260	MB/s¹
C++	BM_PlainEncodingDouble	65536	−53.58	−35.01%	35,790	23,260	MB/s¹
C++	BM_PlainEncodingInt64	65536	−58.99	−34.94%	35,740	23,250	MB/s¹
arrow.memory.ArrowBufBenchmarks
Java	setZero	source=java-micro, suite=arrow.memory.ArrowBufBenchmarks	−57.60	−47.68%	47,800	25,010	i/s¹
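¹ MB/s = megabytes per second; ns = nanoseconds; i/s = iterations per second. Percent changes are signed so that negative values are regressions: for ns, where lower is better, a slower contender therefore shows a negative change.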

Largest 20 improvements between baseline and contender

Language	Benchmark	Params	z-score	Percent Change	Baseline result	Contender result	Unit
arrow-compute-scalar-if-else-benchmark
C++	CoalesceScalarBench64	0	164.40	44.49%	5,058	7,309	MB/s¹
arrow-compute-vector-selection-benchmark
C++	FilterInt64FilterNoNulls	524288/2	494.30	320.2%	12,760	53,640	MB/s¹
C++	FilterStringFilterNoNulls	524288/1	102.40	91.97%	1,420	2,726	MB/s¹
C++	FilterStringFilterNoNulls	524288/2	108.40	223.4%	23,500	75,990	MB/s¹
arrow-io-memory-benchmark
C++	ParallelMemoryCopy	threads:1/real_time	175.70	73.86%	6,518	11,330	MB/s¹
C++	ParallelMemoryCopy	threads:2/real_time	278.50	72.14%	6,618	11,390	MB/s¹
C++	ParallelMemoryCopy	threads:4/real_time	436.70	70.62%	6,420	10,950	MB/s¹
C++	ParallelMemoryCopy	threads:6/real_time	257.80	63.71%	6,289	10,300	MB/s¹
parquet-encoding-benchmark
C++	BM_ByteStreamSplitDecode_Double_Scalar	32768	347.30	29.85%	2,374	3,083	MB/s¹
C++	BM_ByteStreamSplitDecode_Double_Scalar	65536	604.30	29.76%	2,371	3,076	MB/s¹
C++	BM_ByteStreamSplitDecode_Float_Scalar	1024	115.70	114.2%	1,510	3,236	MB/s¹
C++	BM_ByteStreamSplitDecode_Float_Scalar	4096	137.40	115.1%	1,513	3,255	MB/s¹
C++	BM_ByteStreamSplitEncode_Double_Scalar	1024	1,343.00	95.31%	2,097	4,096	MB/s¹
C++	BM_ByteStreamSplitEncode_Double_Scalar	32768	1,796.00	102.4%	1,992	4,033	MB/s¹
C++	BM_ByteStreamSplitEncode_Double_Scalar	4096	1,887.00	101.8%	1,997	4,032	MB/s¹
C++	BM_ByteStreamSplitEncode_Double_Scalar	65536	2,123.00	102.2%	1,992	4,029	MB/s¹
C++	BM_ByteStreamSplitEncode_Float_Scalar	1024	118.90	182.8%	1,473	4,167	MB/s¹
C++	BM_ByteStreamSplitEncode_Float_Scalar	4096	125.30	183.3%	1,481	4,196	MB/s¹
arrow.vector.VectorUnloaderBenchmark
Java	unloadBenchmark	source=java-micro, suite=arrow.vector.VectorUnloaderBenchmark	8,848.00	22,100%	4,326	960,100	i/s¹
arrow.vector.ipc.message.ArrowRecordBatchBenchmarks
Java	createAndGetLength	source=java-micro, suite=arrow.vector.ipc.message.ArrowRecordBatchBenchmarks	8,674.00	19,660%	39,410	7,786,000	i/s¹
¹ MB/s = megabytes per second; ns = nanoseconds; i/s = iterations per second

z-score distribution

Plotting the distribution of z-scores for all microbenchmark results helps identify any systematic differences between the baseline and contender. The shape of the distribution gives a sense of the contender's overall performance relative to the baseline: a narrow distribution centered around 0 indicates that the contender performs similarly to the baseline, while a wider distribution indicates that it performs differently, with a left skew indicating regressions and a right skew indicating improvements.
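For a single benchmark case, the z-score plotted here can be thought of as the distance of the contender result from the mean of the historic distribution, measured in standard deviations. The sketch below is a simplified illustration, not Conbench's lookback z-score analysis, which may differ in details such as the size of the lookback window: it assumes a sample standard deviation and flips the sign for lower-is-better units so that, as in this report's tables, negative z-scores are regressions. The function names and the throughput history are made up.

```javascript
// Mean and sample standard deviation of the historic results.
function meanAndSd(values) {
  const mean = values.reduce((sum, v) => sum + v, 0) / values.length;
  const variance = values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / (values.length - 1);
  return { mean, sd: Math.sqrt(variance) };
}

// z-score of a contender result against the historic distribution.
// lessIsBetter flips the sign so negative always means "worse" (assumed convention).
function zScore(history, contender, lessIsBetter = false) {
  const { mean, sd } = meanAndSd(history);
  const z = (contender - mean) / sd;
  return lessIsBetter ? -z : z;
}

// Made-up throughput history (MB/s) and a slower contender run.
console.log(zScore([100, 102, 98, 101, 99], 90).toFixed(2)); // -6.32
```

With the threshold of 5 used in this report, a result like this would fall in the left tail of the histogram and be counted as a regression.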

Plot.plot({
  y: {grid: true},
  x: {
    label: "z-score"
  },
  color: {legend: false},
  width: 1000,
  height: 400,
  marks: [
    Plot.rectY(microBmProced, Plot.binX({y: "count"}, {x: "analysis_lookback_z_score_z_score", fill: "grey", tip: true})),
    Plot.ruleY([0])
  ]
})

Plot = await import("https://esm.sh/@observablehq/plot");
import { aq, op } from '@uwdata/arquero';
boxWidth = 900
microBmProced = aq.from(transpose(ojs_micro_bm_proced))

Microbenchmark explorer

This microbenchmark explorer lets you filter the microbenchmark results by language, suite, and benchmark name, and toggle regressions and improvements based on a z-score threshold of 5. A language, suite, and benchmark name must all be selected to show a benchmark plot. Additional benchmark parameters are displayed on the vertical axis, so each bar represents one case permutation; if a benchmark has no additional parameters, the full case permutation string is displayed instead. Clicking a bar opens the Conbench UI page for that case permutation, which provides additional history and metadata.

viewof changes = Inputs.checkbox(["Regressions", "Improvements"], {
  label: md`**Benchmark Status**`,
  value: ["Regressions"]
  })

// Choose the state of the benchmark
microBmProcedChanges = {
  const hasRegressions = changes.includes("Regressions");
  const hasImprovements = changes.includes("Improvements");
  // With neither box checked, show every result
  if (!hasRegressions && !hasImprovements) return microBmProced;
  // Otherwise keep rows flagged as a regression and/or an improvement, per the checked boxes
  return microBmProced.filter(aq.escape(d =>
    (hasRegressions && d.analysis_pairwise_regression_indicated) ||
    (hasImprovements && d.analysis_pairwise_improvement_indicated)
  ));
}

// Choose the language
allLanguageValues = [null].concat(microBmProcedChanges.dedupe('language').array('language'))

viewof languageSelected = Inputs.select(allLanguageValues, {
    label: md`**Language**`,
    value: allLanguageValues[0],
    width: boxWidth
})

languages = {
  return (languageSelected === null)
  ? microBmProcedChanges // No language selected (null): no filtering is applied
  : microBmProcedChanges.filter(aq.escape(d => d.language === languageSelected));
}


allSuiteValues = [null].concat(languages.dedupe('suite').array('suite'))

// Choose the suite
viewof suiteSelected = Inputs.select(allSuiteValues, {
    label: md`**Suite**`,
    value: allSuiteValues[0],
    width: boxWidth
})

suites = {
  return (suiteSelected === null)
  ? languages
  : languages.filter(aq.escape(d => d.suite === suiteSelected));
}

allNameValues = [null].concat(suites.dedupe('name').array('name'))

// Choose the benchmark
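viewof nameSelected = Inputs.select(allNameValues, {
    label: md`**Benchmark Name**`,
    value: allNameValues[0],
    width: boxWidth
})

microBmProcedChangesFiltered = {
  // Exact match: a substring test would also pull in longer names containing the
  // selected one (e.g. BM_PlainEncodingDouble vs. BM_PlainEncodingDoubleNaN)
  return (nameSelected === null)
  ? suites
  : suites.filter(aq.escape(d => d.name === nameSelected));
}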

margins = {
  const hasRegressions = changes.includes("Regressions");
  const hasImprovements = changes.includes("Improvements");
  // Returns [marginRight, marginLeft]; space is reserved on the side where the
  // y-axis labels are drawn, opposite the bars
  if (hasImprovements && !hasRegressions) return [0, 600];
  if (hasRegressions && !hasImprovements) return [600, 0];
  return [300, 300]; // both statuses (or neither) selected
}

displayPlot = nameSelected !== null && suiteSelected !== null && languageSelected !== null

// Only display plots if a benchmark is selected
mbPlot = {
  if (displayPlot) {
    return Plot.plot({
      width: 1200,
      height: microBmProcedChangesFiltered.numRows() * 30 + 100, //adjust height of plot based on number of rows
      marginRight: margins[0],
      marginLeft: margins[1],
      label: null,
      x: {
        axis: "top",
        label: "% change",
        labelAnchor: "center",
        labelOffset: 30
      },
      style: {
        fontSize: "14px",
        fontFamily: "Roboto Mono"
      },
      color: {
        range: ojs_change_cols,
        domain: ojs_pn_lab,
        type: "categorical",
        legend: true
      },
      marks: [
        Plot.barX(microBmProcedChangesFiltered, {
          y: "params",
          x: "change",
          color: "black",
          fill: "pn_lab",
          fillOpacity: 0.75,
          sort: { y: "x" },
          channels: { difference: "difference", params: "params" },
          href: "cb_url",
          tip: true
        }),
        Plot.gridX({ stroke: "white", strokeOpacity: 0.5 }),
        Plot.ruleX([0]),
        d3
          .groups(microBmProcedChangesFiltered, (d) => d.change > 0)
          .map(([posneg, dat]) => [
            Plot.axisY({
              x: 0,
              ticks: dat.map((d) => d.params),
              tickSize: 0,
              anchor: posneg ? "left" : "right"
            })
          ])
      ]
    });
  } else {
    return md`**Language, suite and benchmark all need a selection for a plot to be displayed.**`;
  }
}