Arrow Release Benchmark Report

Benchmark Run Summary

Run Type¹	Commit SHA	Time of Commit	Hardware	Languages	Benchmark Type	Number of Benchmarks
contender	6a2e19a	2024-07-11 08:57:21	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	Python, R	macrobenchmarks	217
baseline	7dd1d34	2024-05-09 07:21:29	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	Python, R	macrobenchmarks	217
baseline	7dd1d34	2024-05-09 07:21:29	AMD EPYC 7R13 Processor	C++, Java	microbenchmarks	3549
contender	6a2e19a	2024-07-11 08:57:21	AMD EPYC 7R13 Processor	C++, Java	microbenchmarks	3547
contender	6a2e19a	2024-07-11 08:57:21	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	JavaScript	microbenchmarks	92
baseline	7dd1d34	2024-05-09 07:21:29	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	JavaScript	microbenchmarks	92
¹ When we compare benchmark results, we always have a contender (the new code that we are considering) and a baseline (the old code that we are comparing to). The historic distribution is drawn from all benchmark results on commits in the baseline commit's git ancestry, up to and including all runs on the baseline commit itself. In this context, the baseline is typically the last Arrow release and the contender is the current release candidate.

Macrobenchmarks

Live Conbench UI views for the macrobenchmarks are available at this URL. Conbench offers another way to explore the benchmark results, particularly if you want to see more of the history or additional metadata.

Benchmark Percent Changes

Benchmarks are plotted using the percent change from baseline to contender.
Additional information on each benchmark is available by hovering over the relevant bar.

Python

dataframe-to-table

dataset-filter

dataset-read

dataset-select

dataset-selectivity

file-read

file-write

recursive-get-file-info

wide-dataframe

R

dataframe-to-table

file-read

file-write

partitioned-dataset-filter

tpch

Microbenchmarks

The Arrow benchmarks currently include 3641 microbenchmarks. The comparisons below can also be viewed in the Conbench UI.

Number of microbenchmarks by language
Language	Stable	Improvements	Regressions	No comparison	Total
C++	2373	552	619	2	3544
Java	NA	3	NA	NA	3
JavaScript	53	22	6	11	81

Because of the large number of benchmarks, the top 20 benchmark results that deviate most from the baseline in both the positive and negative directions are presented below. All microbenchmark results for this comparison can be explored interactively in the microbenchmark explorer.
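As a rough illustration of how such a "top N in each direction" selection could be made, the sketch below sorts results by percent change and keeps the N most negative and N most positive entries. The function name `topDeviations`, the field names `name` and `change`, and the sample records are all hypothetical; this is not necessarily how the report itself selects its rows.

```javascript
// Hypothetical helper: pick the n largest regressions and improvements.
// Assumes `change` is a percent change where negative means regression.
function topDeviations(results, n) {
  const regressions = results
    .filter((r) => r.change < 0)
    .sort((a, b) => a.change - b.change) // most negative first
    .slice(0, n);
  const improvements = results
    .filter((r) => r.change > 0)
    .sort((a, b) => b.change - a.change) // most positive first
    .slice(0, n);
  return { regressions, improvements };
}

// Made-up sample records for illustration only.
const sample = [
  { name: "a", change: -90.33 },
  { name: "b", change: 96.43 },
  { name: "c", change: -47.37 },
  { name: "d", change: 1909.0 },
  { name: "e", change: 0 },
];

const top = topDeviations(sample, 1);
console.log(top.regressions.map((r) => r.name));
console.log(top.improvements.map((r) => r.name));
```

Filtering before sorting leaves the input array untouched, since `filter` returns a new array.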

Largest 20 regressions between baseline and contender

Language	Benchmark	Params	Percent Change	Baseline result	Contender result	Unit
arrow-acero-expression-benchmark
C++	ExecuteScalarExpressionOverhead	complex_expression/rows_per_batch:1000/real_time/threads:16	−90.33%	444,700	846,400	ns¹
C++	ExecuteScalarExpressionOverhead	complex_integer_expression/rows_per_batch:1000/real_time/threads:16	−91.07%	442,700	845,800	ns¹
arrow-ipc-read-write-benchmark
C++	ReadMmapUncachedFileAsync	num_cols:64/is_partial:0/real_time	−83.06%	141,700	24,000	MB/s¹
C++	ReadMmapUncachedFileAsync	num_cols:64/is_partial:1/real_time	−66.32%	15,680	5,281	MB/s¹
C++	ReadMmapUncachedFile	num_cols:64/is_partial:0/real_time	−68.02%	134,800	43,130	MB/s¹
C++	ReadMmapUncachedFile	num_cols:64/is_partial:1/real_time	−84.55%	2,445	378	MB/s¹
C++	ReadMmapUncachedFile	num_cols:8/is_partial:1/real_time	−48.04%	943	490	MB/s¹
C++	ReadUncachedFileAsync	num_cols:64/is_partial:0/real_time	−83.54%	2,879	474	MB/s¹
C++	ReadUncachedFileAsync	num_cols:64/is_partial:1/real_time	−94.87%	8,076	414	MB/s¹
C++	ReadUncachedFile	num_cols:64/is_partial:0/real_time	−83.51%	2,863	472	MB/s¹
C++	ReadUncachedFile	num_cols:64/is_partial:1/real_time	−84.74%	2,549	389	MB/s¹
parquet-encoding-benchmark
C++	BM_DeltaDecodingByteArray	max-string-length:1024/batch-size:2048/prefixed-percent:90	−47.37%	22,560	11,870	MB/s¹
C++	BM_DeltaDecodingByteArray	max-string-length:1024/batch-size:2048/prefixed-percent:99	−49.02%	21,870	11,150	MB/s¹
C++	BM_DeltaDecodingByteArray	max-string-length:64/batch-size:2048/prefixed-percent:90	−47.95%	3,324	1,730	MB/s¹
C++	BM_DeltaDecodingByteArray	max-string-length:64/batch-size:2048/prefixed-percent:99	−47.85%	3,045	1,588	MB/s¹
C++	BM_DictDecodingByteArray	max-string-length:64/batch-size:2048	−47.86%	5,656	2,949	MB/s¹
C++	BM_PlainEncodingByteArray	max-string-length:64/batch-size:2048	−49.03%	6,332	3,227	MB/s¹
C++	BM_PlainEncodingByteArray	max-string-length:8/batch-size:2048	−54.82%	2,080	940	MB/s¹
C++	BM_PlainEncodingByteArray	max-string-length:8/batch-size:512	−48.93%	1,913	977	MB/s¹
C++	BM_PlainEncodingSpacedFloat	32768/1000	−48.62%	5,589	2,872	MB/s¹
¹ MB/s = megabytes per second; ns = nanoseconds; i/s = iterations per second
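The percent changes in the table above can be reproduced from the baseline and contender columns. The sketch below assumes a sign convention inferred from the table rows rather than from the Conbench source: for units where lower is better (ns), the sign is flipped so that a regression is always negative. The function name `percentChange` and its `lowerIsBetter` parameter are hypothetical.

```javascript
// Assumption: sign convention inferred from the table, not taken from Conbench itself.
// For lower-is-better units (e.g. ns), flip the sign so regressions are negative.
function percentChange(baseline, contender, lowerIsBetter = false) {
  const change = ((contender - baseline) / baseline) * 100;
  return lowerIsBetter ? -change : change;
}

// ExecuteScalarExpressionOverhead, complex_expression (ns, lower is better)
const overhead = percentChange(444700, 846400, true);
// ReadMmapUncachedFileAsync, num_cols:64/is_partial:0 (MB/s, higher is better)
const throughput = percentChange(141700, 24000);

console.log(overhead.toFixed(2));   // -90.33
console.log(throughput.toFixed(2)); // -83.06
```

Because the table shows rounded results, recomputing other rows from the displayed values may differ slightly in the last digits.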

Largest 20 improvements between baseline and contender

Language	Benchmark	Params	Percent Change	Baseline result	Contender result	Unit
arrow-compute-scalar-round-benchmark
C++	RoundDerivativesArrayBenchmark	<Trunc, FloatType>/size:524288/inverse_null_proportion:0	96.43%	6,818	13,390	MB/s¹
arrow-compute-vector-selection-benchmark
C++	FilterFSLInt64FilterNoNulls	524288/0	1,909.00%	1,232	24,740	MB/s¹
C++	FilterFSLInt64FilterNoNulls	524288/1	153.80%	1,074	2,725	MB/s¹
C++	FilterFSLInt64FilterNoNulls	524288/2	487.20%	12,620	74,130	MB/s¹
C++	FilterFSLInt64FilterWithNulls	524288/0	220.40%	963	3,087	MB/s¹
C++	TakeChunkedChunkedFSBRandomIndicesWithNulls	524288/1/9	320.70%	280,900,000	1,182,000,000	i/s¹
C++	TakeFSLInt64MonotonicIndices	524288/0	248.20%	292,700,000	1,019,000,000	i/s¹
C++	TakeFSLInt64RandomIndicesNoNulls	524288/0	142.90%	256,100,000	621,900,000	i/s¹
C++	TakeFSLInt64RandomIndicesWithNulls	524288/0	140.70%	257,700,000	620,300,000	i/s¹
C++	TakeFixedSizeBinaryRandomIndicesWithNulls	524288/1/9	1,446.00%	357,000,000	5,519,000,000	i/s¹
C++	TakeStringMonotonicIndices	524288/0	247.60%	292,500,000	1,017,000,000	i/s¹
arrow-ipc-read-write-benchmark
C++	ReadMmapUncachedFile	num_cols:1/is_partial:0/real_time	3,756.00%	3,657	141,000	MB/s¹
C++	ReadUncachedFile	num_cols:1/is_partial:0/real_time	5,763.00%	230	13,500	MB/s¹
C++	ReadUncachedFile	num_cols:8/is_partial:0/real_time	962.00%	250	2,659	MB/s¹
arrow-tensor-benchmark
C++	BatchToTensorSimple	<Int32Type>/size:32768/num_columns:3	142.10%	3,685	8,919	MB/s¹
C++	BatchToTensorSimple	<Int32Type>/size:524288/num_columns:3	229.50%	4,025	13,260	MB/s¹
arrow-thread-pool-benchmark
C++	ThreadedTaskGroup	threads:4/task_cost:1000/real_time	226.40%	248,200	810,400	i/s¹
parquet-column-reader-benchmark
C++	RecordReaderReadAndSkipRecords	Repetition:2/BatchSize:10000/LevelsPerPage:1000000	161.60%	231	604	MB/s¹
C++	RecordReaderReadRecords	Repetition:2/BatchSize:1000/ReadDense:1	334.20%	294	1,276	MB/s¹
C++	RecordReaderSkipRecords	Repetition:2/BatchSize:1000	347.50%	291	1,302	MB/s¹
¹ MB/s = megabytes per second; ns = nanoseconds; i/s = iterations per second

import { aq, op } from '@uwdata/arquero';
boxWidth = 900
microBmProced = aq.from(transpose(ojs_micro_bm_proced))

Microbenchmark explorer

The microbenchmark explorer lets you filter the microbenchmark results by language, suite, and benchmark name, and toggle regressions and improvements based on the percent change between the baseline and contender. A language, suite, and benchmark name must all be selected before a benchmark plot is shown. Additional benchmark parameters are displayed on the vertical axis, so each bar represents one case permutation. If a benchmark has no additional parameters, the full case permutation string is displayed. Clicking a bar opens the Conbench UI page for that benchmark, which provides additional history and metadata for that case permutation.
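The cascading selection described above can be sketched with plain arrays. This is a minimal illustration, not the explorer's actual implementation: the helper names `filterBy` and `explore` and the sample rows are hypothetical, and a `null` selection is assumed to mean "no filter". The sketch compares values exactly, so that selecting "Java" does not also match "JavaScript", as a substring test would.

```javascript
// Hypothetical rows; the suite and benchmark names for Java/JavaScript are made up.
const rows = [
  { language: "Java", suite: "suite-1", name: "bench1" },
  { language: "JavaScript", suite: "suite-2", name: "bench2" },
  { language: "C++", suite: "arrow-ipc-read-write-benchmark", name: "ReadUncachedFile" },
  { language: "C++", suite: "arrow-ipc-read-write-benchmark", name: "ReadUncachedFileAsync" },
];

// A null selection means "no filter"; otherwise keep exact matches only.
function filterBy(data, field, selected) {
  return selected === null ? data : data.filter((d) => d[field] === selected);
}

// Apply the three filters in order: language, then suite, then benchmark name.
function explore(data, { language, suite, name }) {
  const byLanguage = filterBy(data, "language", language);
  const bySuite = filterBy(byLanguage, "suite", suite);
  return filterBy(bySuite, "name", name);
}

const javaOnly = explore(rows, { language: "Java", suite: null, name: null });
console.log(javaOnly.map((d) => d.name));
```

Each stage only narrows the output of the previous one, which is why the options offered for suite and benchmark name depend on the earlier selections.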

viewof changes = Inputs.checkbox(["Regressions", "Improvements"], {
  label: md`**Benchmark Status**`,
  value: ["Regressions"]
  })

// Filter rows by the selected benchmark status (regressions, improvements, or both)
microBmProcedChanges = {
  const hasRegressions = changes.includes("Regressions");
  const hasImprovements = changes.includes("Improvements");
  // Attach the checkbox state as table parameters so the filter expressions can read them via `$`
  const microBmProcedParams = microBmProced
      .params({hr: hasRegressions, hi: hasImprovements});
  if (hasRegressions && hasImprovements) {
    return microBmProcedParams
      .filter((d, $) => d.analysis_pairwise_regression_indicated == $.hr || d.analysis_pairwise_improvement_indicated == $.hi);
  } else if (hasImprovements) {
    return microBmProcedParams
      .filter((d, $) => d.analysis_pairwise_improvement_indicated == $.hi);
  } else if (hasRegressions) {
    return microBmProcedParams
      .filter((d, $) => d.analysis_pairwise_regression_indicated == $.hr);
  }
  // Nothing checked: no filtering is applied
  return microBmProcedParams;
}

// Choose the language
allLanguageValues = [null].concat(microBmProcedChanges.dedupe('language').array('language'))

viewof languageSelected = Inputs.select(allLanguageValues, {
    label: md`**Language**`,
    value: allLanguageValues[0], // single-choice select takes a scalar initial value
    width: boxWidth
})

languages = {
  return (languageSelected === null)
  ? microBmProcedChanges // If no language is selected, no filtering is applied
  // Exact match: a substring test would make "Java" also match "JavaScript"
  : microBmProcedChanges.filter(aq.escape(d => d.language === languageSelected));
}


allSuiteValues = [null].concat(languages.dedupe('suite').array('suite'))

// Choose the suite
viewof suiteSelected = Inputs.select(allSuiteValues, {
    label: md`**Suite**`,
    value: allSuiteValues[0], // single-choice select takes a scalar initial value
    width: boxWidth
})

suites = {
  return (suiteSelected === null)
  ? languages // If no suite is selected, no filtering is applied
  : languages.filter(aq.escape(d => d.suite === suiteSelected)); // exact match on suite
}

allNameValues = [null].concat(suites.dedupe('name').array('name'))

// Choose the benchmark
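viewof nameSelected = Inputs.select(allNameValues, {
    label: md`**Benchmark Name**`,
    value: allNameValues[0], // single-choice select takes a scalar initial value
    width: boxWidth
})

microBmProcedChangesFiltered = {
  return (nameSelected === null)
  ? suites // If no benchmark is selected, no filtering is applied
  // Exact match: a substring test would make "ReadUncachedFile" also match "ReadUncachedFileAsync"
  : suites.filter(aq.escape(d => d.name === nameSelected));
}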

// Plot margins as [marginRight, marginLeft], leaving room for the parameter labels
margins = {
  const hasRegressions = changes.includes("Regressions");
  const hasImprovements = changes.includes("Improvements");
  if (hasImprovements && !hasRegressions) return [0, 600];
  if (hasRegressions && !hasImprovements) return [600, 0];
  return [300, 300]; // both or neither selected
}

displayPlot = nameSelected !== null && suiteSelected !== null && languageSelected !== null

// Only display plots if a benchmark is selected
mbPlot = {
  if (displayPlot) {
    return Plot.plot({
      width: 1200,
      height: microBmProcedChangesFiltered.numRows() * 30 + 100, //adjust height of plot based on number of rows
      marginRight: margins[0],
      marginLeft: margins[1],
      label: null,
      x: {
        axis: "top",
        label: "% change",
        labelAnchor: "center",
        labelOffset: 30
      },
      style: {
        fontSize: "14px",
        fontFamily: "Roboto Mono"
      },
      color: {
        range: ojs_change_cols,
        domain: ojs_pn_lab,
        type: "categorical",
        legend: true
      },
      marks: [
        Plot.barX(microBmProcedChangesFiltered, {
          y: "params",
          x: "change",
          color: "black",
          fill: "pn_lab",
          fillOpacity: 0.75,
          sort: { y: "x" },
          channels: { difference: "difference", params: "params" },
          href: "cb_url",
          tip: true
        }),
        Plot.gridX({ stroke: "white", strokeOpacity: 0.5 }),
        Plot.ruleX([0]),
        d3
          .groups(microBmProcedChangesFiltered, (d) => d.change > 0)
          .map(([posneg, dat]) => [
            Plot.axisY({
              x: 0,
              ticks: dat.map((d) => d.params),
              tickSize: 0,
              anchor: posneg ? "left" : "right"
            })
          ])
      ]
    });
  } else {
    return md`**Language, suite and benchmark all need a selection for a plot to be displayed.**`;
  }
}