Arrow Release Benchmark Report

Benchmark Run Summary

| Run Type [1] | Commit SHA | Time of Commit | Hardware | Languages | Benchmark Type | Number of Benchmarks |
|---|---|---|---|---|---|---|
| contender | 6a2e19a | 2024-07-11 08:57:21 | Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz | Python, R | macrobenchmarks | 217 |
| baseline | 7dd1d34 | 2024-05-09 07:21:29 | Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz | Python, R | macrobenchmarks | 217 |
| baseline | 7dd1d34 | 2024-05-09 07:21:29 | AMD EPYC 7R13 Processor | C++, Java | microbenchmarks | 3549 |
| contender | 6a2e19a | 2024-07-11 08:57:21 | AMD EPYC 7R13 Processor | C++, Java | microbenchmarks | 3547 |
| contender | 6a2e19a | 2024-07-11 08:57:21 | Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz | JavaScript | microbenchmarks | 92 |
| baseline | 7dd1d34 | 2024-05-09 07:21:29 | Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz | JavaScript | microbenchmarks | 92 |

[1] When we compare benchmark results, we always have a contender (the new code that we are considering) and a baseline (the old code that we are comparing to). The historic distribution is drawn from all benchmark results on commits in the baseline commit's git ancestry, up to and including all runs on the baseline commit itself. In this context, the baseline is typically the last Arrow release and the contender is the current release candidate.
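To make the footnote more concrete, the sketch below collects a historic distribution from a simplified, linear commit history. The helper name `historic_distribution`, the two made-up ancestor SHAs, and all result values are hypothetical; only the rule itself (every result on the baseline's ancestors plus the baseline commit) comes from the footnote. Real git ancestry is a graph rather than a list, so this is an illustration, not how Conbench is implemented.

```python
from statistics import mean, stdev


def historic_distribution(ancestry, results, baseline_sha):
    """Collect all results on commits up to and including the baseline.

    ancestry: commit SHAs ordered oldest to newest (simplified, linear history).
    results:  mapping of SHA -> list of benchmark results recorded on that commit.
    """
    cutoff = ancestry.index(baseline_sha)  # raises ValueError if the SHA is unknown
    values = []
    for sha in ancestry[: cutoff + 1]:
        values.extend(results.get(sha, []))
    return values


# Hypothetical data: two made-up ancestors, then the baseline and contender SHAs.
ancestry = ["aaaaaaa", "bbbbbbb", "7dd1d34", "6a2e19a"]
results = {
    "aaaaaaa": [100.0, 102.0],
    "bbbbbbb": [98.0],
    "7dd1d34": [101.0, 99.0],
    "6a2e19a": [120.0],  # contender results are not part of the distribution
}

dist = historic_distribution(ancestry, results, "7dd1d34")
print(len(dist), mean(dist), round(stdev(dist), 3))
```

The contender's own results are left out, so a new result can be judged against what was typical before the change.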

Macrobenchmarks

Live Conbench UI views for the macrobenchmarks are available at this url. Conbench offers another way to explore the benchmark results, particularly if you want to see more of the history or more metadata.

Benchmark Percent Changes

  • Benchmarks are plotted using the percent change from baseline to contender.
  • Additional information on each benchmark is available by hovering over the relevant bar.
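As a rough illustration of the first bullet, the percent change can be sketched as follows. The function name and the `less_is_better` flag are assumptions made for this example, not Conbench's API. Flipping the sign for time-based units is inferred from the microbenchmark tables later in this report, where a benchmark measured in ns whose time increased is shown with a negative percent change.

```python
def percent_change(baseline: float, contender: float, less_is_better: bool = False) -> float:
    """Percent change from baseline to contender; negative means the contender is worse."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    change = (contender - baseline) / baseline * 100.0
    # Assumption: for units where lower is better (e.g. ns), flip the sign so
    # that a slowdown is always reported as a negative change.
    return -change if less_is_better else change


# Values taken from the microbenchmark tables in this report.
print(round(percent_change(141_700, 24_000), 2))                        # throughput in MB/s
print(round(percent_change(444_700, 846_400, less_is_better=True), 2))  # time in ns
```

With these inputs the sketch reproduces the reported −83.06% and −90.33%. Other rows may differ slightly in the last digits, since the report shows rounded baseline and contender values.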

Python

[Interactive plots not reproduced. Benchmarks: dataframe-to-table, dataset-filter, dataset-read, dataset-select, dataset-selectivity, file-read, file-write, recursive-get-file-info, wide-dataframe]

R

[Interactive plots not reproduced. Benchmarks: dataframe-to-table, file-read, file-write, partitioned-dataset-filter, tpch]

Microbenchmarks

There are currently 3641 microbenchmarks in the Arrow benchmarks. The following comparisons can also be viewed in the Conbench UI.

Number of microbenchmarks by language:

| Language | Stable | Improvements | Regressions | No comparison | Total |
|---|---|---|---|---|---|
| C++ | 2373 | 552 | 619 | 2 | 3544 |
| Java | NA | 3 | NA | NA | 3 |
| JavaScript | 53 | 22 | 6 | 11 | 81 |
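A quick arithmetic check on the table above suggests that the Total column counts only benchmarks that have a comparison (stable, improved, or regressed) and leaves out the "No comparison" column. The report does not state this, so treat it as an observation from the numbers. The dictionary layout and the helper name `compared_total` are choices made for this sketch.

```python
# Rows copied from the table above; None stands for "NA".
rows = {
    "C++": {"stable": 2373, "improvements": 552, "regressions": 619, "no_comparison": 2, "total": 3544},
    "Java": {"stable": None, "improvements": 3, "regressions": None, "no_comparison": None, "total": 3},
    "JavaScript": {"stable": 53, "improvements": 22, "regressions": 6, "no_comparison": 11, "total": 81},
}


def compared_total(row):
    """Sum the stable, improved and regressed counts, treating NA as zero."""
    return sum(row[key] or 0 for key in ("stable", "improvements", "regressions"))


for language, row in rows.items():
    print(language, compared_total(row), compared_total(row) == row["total"])
```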

Because of the large number of benchmarks, only the 20 results that deviate most from the baseline in each direction (positive and negative) are presented below. All microbenchmark results for this comparison can be explored interactively in the microbenchmark explorer.
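The selection described above can be sketched as picking the most negative and the most positive percent changes. The function name `top_deviations` and the record layout are assumptions for this example. The sample rows reuse a few percent changes from this report with shortened benchmark names.

```python
import heapq


def top_deviations(results, n=20):
    """Return the n largest regressions and the n largest improvements by percent change."""
    regressions = heapq.nsmallest(n, (r for r in results if r["pct"] < 0), key=lambda r: r["pct"])
    improvements = heapq.nlargest(n, (r for r in results if r["pct"] > 0), key=lambda r: r["pct"])
    return regressions, improvements


# A handful of rows from this report (percent change only), used as sample input.
sample = [
    {"name": "ReadUncachedFileAsync num_cols:64/is_partial:1", "pct": -94.87},
    {"name": "BM_PlainEncodingByteArray max-string-length:8/batch-size:2048", "pct": -54.82},
    {"name": "ReadUncachedFile num_cols:1/is_partial:0", "pct": 5763.00},
    {"name": "FilterFSLInt64FilterNoNulls 524288/0", "pct": 1909.00},
    {"name": "RoundDerivativesArrayBenchmark <Trunc, FloatType>", "pct": 96.43},
]

worst, best = top_deviations(sample, n=2)
print([r["pct"] for r in worst])
print([r["pct"] for r in best])
```

Each list comes back ordered from the most extreme change inward, which matches how a "largest deviations" table is usually read.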

Top 20 regressions (negative percent change):

| Suite | Language | Benchmark | Params | Percent Change | Baseline result | Contender result | Unit |
|---|---|---|---|---|---|---|---|
| arrow-acero-expression-benchmark | C++ | ExecuteScalarExpressionOverhead | complex_expression/rows_per_batch:1000/real_time/threads:16 | −90.33% | 444,700 | 846,400 | ns |
| arrow-acero-expression-benchmark | C++ | ExecuteScalarExpressionOverhead | complex_integer_expression/rows_per_batch:1000/real_time/threads:16 | −91.07% | 442,700 | 845,800 | ns |
| arrow-ipc-read-write-benchmark | C++ | ReadMmapUncachedFileAsync | num_cols:64/is_partial:0/real_time | −83.06% | 141,700 | 24,000 | MB/s |
| arrow-ipc-read-write-benchmark | C++ | ReadMmapUncachedFileAsync | num_cols:64/is_partial:1/real_time | −66.32% | 15,680 | 5,281 | MB/s |
| arrow-ipc-read-write-benchmark | C++ | ReadMmapUncachedFile | num_cols:64/is_partial:0/real_time | −68.02% | 134,800 | 43,130 | MB/s |
| arrow-ipc-read-write-benchmark | C++ | ReadMmapUncachedFile | num_cols:64/is_partial:1/real_time | −84.55% | 2,445 | 378 | MB/s |
| arrow-ipc-read-write-benchmark | C++ | ReadMmapUncachedFile | num_cols:8/is_partial:1/real_time | −48.04% | 943 | 490 | MB/s |
| arrow-ipc-read-write-benchmark | C++ | ReadUncachedFileAsync | num_cols:64/is_partial:0/real_time | −83.54% | 2,879 | 474 | MB/s |
| arrow-ipc-read-write-benchmark | C++ | ReadUncachedFileAsync | num_cols:64/is_partial:1/real_time | −94.87% | 8,076 | 414 | MB/s |
| arrow-ipc-read-write-benchmark | C++ | ReadUncachedFile | num_cols:64/is_partial:0/real_time | −83.51% | 2,863 | 472 | MB/s |
| arrow-ipc-read-write-benchmark | C++ | ReadUncachedFile | num_cols:64/is_partial:1/real_time | −84.74% | 2,549 | 389 | MB/s |
| parquet-encoding-benchmark | C++ | BM_DeltaDecodingByteArray | max-string-length:1024/batch-size:2048/prefixed-percent:90 | −47.37% | 22,560 | 11,870 | MB/s |
| parquet-encoding-benchmark | C++ | BM_DeltaDecodingByteArray | max-string-length:1024/batch-size:2048/prefixed-percent:99 | −49.02% | 21,870 | 11,150 | MB/s |
| parquet-encoding-benchmark | C++ | BM_DeltaDecodingByteArray | max-string-length:64/batch-size:2048/prefixed-percent:90 | −47.95% | 3,324 | 1,730 | MB/s |
| parquet-encoding-benchmark | C++ | BM_DeltaDecodingByteArray | max-string-length:64/batch-size:2048/prefixed-percent:99 | −47.85% | 3,045 | 1,588 | MB/s |
| parquet-encoding-benchmark | C++ | BM_DictDecodingByteArray | max-string-length:64/batch-size:2048 | −47.86% | 5,656 | 2,949 | MB/s |
| parquet-encoding-benchmark | C++ | BM_PlainEncodingByteArray | max-string-length:64/batch-size:2048 | −49.03% | 6,332 | 3,227 | MB/s |
| parquet-encoding-benchmark | C++ | BM_PlainEncodingByteArray | max-string-length:8/batch-size:2048 | −54.82% | 2,080 | 940 | MB/s |
| parquet-encoding-benchmark | C++ | BM_PlainEncodingByteArray | max-string-length:8/batch-size:512 | −48.93% | 1,913 | 977 | MB/s |
| parquet-encoding-benchmark | C++ | BM_PlainEncodingSpacedFloat | 32768/1000 | −48.62% | 5,589 | 2,872 | MB/s |

Units: MB/s = megabytes per second; ns = nanoseconds; i/s = iterations per second

Top 20 improvements (positive percent change):

| Suite | Language | Benchmark | Params | Percent Change | Baseline result | Contender result | Unit |
|---|---|---|---|---|---|---|---|
| arrow-compute-scalar-round-benchmark | C++ | RoundDerivativesArrayBenchmark | <Trunc, FloatType>/size:524288/inverse_null_proportion:0 | 96.43% | 6,818 | 13,390 | MB/s |
| arrow-compute-vector-selection-benchmark | C++ | FilterFSLInt64FilterNoNulls | 524288/0 | 1,909.00% | 1,232 | 24,740 | MB/s |
| arrow-compute-vector-selection-benchmark | C++ | FilterFSLInt64FilterNoNulls | 524288/1 | 153.80% | 1,074 | 2,725 | MB/s |
| arrow-compute-vector-selection-benchmark | C++ | FilterFSLInt64FilterNoNulls | 524288/2 | 487.20% | 12,620 | 74,130 | MB/s |
| arrow-compute-vector-selection-benchmark | C++ | FilterFSLInt64FilterWithNulls | 524288/0 | 220.40% | 963 | 3,087 | MB/s |
| arrow-compute-vector-selection-benchmark | C++ | TakeChunkedChunkedFSBRandomIndicesWithNulls | 524288/1/9 | 320.70% | 280,900,000 | 1,182,000,000 | i/s |
| arrow-compute-vector-selection-benchmark | C++ | TakeFSLInt64MonotonicIndices | 524288/0 | 248.20% | 292,700,000 | 1,019,000,000 | i/s |
| arrow-compute-vector-selection-benchmark | C++ | TakeFSLInt64RandomIndicesNoNulls | 524288/0 | 142.90% | 256,100,000 | 621,900,000 | i/s |
| arrow-compute-vector-selection-benchmark | C++ | TakeFSLInt64RandomIndicesWithNulls | 524288/0 | 140.70% | 257,700,000 | 620,300,000 | i/s |
| arrow-compute-vector-selection-benchmark | C++ | TakeFixedSizeBinaryRandomIndicesWithNulls | 524288/1/9 | 1,446.00% | 357,000,000 | 5,519,000,000 | i/s |
| arrow-compute-vector-selection-benchmark | C++ | TakeStringMonotonicIndices | 524288/0 | 247.60% | 292,500,000 | 1,017,000,000 | i/s |
| arrow-ipc-read-write-benchmark | C++ | ReadMmapUncachedFile | num_cols:1/is_partial:0/real_time | 3,756.00% | 3,657 | 141,000 | MB/s |
| arrow-ipc-read-write-benchmark | C++ | ReadUncachedFile | num_cols:1/is_partial:0/real_time | 5,763.00% | 230 | 13,500 | MB/s |
| arrow-ipc-read-write-benchmark | C++ | ReadUncachedFile | num_cols:8/is_partial:0/real_time | 962.00% | 250 | 2,659 | MB/s |
| arrow-tensor-benchmark | C++ | BatchToTensorSimple | <Int32Type>/size:32768/num_columns:3 | 142.10% | 3,685 | 8,919 | MB/s |
| arrow-tensor-benchmark | C++ | BatchToTensorSimple | <Int32Type>/size:524288/num_columns:3 | 229.50% | 4,025 | 13,260 | MB/s |
| arrow-thread-pool-benchmark | C++ | ThreadedTaskGroup | threads:4/task_cost:1000/real_time | 226.40% | 248,200 | 810,400 | i/s |
| parquet-column-reader-benchmark | C++ | RecordReaderReadAndSkipRecords | Repetition:2/BatchSize:10000/LevelsPerPage:1000000 | 161.60% | 231 | 604 | MB/s |
| parquet-column-reader-benchmark | C++ | RecordReaderReadRecords | Repetition:2/BatchSize:1000/ReadDense:1 | 334.20% | 294 | 1,276 | MB/s |
| parquet-column-reader-benchmark | C++ | RecordReaderSkipRecords | Repetition:2/BatchSize:1000 | 347.50% | 291 | 1,302 | MB/s |

Units: MB/s = megabytes per second; ns = nanoseconds; i/s = iterations per second

Microbenchmark explorer

The microbenchmark explorer lets you filter the microbenchmark results by language, suite, and benchmark name, and toggle regressions and improvements based on the percent change between the baseline and contender. A language, suite, and benchmark name must all be selected to show a benchmark plot. Additional benchmark parameters are displayed on the vertical axis, so each bar represents one case permutation. If a benchmark has no additional parameters, the full case permutation string is displayed. Clicking a bar opens the Conbench UI page for that benchmark, which provides additional history and metadata for that case permutation.
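The filtering behaviour described above can be approximated offline with a small sketch. The `Result` record, the `select` function, and the `threshold` parameter are all hypothetical names for this example; the explorer's actual implementation is not shown in this report. The sample rows reuse values from the tables above.

```python
from dataclasses import dataclass


@dataclass
class Result:
    language: str
    suite: str
    name: str
    params: str
    pct_change: float


def select(results, language, suite, name, threshold=0.0,
           show_regressions=True, show_improvements=True):
    """Keep one benchmark's case permutations whose change passes the toggles."""
    chosen = []
    for r in results:
        if (r.language, r.suite, r.name) != (language, suite, name):
            continue
        is_regression = r.pct_change <= -threshold
        is_improvement = r.pct_change >= threshold
        if (show_regressions and is_regression) or (show_improvements and is_improvement):
            chosen.append(r)
    return chosen


suite = "arrow-ipc-read-write-benchmark"
results = [
    Result("C++", suite, "ReadMmapUncachedFile", "num_cols:64/is_partial:0/real_time", -68.02),
    Result("C++", suite, "ReadMmapUncachedFile", "num_cols:64/is_partial:1/real_time", -84.55),
    Result("C++", suite, "ReadMmapUncachedFile", "num_cols:8/is_partial:1/real_time", -48.04),
    Result("C++", suite, "ReadMmapUncachedFile", "num_cols:1/is_partial:0/real_time", 3756.00),
    Result("C++", suite, "ReadUncachedFile", "num_cols:8/is_partial:0/real_time", 962.00),
]

# Show only regressions of at least 50% for one benchmark.
bars = select(results, "C++", suite, "ReadMmapUncachedFile",
              threshold=50.0, show_improvements=False)
print([b.params for b in bars])
```

Each returned record corresponds to one bar in the plot, labelled by its parameters.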