Biodiversity studies are sensitive to well-recognized temporal and spatial scale dependencies. Cross-study syntheses may inflate these influences by collating studies that vary widely in the numbers and sizes of sampling plots. Here we evaluate sources of inaccuracy and imprecision in study-level and cross-study estimates of biodiversity differences, caused by within-study grain and sample sizes, biodiversity measure, and choice of effect-size metric. Samples from simulated communities of old-growth and secondary forests demonstrated the influences of all these parameters on the accuracy and precision of cross-study effect sizes. In cross-study synthesis by formal meta-analysis, the metric of log response ratio applied to measures of species richness yielded better accuracy than the commonly used Hedges’ g metric on species density, which dangerously combined higher precision with persistent bias. Full-data analyses of the raw plot-scale data using multilevel models were also susceptible to scale-dependent bias. We demonstrate the challenge of detecting scale dependence in cross-study synthesis, due to ubiquitous covariation between replication, variance, and plot size. We propose solutions for diagnosing and minimizing bias. We urge that empirical studies publish raw data to allow evaluation of covariation in cross-study syntheses, and we recommend against using Hedges’ g in biodiversity meta-analyses.
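For readers unfamiliar with the two effect-size metrics compared here, the following minimal sketch computes each from plot-level values using the standard textbook formulas. It is an illustration only, not the analysis code of this study: the function names and the example richness values are hypothetical, and the variance expressions are the usual large-sample approximations.

```python
import math
from statistics import mean, stdev


def log_response_ratio(treatment, control):
    """Log response ratio ln(mean_T / mean_C) and its approximate sampling variance.

    Assumes all values are positive (e.g. species richness per plot).
    """
    m_t, m_c = mean(treatment), mean(control)
    n_t, n_c = len(treatment), len(control)
    lnrr = math.log(m_t / m_c)
    var = (stdev(treatment) ** 2 / (n_t * m_t ** 2)
           + stdev(control) ** 2 / (n_c * m_c ** 2))
    return lnrr, var


def hedges_g(treatment, control):
    """Hedges' g (bias-corrected standardized mean difference) and its approximate variance."""
    m_t, m_c = mean(treatment), mean(control)
    n_t, n_c = len(treatment), len(control)
    df = n_t + n_c - 2
    # Pooled standard deviation across the two groups.
    s_pooled = math.sqrt(((n_t - 1) * stdev(treatment) ** 2
                          + (n_c - 1) * stdev(control) ** 2) / df)
    # Small-sample correction factor J.
    j = 1 - 3 / (4 * df - 1)
    g = j * (m_t - m_c) / s_pooled
    var = (n_t + n_c) / (n_t * n_c) + g ** 2 / (2 * (n_t + n_c))
    return g, var


# Hypothetical richness counts per plot (illustrative values, not study data).
secondary = [12, 15, 11, 14]
old_growth = [20, 22, 18, 20]

lnrr, lnrr_var = log_response_ratio(secondary, old_growth)
g, g_var = hedges_g(secondary, old_growth)
print(f"lnRR = {lnrr:.3f} (variance {lnrr_var:.4f})")
print(f"Hedges' g = {g:.3f} (variance {g_var:.4f})")
```

Note that Hedges' g divides the mean difference by the pooled within-study standard deviation, so any dependence of that variance on plot size or replication carries through to the effect size, whereas the log response ratio depends only on the ratio of means.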