Advice to Data Analysts Preparing Analysis Reports in a Multidisciplinary Team Setting

This document outlines guidelines for preparing reports that summarize statistical and data analyses. The recommendations should serve any data analyst working with an interdisciplinary team.

In a multidisciplinary research environment, statisticians and data analysts play a key role in organizing and documenting the research process. The report must provide enough detail for the lead investigator to retrace the presentation of the question, the analysis approach, modifications made, and the final results. The report of statistical analysis must go beyond print-outs of statistical logs. It should document the context of the analysis, describe the question, outline the analysis approach, and include suggestions for inferences and conclusions. Tables and figures should be human-readable and near-publication quality.

These guidelines are intended to complement good workflow practices for data analysis and align with principles of reproducible research.

Statistical analysis reports are evolving documents. Regular reviews with clinical investigators and collaborators help refine the analysis and reporting until lead authors are ready to generate a scientific manuscript.

Components of a Statistical Report

The following elements should usually be included, and the order given here is the suggested order. Most of this suggested report format is organizational and text-based. Only a minority of the report summarizes the results of the data analysis, and technical output from statistical software might not make it into the report at all.

Length

The ideal length of a report summarizing a single analysis is 2-4 pages, excluding appendix materials. Shorter reports are preferred. Longer reports probably present too much information to get good, focused feedback from audience members. It can be difficult to summarize complex analyses in 2-4 pages. If you find yourself in this situation, consider writing a 2-4 page summary of the summary and treating the lengthier report as an appendix.

Literate programming

Literate programming is a general term for combining human-readable text with technical code and results. There are many different approaches and tools to facilitate literate programming. The data analytic reports described in this note are envisioned as being generated from a literate programming workflow.

The most commonly used environment for literate statistical programming is the combination of R and Markdown through RMarkdown and knitr, something the RStudio integrated development environment (IDE) has specialized in for many years. The field is transitioning to Quarto, but there are many other options. A Google search or ChatGPT may help you find the right solution for your preferred programming environment.
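
As a rough illustration of the idea, and not of RMarkdown, knitr, or Quarto themselves, the sketch below uses only the Python standard library to show the core principle: the numbers quoted in the narrative are computed by the same script that produces the text, so they cannot drift out of sync with the analysis. The data values, the template wording, and the function names summarize and render_report are all hypothetical and invented for this example.

```python
import statistics
from string import Template

# Hypothetical example data, invented purely for illustration.
SCORES = [24.0, 27.0, 30.0, 22.0, 27.0]

# Narrative text with placeholders for computed results.
# The section names here are assumptions, not a prescribed format.
REPORT_TEMPLATE = Template(
    "Question\n"
    "What is the typical score in the sample?\n"
    "\n"
    "Results\n"
    "The sample included $n participants. "
    "The mean score was $mean (SD $sd).\n"
)


def summarize(values):
    """Return the summary statistics quoted in the report text."""
    return {
        "n": len(values),
        "mean": f"{statistics.mean(values):.1f}",
        "sd": f"{statistics.stdev(values):.1f}",  # sample standard deviation
    }


def render_report(values):
    """Fill the narrative template with freshly computed results."""
    return REPORT_TEMPLATE.substitute(summarize(values))


if __name__ == "__main__":
    print(render_report(SCORES))
```

Rerunning the script after the data change regenerates the text with updated numbers, which is the property a full literate programming tool provides at larger scale, along with tables, figures, and formatted output.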


Rich Jones
last update 2024-10-23