Advice to Data Analysts Preparing Analysis Reports in a Multidisciplinary Team Setting
This document outlines guidelines for preparing reports that summarize statistical and data analyses. The recommendations should serve any data analyst working with an interdisciplinary team.
In a multidisciplinary research environment, statisticians and data analysts play a key role in organizing and documenting the research process. The report must provide enough detail for the lead investigator to retrace the presentation of the question, the analysis approach, modifications made, and the final results. The report of statistical analysis must go beyond print-outs of statistical logs. It should document the context of the analysis, describe the question, outline the analysis approach, and include suggestions for inferences and conclusions. Tables and figures should be human-readable and near-publication quality.
These guidelines are intended to complement good workflow practices for data analysis and align with principles of reproducible research.
Statistical analysis reports are evolving documents. Regular reviews with clinical investigators and collaborators help refine the analysis and reporting until lead authors are ready to generate a scientific manuscript.
Components of a Statistical Report
The following elements should usually be included, and they are listed in a suggested order. Most of this suggested report format is organizational and text-based. Only a minority of the report summarizes data analysis details (the results of a data analysis), and technical output from statistical software might not make it into the report at all.
- Organizing information
- Author: Enter your name and/or the names of your co-analysts.
- Title: Include a short, pithy, and descriptive title for the project or analysis.
- Date: All reports must be dated.
- Page numbers: If the report will be printed, page numbers are required.
- Project goals: A written description of the overarching project goals. This text should summarize the long-range goals of the project (and not necessarily the immediate goals of the current analysis). Including project goals is important for non-technical audiences: it provides context and places the analysis within the broader aims of the research, making it easier for them to understand the significance of the work. Sometimes audience members (e.g., lab leaders) are involved with many different projects. They need to be reminded of a particular project’s goals to get them thinking strategically about the current analysis.
- Analysis goals: A written description of the current analysis. Reiterating analysis goals helps audiences grasp the specific focus of the current analysis, which can otherwise get lost in technical details.
- Current issue: What specific feedback is needed from the audience? For example, are there particular data quality issues that need to be resolved? Does the group need to approve the final tables and figures, or provide suggestions on improving clarity and interpretation of the results?
- Results summary: A high-level written description of the results of the current analysis. This is the place to write out inferences, questions raised by current results, or other analytic decisions that are brought into question on the basis of the current results. Highlight the results that you want your team members to focus on during the presentation of the results. Presenting results and questions before methods can help non-technical audiences quickly understand the key outcomes without getting bogged down in technical details. A bulleted list is a good strategy here.
- Results details: A detailed description of the methods, key analytic decisions, and main results of the current analysis. Include only essential details to avoid overwhelming the audience. Mention all tables and figures in this section, ensuring they are numbered and near-publication quality.
- Methods: A concise description of the analytic decisions used in the current analysis. This should be a good first draft of what could be included in a research publication.
- Appendix material
- Results appendix materials: This is optional, and only necessary if most of your audience is familiar with technical output from statistical data analysis programs and would be interested in seeing such results. Use this section of your report to include a log or listing from the statistical software package that generated the results summarized in the results summary and/or results details sections.
- Decision log: Keep a running log of important analytic decisions in this section as analyses are updated week to week (e.g., which records are to be included in the analysis, how missing data are handled). This is usually appropriate, and it is a great way to protect against repeating analysis and discussion steps.
- Posting info: This is information for your future self. Record information such as the syntax file that produced the results so that you can re-create the results in the future. It is a good idea to have a single (set of) command file(s) that generate a specific report for your audience members, and after a report is prepared and shared, never change that (set of) command file(s). If the analysis will change, copy the command file(s) to a new working folder and preserve the old copies. If appropriate, this might also be a good place to share information about data sets used or saved for the current analysis.
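The posting-info advice above (freeze the command files behind a shared report, and copy them to a new folder rather than editing them) could be sketched as follows. This is a minimal illustration in Python, not a prescribed tool. The function name `snapshot_command_files`, the `report_<date>` folder layout, and the `posting_info.txt` file name are all hypothetical choices made for the example.

```python
import hashlib
import shutil
from datetime import date
from pathlib import Path


def snapshot_command_files(command_files, archive_root, report_date=None):
    """Copy command files into a new dated folder and record posting info.

    Hypothetical helper: the names and folder layout are illustrative
    assumptions, not part of any established workflow tool.
    """
    report_date = report_date or date.today().isoformat()
    target = Path(archive_root) / f"report_{report_date}"
    # Refuse to overwrite an existing snapshot, so files behind a
    # shared report are never changed after the fact.
    target.mkdir(parents=True, exist_ok=False)

    lines = [f"Report date: {report_date}"]
    for src in map(Path, command_files):
        dest = target / src.name
        shutil.copy2(src, dest)  # copy2 also preserves timestamps
        # A checksum lets your future self confirm the file is unchanged.
        digest = hashlib.sha256(dest.read_bytes()).hexdigest()
        lines.append(f"{src.name}  sha256={digest}")

    (target / "posting_info.txt").write_text("\n".join(lines) + "\n")
    return target
```

Calling the function a second time with the same date raises `FileExistsError` instead of silently replacing the earlier snapshot, which mirrors the "never change that command file" rule.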
Length
The ideal length of a report summarizing a single analysis is 2-4 pages, excluding appendix materials. Shorter reports are preferred. Longer reports probably present too much information to get good, focused feedback from audience members. It can be difficult to summarize complex analyses in 2-4 pages. If you find yourself in this situation, consider writing a summary of the summary in 2-4 pages and treating the lengthier report as an appendix.
Literate programming
Literate programming is a general term for combining human-readable text with technical code and results. There are many different approaches and tools to facilitate literate programming. The data analytic reports described in this note are envisioned as being generated from a literate programming workflow.
The most commonly used environment for literate statistical programming is a combination of R and Markdown through RMarkdown and knitr, which the RStudio integrated development environment (IDE) has specialized in for many years. The field is transitioning to Quarto, but there are many other options. Google and/or ChatGPT may help find the right solution for your preferred programming environment.
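The core idea can be illustrated without any of those tools. The sketch below uses plain Python rather than RMarkdown or Quarto: the narrative text and the computed results come from the same script, so the numbers in the prose cannot drift from the analysis. The function name `render_report`, the headings, and the example values are hypothetical and exist only for illustration.

```python
from datetime import date
from statistics import mean, stdev


def render_report(title, author, values, report_date=None):
    """Return a Markdown report whose numbers are computed, not typed."""
    report_date = report_date or date.today().isoformat()
    n = len(values)
    avg = mean(values)
    sd = stdev(values)  # sample standard deviation; requires n >= 2
    return "\n".join([
        f"# {title}",
        f"Author: {author}",
        f"Date: {report_date}",
        "",
        "## Results summary",
        f"- {n} records were analyzed.",
        f"- The mean was {avg:.2f} (SD {sd:.2f}).",
    ])
```

For example, `render_report("Toy analysis", "A. Analyst", [2.0, 4.0, 6.0], "2024-10-23")` returns text that includes the line "- The mean was 4.00 (SD 2.00)." Dedicated tools add much more, such as figures, tables, and rendering to multiple output formats, but the principle is the same.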
Rich Jones
last update 2024-10-23