John E. Edlund, PhD
Rochester Institute of Technology (NY)
When I teach research methods, one of the things I always explain is that there are multiple paths to good research ideas. Naturally, I tell my students that excellent studies can come from new theories, from applying an existing theory to a novel domain, or even from something as simple as looking at a published study’s limitations and future directions. I also tell my students that some of the most influential studies have come from personal observations that make you go “hmm.” Indeed, Isaac Asimov expressed a similar sentiment: “The most exciting phrase to hear in science, the one that heralds new discoveries, is not ‘Eureka!’ (I found it!) but ‘That’s funny . . .’”
My interest in responsible science came from two sets of “Hmm” moments. The first occurred while I was working on my master’s thesis. In that research, a confederate discovered evidence of crosstalk (where past participants reveal nondisclosed study information to future participants) among participants. That led me to two related questions: (a) How commonly does crosstalk occur? and (b) What can we do to cut down on its prevalence? These questions led to my first publication, in which I explored crosstalk (Edlund, Sagarin, Skowronski, Johnson, & Kutter, 2009).
The second “Hmm” moment occurred during the first semester that I was employed in my current position. I took my existing studies and submitted them for IRB approval at my current institution (simply changing the IRB contact information and my contact information), and I promptly had every study bounced from the IRB due to the construction of the consent form itself. The IRB chair kindly pointed me to federal guidance that suggested that first-person prose should be avoided as first-person prose is commonly associated with legal documents (and hence, potentially seen by participants as more coercive). This surprising rejection led me to directly explore issues related to consent in psychology (Edlund, Hartnett, Heider, Perez, & Lusk, 2014) and in medicine (Edlund, Edlund, & Carey, 2015).
Of course, nothing in psychology happens in a vacuum. As I was discovering my interests in researching how we do research, the field of psychology was forced into a very careful look at its own research. First, multiple instances of researchers faking their data (e.g., Tilburg University, 2011)—the cardinal sin of science—were uncovered. Then, it was discovered that researchers were engaging in questionable research practices (like p-hacking and selectively reporting results; John, Loewenstein, & Prelec, 2012), and that the replication rate of studies published in major journals was surprisingly low (Open Science Collaboration, 2015).
Questions about the integrity and reproducibility of psychological science have led me, as well as others, to explore and document the best ways to engage in responsible science. In my eyes, there are three major areas a responsible psychological scientist needs to attend to. Importantly, these are meant to cut across all of psychology and all of the methods employed across the various subdisciplines (indeed, they cut across all of the behavioral and social sciences; Edlund & Nichols, 2019). As such, these ideas and suggestions aren’t likely to help you design your study itself, but they will help you design a responsible study. The three major areas are treating participants well, engaging in responsible statistics and communication, and embracing solutions to the field’s existing problems.
Treating Participants Well
It is no coincidence that treating participants well is the first section that I cover. More so than anything else we do as scientists, we need to think about our participants first. From a practical and ethical standpoint, we need to think about everything that will impact our participants from the moment they are exposed to our study (as even the advertising of the study can impact who signs up for the research: Jackson, Procidano, & Cohen, 1989; Sutton & Edlund, 2019). Research has shown that many participants will show up looking to potentially confirm the researcher’s hypotheses if they are able to discern them (Nichols & Maner, 2008).
Typically, the first formal interaction we have with our participants is the consent process. Many people think of consent as simply handing a form to a participant to sign (or an electronic acknowledgment of having read the consent form). Numerous sources discuss the minutiae of what should be in a consent form, but it ultimately boils down to a simple fact: you need to give participants enough information to decide for themselves whether they want to participate. Sometimes you won’t be able to tell participants everything (for instance, if telling participants the goals of the study would change their behavior; Milgram, 1963). Certainly, I personally believe that deception can be warranted in some cases. In these cases, your IRB will have a protocol for modifying aspects of the consent process to protect the participants. There are also other cases where the consent process will be modified to protect your participants (for instance, if your participants are under 18, you will need both parental consent and an assent process).
After your participants are done with the study, you will need to ascertain how the study went. This can be as simple as a single text-based question in an online study (asking if they had any further questions or want more information about the study). This can also be a more involved suspicion probe and debriefing process (which is especially important if the study involved deception or was potentially aversive to participants). My recommendation is that every study have a brief suspicion probe (see Blackhart, Brown, Clark, Pierce, & Shell, 2012, for options on suspicion probes) and an educational debriefing (at a minimum telling participants why they went through the research). This debriefing should also include a brief request that participants not discuss the study with other potential participants (crosstalk: Edlund et al., 2009).
Another aspect of treating participants well is ensuring that you do not waste participants’ time. You should calibrate your study to address your scientific question in the most reasonable manner, and you shouldn’t ask a battery of questions just to see what happens (indeed, doing so opens issues with responsible statistics). This does not mean I am not a proponent of exploratory research—in fact, some of the studies I am most proud of started as purely exploratory research. However, you need to ensure that a relevant scientific question is being posed and that you don’t waste your participants’ time (indeed, this is another reason to be concerned about crosstalk and suspicion, as both lead to wasted time). One simple solution is to pilot test your study.
Responsible Statistics and Communication
Another aspect of being a responsible scientist is being responsible in your use of statistics and in how you communicate your work to the world. To me, one of the scariest things we have realized as a field is the incredibly dangerous power of p-hacking (Simonsohn, Nelson, & Simmons, 2014). P-hacking encompasses the many things a researcher can do to achieve a desired p-value (often p < .05), including selective decisions about deleting participant cases, collecting extra data, and stopping data collection. These behaviors greatly increase the false positive rate of research studies, reducing the extent to which we can trust the findings. What makes p-hacking particularly dangerous is that a well-intentioned scientist can engage in it without any nefarious intent, driven in no small part by our normal human cognitive processes.
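The danger of one common form of p-hacking, optional stopping, can be demonstrated with a short simulation. The sketch below is illustrative only (the five “looks” at the data and the normal-approximation test are my assumptions, not a specific published procedure): both groups are drawn from the same population, so every “significant” result is a false positive, yet checking the p-value after each batch and stopping at p < .05 pushes the false positive rate well above the nominal 5%.

```python
import math
import random

def z_p_value(a, b):
    # Two-sample z test (a normal approximation to the t test;
    # precise enough to illustrate the inflation).
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    z = (ma - mb) / math.sqrt(va / na + vb / nb)
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def optional_stopping(rng, looks=(10, 15, 20, 25, 30)):
    # Both groups come from the same population: the null is true.
    a, b = [], []
    for n in looks:
        while len(a) < n:
            a.append(rng.gauss(0, 1))
            b.append(rng.gauss(0, 1))
        if z_p_value(a, b) < .05:
            return True  # stop and declare "significance" -- a false positive
    return False  # honest outcome: no effect found

rng = random.Random(1)
sims = 2000
hits = sum(optional_stopping(rng) for _ in range(sims))
print(f"False positive rate with optional stopping: {hits / sims:.3f}")
```

Running the same simulation with a single, fixed sample size would hold the rate near .05; peeking repeatedly is what inflates it.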
Another common issue encountered in the field is the use of multiple analyses without accounting for Type I error inflation. When I discuss this issue, I ask my students whether they would believe a single t test that found a difference between two groups at p < .05 (and this was the only test done). Without fail, the students are willing to believe it. I next ask whether they would believe five reported t tests, each at p < .05. If anything, they are even more confident in saying yes. Finally, I ask whether they would still be confident in the results if they learned that 100 tests were actually run and only five were significant (but the researchers didn’t disclose the other tests). Many students become very uneasy as they implicitly recognize the problem of error inflation.
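The classroom example above is just arithmetic: when 100 independent tests of true null hypotheses are run at α = .05, about five “significant” results are expected by chance alone, and at least one false positive is nearly guaranteed. A quick sketch:

```python
# Expected false positives and familywise error rate under the null,
# for k independent tests, each at alpha = .05.
alpha = 0.05
for k in (1, 5, 20, 100):
    expected = k * alpha                  # expected count of p < .05 by chance
    familywise = 1 - (1 - alpha) ** k     # P(at least one false positive)
    print(f"{k:3d} tests: expect {expected:4.1f} false positives; "
          f"P(at least one) = {familywise:.3f}")
```

With 100 tests, five spurious p < .05 results are exactly what chance predicts, which is why the students’ unease is well founded.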
This leads to being responsible in communication. I have always been in favor of providing as many details as possible about a study. As a scientist, I have always included details on everything I did: the effects that were significant, nonsignificant, and everything in between. More than once an editor or reviewers have asked me to cut some of those details; however, on my first pass, I always aim for full transparency. Certainly, there are reasons to drop participants from analyses that no one will question. Generally speaking, you are safe in any decisions you make before you start analyzing your data; dropping participants after you start data analysis is much more problematic. I have dropped participants who came to the lab clearly in an altered state of consciousness, participants who answered “7” on every item, and participants who skipped the vast majority of items. Importantly, I make these decisions before I start any analyses. Regardless of how you approach this, you should provide details on everything you did (and why!).
So what do we do about these problems? Some of the solutions I have already detailed. However, there are more solutions that will lead to better and more responsible science. One is to move away from an exclusive focus on p < .05. The p < .05 heuristic is still useful, but there are better options. One option is to focus more on effect sizes. Effect sizes tell you how big the difference between the groups is; in many ways, that is the most important question anyway.
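One common effect size for a two-group comparison is Cohen’s d: the difference between the group means expressed in pooled standard-deviation units. A minimal sketch (the scores below are made up purely for illustration):

```python
import math

def cohens_d(a, b):
    # Standardized mean difference using the pooled standard deviation.
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    pooled_sd = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (ma - mb) / pooled_sd

# Hypothetical scores from two conditions
treatment = [5, 6, 7, 8, 9]
control = [3, 4, 5, 6, 7]
print(f"Cohen's d = {cohens_d(treatment, control):.2f}")
```

By Cohen’s conventional benchmarks (roughly 0.2 small, 0.5 medium, 0.8 large), the difference above is large; by contrast, a statistically significant result with a tiny d can be practically meaningless, which is the whole point of reporting effect sizes alongside p-values.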
Statistics teachers have long noted that getting people to understand effect sizes is challenging. In my experience, the best way to get students to think about effect sizes is to get them to think about bacon and cancer. Not long ago, a study made international news by linking the consumption of red and processed meats to colorectal cancer (Bouvard et al., 2015). Given the average student’s love of bacon, this study presents a major challenge to their worldview. They quickly realize, however, that although the relationship is certainly statistically significant, the amount of bacon one would have to eat to meaningfully raise one’s cancer risk is far more than the typical person consumes.
Another solution is embracing preregistration. All PhD- and many MS-trained researchers have already done this at least once (as everyone “preregisters” their thesis or dissertation through the proposal process). The information needed to preregister a study is relatively minimal, and the effort is largely not wasted (most of the work in preregistering can be translated into an eventual publication). Ultimately, preregistering your study reduces the subjective analytic decisions that could inflate the false positive rate. The OSF (osf.io) is one resource available to facilitate preregistrations.
Perhaps the most important change happening in the field is the adoption of open science. Open science, generally speaking, means increasing transparency in communication about all steps of the research process. Currently, many journals award badges for adopting open science practices: preregistration (yet another benefit of preregistering), open materials (posting your materials in a publicly accessible repository), and open data (posting your anonymized data in a publicly accessible repository). Early research (Kidwell et al., 2016) suggests that awarding badges is actually leading to more open science. Ultimately, sharing makes for better science because strengths and weaknesses can be more easily discovered and findings more easily replicated.
Finally, it is exciting to see the field become more open to replications. Even though the social and behavioral sciences have long known about the file-drawer problem, little appetite existed for facilitating replications, even though replication is a cornerstone of science. Recently, however, many journals (such as the Psi Chi Journal) have explicitly called for high-quality replications, and many more are open to them. Going even further, some journals have adopted registered reports, in which you submit an introduction and proposed method section to the journal for peer review before any data are collected.
Certainly, when I started graduate school, I would never have imagined the directions my research would take, nor the challenges the field of psychology would soon encounter. Some in the media have described psychology (and science more generally) as broken, suggesting that the crisis of confidence means we shouldn’t believe anything psychology has to say. I don’t think that could be further from the truth! Certainly, some of our methods have been irresponsible and have led to incorrect conclusions. However, as a field, we are looking at ourselves and becoming better. To me, this means we are living in an exciting time in which we can truly make ourselves and the field better.
Blackhart, G. C., Brown, K. E., Clark, T., Pierce, D. L., & Shell, K. (2012). Assessing the adequacy of postexperimental inquiries in deception research and the factors that promote participant honesty. Behavior Research Methods, 44, 24–40. https://doi.org/10.3758/s13428-011-0132-6
Bouvard, V., Loomis, D., Guyton, K. Z., Grosse, Y., Ghissassi, F. E., Benbrahim-Tallaa, L., . . . Straif, K. (2015). Carcinogenicity of consumption of red and processed meat. The Lancet Oncology, 16, 1599–1600. https://doi.org/10.1016/S1470-2045(15)00444-1
Edlund, J. E., Edlund, A. E., & Carey, M. G. (2015). Patient understanding of risk and benefit with informed consent in a left ventricular assist device population: A pilot study. Journal of Cardiovascular Nursing, 30, 435–439. https://doi.org/10.1097/JCN.0000000000000188
Edlund, J. E., Hartnett, J. H., Heider, J. H., Perez, E. J. G., & Lusk, J. G. (2014). Experimenter characteristics and word choice: Best practices when administering an informed consent. Ethics & Behavior, 24, 397–407. https://doi.org/10.1080/10508422.2013.854171
Edlund, J. E., & Nichols, A. L. (Eds.) (2019). Advanced research methods for the social and behavioral sciences. Cambridge, UK: Cambridge University Press.
Edlund, J. E., Sagarin, B. J., Skowronski, J. J., Johnson, S. G., & Kutter, J. G. (2009). Whatever happens in the laboratory stays in the laboratory: The prevalence and prevention of participant crosstalk. Personality and Social Psychology Bulletin, 35, 635–642. https://doi.org/10.1177/0146167208331255
Jackson, J. M., Procidano, M. E., & Cohen, C. J. (1989). Subject pool sign-up procedures: A threat to external validity. Social Behavior and Personality, 17, 29–43.
John, L. K., Loewenstein, G., & Prelec, D. (2012). Measuring the prevalence of questionable research practices with incentives for truth telling. Psychological Science, 23, 524–532. https://doi.org/10.1177/0956797611430953
Kidwell, M. C., Lazarević, L. B., Baranski, E., Hardwicke, T. E., Piecowski, S., Falkenberg, L. S., . . . Nosek, B. A. (2016). Badges to acknowledge open practices: A simple, low-cost, effective method for increasing transparency. PLoS Biology, 14, e1002456. https://doi.org/10.1371/journal.pbio.1002456
Milgram, S. (1963). Behavioral study of obedience. Journal of Abnormal and Social Psychology, 67, 371–378. https://doi.org/10.1037/h0040525
Nichols, A. L., & Maner, J. K. (2008). The good-subject effect: Investigating participant demand characteristics. The Journal of General Psychology, 135, 151–165. https://doi.org/10.3200/GENP.135.2.151-166
Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349, aac4716. https://doi.org/10.1126/science.aac4716
Simonsohn, U., Nelson, L. D., & Simmons, J. P. (2014). P-curve and effect size: Correcting for publication bias using only significant results. Perspectives on Psychological Science, 9, 666–681. https://doi.org/10.1177/1745691614553988
Sutton, T., & Edlund, J. E. (2019). Assessing self-selection bias as a function of experiment title and description: The effect of emotion and personality. North American Journal of Psychology, 21, 407–422.
Tilburg University. (2011). Interim report regarding the breach of scientific integrity committed by prof. D.A. Stapel. Retrieved from https://studylib.net/doc/10483458/interim-report-regarding-the-breach-of-scientific--tilbur