Best practices
See Daisuke Mori's translation of this page into Japanese
These Best Practices come from the Textbook Configurational Comparative Methods (2008), edited by Rihoux and Ragin. Complementary information on these Best Practices can be found in the Textbook (see the page references below). If you have any suggestions or comments concerning this page, please send a message to Hisako Nomura.
- 1. Case Selection in Small- and Intermediate-N Research Designs
- 2. Condition Selection in Small- and Intermediate-N Research Designs
- 3. How to Dichotomize Conditions in a Meaningful Way
- 4. Things to Check to Assess the Quality of a Truth Table
- 5. How to Resolve Contradictory Configurations
- 6. Four Complete Minimization Procedures to Be Run and Made Explicit
- 7. Threshold-Setting With mvQCA
- 8. Specific to mvQCA
- 9. Specific to the Calibration of Fuzzy Sets
- 10. Specific to fsQCA
- 11. Technical Arbitrations and Practical Steps Throughout the QCA Procedures
- 12. Transparency
- 13. Words Matter, So Use the Correct Terminology!
1. Case Selection in Small- and Intermediate-N Research Designs (Textbook, pp.20-25)
- Make sure that all cases share enough background characteristics
- Make sure that you have a very clear definition of the outcome you are trying to “explain” across the cases
- Generally, it is best to include both cases with a “positive” outcome and cases with a “negative” outcome
- Don’t take your population (or sample) of cases as a “given”; leave open the possibility to include additional cases or to remove cases at a later stage of the research
- If you engage in a small- or intermediate-N design: when considering how many cases you can manage, ask yourself whether you can gain sufficient familiarity (empirical “intimacy”) with each case
- If you engage in a large-N design, make sure you gain sufficient familiarity with the types (kinds or categories) of cases
2. Condition Selection in Small- and Intermediate-N Research Designs (Textbook, pp.25-28)
- Do not include a condition that does not vary across the cases. In other words, “a variable must vary,” otherwise it is a constant.
- Keep the number of conditions relatively low. A large number of conditions tends to “individualize” each case, making it difficult to find any regularity or any synthetic explanation of the outcome across the cases
- Altogether, a good balance must be reached between the number of cases and the number of conditions. The ideal balance is not a purely numerical one and will most of the time be found by trial and error. A common practice in an intermediate-N analysis (say, 10 to 40 cases) is to select between 4 and 6 or 7 conditions
- For each condition, formulate a clear hypothesis regarding its connection to the outcome; if possible, formulate this hypothesis in the form of a statement about necessity and/or sufficiency
3. How to Dichotomize Conditions in a Meaningful Way (Textbook, pp.39-44)
- Always be transparent when justifying thresholds
- It is best to justify the threshold on substantive and/or theoretical grounds
- If this is not possible, use technical criteria (e.g., considering the distribution of cases along a continuum). As a last resort, some more mechanical cutoff points such as the mean or median can be used, but one should check whether this makes sense considering the distribution of the cases
- Avoid artificial cuts dividing cases with very similar values
- More elaborate technical ways can also be used, such as clustering techniques, but then you should evaluate to what extent the clusters make theoretical or empirical sense
- No matter which technique or reasoning you use to dichotomize the conditions, make sure to code the conditions in the correct “direction,” so that their presence ([1] value) is theoretically expected to be associated with a positive outcome ([1] outcome value)
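The threshold checks in this section can be illustrated with a minimal Python sketch. The raw values, the threshold of 50.0, and the margin of 2.0 are hypothetical and only serve the illustration; the helper names are not taken from any QCA software. The sketch codes each case as [1] or [0], lets the "direction" be reversed, and lists the cases that sit very close to the cut so that an artificial division between similar cases can be spotted.

```python
def dichotomize(raw, threshold, higher_is_present=True):
    """Code each case as 1 or 0 relative to a threshold.

    Set higher_is_present=False when low raw values are the ones
    theoretically expected to go with a positive outcome.
    """
    coded = {}
    for case, value in raw.items():
        above = value >= threshold
        coded[case] = int(above if higher_is_present else not above)
    return coded


def cases_near_threshold(raw, threshold, margin):
    """Return the cases whose raw value lies within `margin` of the threshold."""
    return sorted(
        case for case, value in raw.items() if abs(value - threshold) <= margin
    )


# Hypothetical raw data: one index value per case.
raw_values = {"A": 12.0, "B": 48.5, "C": 51.0, "D": 90.0, "E": 7.5}

coded = dichotomize(raw_values, threshold=50.0)
near = cases_near_threshold(raw_values, threshold=50.0, margin=2.0)
print(coded)
print(near)
```

Here cases B and C end up on opposite sides of the cut although their raw values are close, which is the kind of situation the fourth bullet above warns against; whether the threshold should move is a substantive decision, not one the code can make.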
4. Things to Check to Assess the Quality of a Truth Table (Textbook, pp.44-47)
- Check again that there is a mix of cases with a “positive” outcome and cases with a “negative” outcome
- Check that there are no counterintuitive configurations. With conditions coded in the expected direction, these would be configurations in which all [0] condition values lead to a [1] outcome, or all [1] condition values lead to a [0] outcome
- Check for cross-condition diversity; in particular, make sure that some conditions do not display exactly the same values across all cases; if they do, ask yourself whether those conditions are too “proximate” to one another (if they are, they can be merged)
- Check that there is enough variation for each condition (a general rule: at least 1/3 of each value) (see also Box 2.3, good practices for condition selection: “a variable must vary”...)
If one of these criteria is not met, reconsider your selection of cases and/or conditions or possibly the way you have defined and operationalized the outcome. It is also useful, at this stage, to check for the necessity and sufficiency of each condition with regard to the outcome.
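As an illustration, the checks above can be sketched in a few lines of Python. The six cases, the condition names A, B, C, and the outcome Y are hypothetical, and the function names are assumptions made for this sketch rather than features of TOSMANA or FSQCA. The 1/3 rule is taken from the list above and compared exactly with fractions.

```python
from collections import defaultdict
from fractions import Fraction


def build_truth_table(cases, conditions, outcome):
    """Group cases by configuration and record every outcome value observed."""
    table = defaultdict(lambda: {"cases": [], "outcomes": set()})
    for name, row in cases.items():
        config = tuple(row[c] for c in conditions)
        table[config]["cases"].append(name)
        table[config]["outcomes"].add(row[outcome])
    return dict(table)


def contradictory_configurations(table):
    """Configurations whose cases display both outcome values."""
    return sorted(cfg for cfg, info in table.items() if len(info["outcomes"]) > 1)


def counterintuitive_configurations(table):
    """All-[0] configurations with a [1] outcome, or all-[1] with a [0] outcome."""
    flagged = []
    for cfg, info in table.items():
        if not any(cfg) and 1 in info["outcomes"]:
            flagged.append(cfg)
        if all(cfg) and 0 in info["outcomes"]:
            flagged.append(cfg)
    return sorted(flagged)


def low_variation_conditions(cases, conditions, min_share=Fraction(1, 3)):
    """Conditions whose rarer value covers less than min_share of the cases."""
    n = len(cases)
    flagged = []
    for c in conditions:
        ones = sum(row[c] for row in cases.values())
        if Fraction(min(ones, n - ones), n) < min_share:
            flagged.append(c)
    return flagged


def identical_conditions(cases, conditions):
    """Pairs of conditions that take exactly the same value in every case."""
    pairs = []
    for i, a in enumerate(conditions):
        for b in conditions[i + 1:]:
            if all(row[a] == row[b] for row in cases.values()):
                pairs.append((a, b))
    return pairs


# Hypothetical dichotomized data.
cases = {
    "c1": {"A": 1, "B": 1, "C": 1, "Y": 1},
    "c2": {"A": 1, "B": 0, "C": 1, "Y": 1},
    "c3": {"A": 1, "B": 0, "C": 1, "Y": 0},
    "c4": {"A": 0, "B": 1, "C": 0, "Y": 0},
    "c5": {"A": 0, "B": 0, "C": 0, "Y": 0},
    "c6": {"A": 1, "B": 1, "C": 1, "Y": 1},
}
conditions = ["A", "B", "C"]

table = build_truth_table(cases, conditions, "Y")
print(contradictory_configurations(table))
print(counterintuitive_configurations(table))
print(low_variation_conditions(cases, conditions))
print(identical_conditions(cases, conditions))
```

In this made-up data set, cases c2 and c3 share a configuration but differ on the outcome, and conditions A and C never differ, so both would need attention before minimization.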
5. How to Resolve Contradictory Configurations (Textbook, pp.48-56)
There are basically eight strategies. In real-life research, it is advisable to at least consider all those strategies, and most often it will turn out that some combination is useful:
- Probably the easiest one: Simply add some condition(s) to the model. Indeed, the more complex the model—the more numerous the conditions—the less likely contradictions will occur, because each condition added constitutes a potential additional source of differentiation between the cases. Of course, such a strategy should not be pursued in a “hope-and-poke” way; it should be cautious and theoretically justified. It is advisable to add conditions one by one, not to obtain too complex a model. Otherwise, you run the risk of creating a greater problem of “limited diversity” (see p. 25) and thus “individualizing” explanations of each particular case; this means that csQCA will have missed its purpose of reaching some degree of parsimony
- Remove one or more condition(s) from the model and replace it/them by (an)other condition(s)
- Reexamine the way in which the various conditions included in the model are operationalized. For instance, it may be that the threshold of dichotomization for a given condition is the source of the contradiction between two cases. By adjusting the threshold, it may be possible to resolve the contradiction. Alternatively, the contradiction could be due to a data quality problem; in that case, one could collect complementary or revised data. This is the most labor-intensive option but very much to be advocated from a case-oriented perspective
- Reconsider the outcome variable itself. This strategy is often overlooked. If the outcome has been defined too broadly, it is quite logical that contradictions may occur. For instance, Rihoux (2001) noticed, during some exploratory csQCA analyses, that his initial outcome variable—major organizational change in a given political party—could in fact be decomposed into two opposed subtypes: organizational adaptation and organizational radicalization. By focusing the outcome solely on organizational adaptation, he was able to resolve many contradictory configurations
- Reexamine, in a more qualitative and “thick” way, the cases involved in each specific contradictory configuration. What has been missed? What could differentiate those cases that has not been considered, either in the model or in the way the conditions or the outcome have been operationalized?
- Reconsider whether all cases are indeed part of the same population (cf. case selection, p. 20). For instance, if it is a “borderline” case that is creating the contradiction, perhaps this case should be excluded from the analysis
- Recode all contradictory configurations as [0] on the outcome value. This solution, suggested by Ragin (1987), treats contradictory configurations as “unclear” and thus decides to accept fewer minimizable configurations in exchange for more consistency in the cases/outcome relationship
- Use frequency criteria to “orientate” the outcome. Let us consider a contradictory configuration that involves nine cases. If, say, it leads to a [1] outcome for eight cases and to a [0] outcome for only one case, one could consider that the “most frequently travelled path” wins—thus the outcome would be considered as having a [1] value for all nine cases.
Note, however, that this more probabilistic strategy is disputable from a “case-oriented” perspective.
Of course, the strategy(ies) chosen must be justified on empirical grounds (case-based knowledge) and/or on theoretical grounds and not be the result of some opportunistic “manipulation”
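The frequency criterion (the last strategy in the list above) can be sketched as follows. The 0.8 cutoff is a hypothetical value chosen for the illustration; the source does not prescribe one, and any cutoff would need its own justification.

```python
from collections import Counter


def orient_by_frequency(outcomes, min_share=0.8):
    """Return the majority outcome if its share reaches min_share, else None.

    `outcomes` lists the outcome value (0 or 1) of every case that shares
    one contradictory configuration. min_share is an assumed cutoff.
    """
    counts = Counter(outcomes)
    value, count = counts.most_common(1)[0]
    if count / len(outcomes) >= min_share:
        return value
    return None  # no clear majority: use another strategy


# The nine-case example from the text: eight [1] outcomes, one [0] outcome.
nine_cases = [1, 1, 1, 1, 1, 1, 1, 1, 0]
print(orient_by_frequency(nine_cases))
```

Returning None for evenly split configurations keeps the rule from silently deciding cases where there is no "most frequently travelled path".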
6. Four Complete Minimization Procedures to Be Run and Made Explicit (Textbook, pp.59-65)
Checklist for the minimization procedure(s), using the computer software:
- Perform the minimization both with and without inclusion of logical remainders. Each of these approaches may yield information of some interest.
- Thus: Four complete minimization procedures must be run:
  - [1] configurations, without logical remainders
  - [1] configurations, with logical remainders
  - [0] configurations, without logical remainders
  - [0] configurations, with logical remainders
- Ask the software to list the “simplifying assumptions” and display those in your research report
- Check for possible “contradictory simplifying assumptions” and, insofar as possible, solve them
- Present all your minimal formulas (including the case labels), and if needed use visual displays (e.g. Venn diagrams) to make the minimal formulas more understandable for the reader
- If it is useful for your interpretation, factor (by hand) some conditions, to make the key regularities in the minimal formulas more apparent
- Assess the “coverage” of the minimal formulas—i.e., the connection between the respective terms of the minimal formulas and the observed cases
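To make the four procedures concrete, the sketch below only prepares their inputs: it lists the logical remainders (configurations with no observed case) and pairs each outcome value with and without them. The minimization itself is left to the software, as the checklist requires. The truth table and the function names are hypothetical.

```python
from itertools import product


def logical_remainders(observed, n_conditions):
    """Configurations of n dichotomous conditions that no observed case displays."""
    all_configs = set(product((0, 1), repeat=n_conditions))
    return sorted(all_configs - set(observed))


def minimization_runs(truth_table, n_conditions):
    """Describe the four runs for a contradiction-free truth table.

    truth_table maps each observed configuration to its outcome (0 or 1).
    """
    remainders = logical_remainders(truth_table, n_conditions)
    runs = {}
    for outcome in (1, 0):
        explained = sorted(cfg for cfg, y in truth_table.items() if y == outcome)
        runs[(outcome, "without remainders")] = {"explain": explained, "may_use": []}
        runs[(outcome, "with remainders")] = {"explain": explained, "may_use": remainders}
    return runs


# Hypothetical truth table with three conditions and no contradictions.
truth_table = {(1, 1, 0): 1, (1, 0, 0): 1, (0, 1, 1): 0, (0, 0, 0): 0}

remainders = logical_remainders(truth_table, 3)
runs = minimization_runs(truth_table, 3)
for key, spec in runs.items():
    print(key, spec)
```

Because the same remainders are available to both the [1] and the [0] minimization, a remainder used in both runs is a candidate "contradictory simplifying assumption" and should be checked as described above.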
7. Threshold-Setting With mvQCA (Textbook, pp.76-78)
NB: The “good practices” specified for dichotomization in csQCA are also valid for mvQCA
- In most cases, only three or four values per condition should be used with mvQCA
- It is best to limit the number of multi-valued conditions, maintaining a preponderance of dichotomous conditions
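A multi-value coding with two thresholds can be sketched as below. This is not TOSMANA's own procedure; the thresholds 10.0 and 20.0 are hypothetical, and the choice to place a value equal to a threshold in the higher category is an assumption of the sketch.

```python
from bisect import bisect_right


def multi_value_code(value, thresholds):
    """Return a category 0..len(thresholds) for one raw value.

    The category is the number of thresholds at or below the value, so
    two thresholds yield the three values 0, 1, and 2.
    """
    if list(thresholds) != sorted(thresholds):
        raise ValueError("thresholds must be in ascending order")
    return bisect_right(thresholds, value)


# Hypothetical thresholds producing a three-value condition.
thresholds = [10.0, 20.0]
codes = [multi_value_code(v, thresholds) for v in (5.0, 10.0, 15.0, 25.0)]
print(codes)
```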
8. Specific to mvQCA (Textbook, pp.69-86)
All “good practices” for csQCA (see Chapter 3) are also valid for mvQCA. In addition, here are some more specific ones for mvQCA:
- If justifiable from an empirical and/or theoretical perspective, use a preponderance of dichotomous conditions
- Multi-value conditions can be used to create a more genuine representation of multicategorical nominal data, ordinal data, and interval data
- Include multi-value conditions “à la carte,” when needed. If at all possible, keep the number of values low (e.g., if you have the choice between three or four categories, use three); more than five categories should be avoided
- Use the “thresholdssetter” function (TOSMANA) more systematically, to check visually the meaningfulness of the thresholds
9. Specific to the Calibration of Fuzzy Sets (Textbook, pp.89-94)
As with dichotomization in csQCA and threshold setting in mvQCA (see Box 4.4), the calibration of fuzzy sets is a key operation, to be performed with great care. Some good practices (e.g., being transparent or justifying the cutoff points on substantive and/or theoretical grounds) are common to all three operations. Here are some specific good practices for the calibration of fuzzy sets:
- Carefully identify and define the target category using set theoretic language (e.g., the set of “less developed countries” or the set of “more urbanized countries”)
- Based on theoretical and substantive knowledge, specify what it takes to warrant “full membership” in this set (a fuzzy score of 1.0) and full exclusion from this set (a fuzzy score of 0)
- Make sure that extraneous or irrelevant variation is truncated (e.g., variation in an index variable like GNP/capita among the countries that are unquestionably fully in or fully out of the target set; for example, the set of less developed countries)
- Evaluate what constitutes maximum ambiguity in whether a case is more in or out of the target set (e.g., the GNP/capita score that is at the border between countries that are more in versus more out of the set of “less developed countries”). This evaluation provides the basis for establishing the crossover point (0.50)
- If you are basing your fuzzy membership scores on an index variable that is interval or ratio scale, use FSQCA’s “calibrate” procedure to create the fuzzy set. To do this, you will need to be able to specify threshold values for full membership, full nonmembership, and the crossover point (see Ragin, 2008)
- Always examine carefully the fuzzy scores that result from any procedure you use to calibrate membership scores. Make sure that the scores make sense at the case level, based on your substantive and theoretical knowledge
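The three anchors named above can be turned into membership scores with a log-odds transformation, in the spirit of the direct method described by Ragin (2008). The sketch below is an illustration under stated assumptions, not the FSQCA "calibrate" procedure itself: it assumes that higher raw values mean more membership, that the full-membership and full-nonmembership anchors map to log odds of +3 and -3, and it uses made-up anchor values.

```python
import math


def calibrate(value, full_non, crossover, full_in):
    """Map a raw value to a fuzzy membership score between 0 and 1.

    Assumes full_non < crossover < full_in (higher values = more in the set).
    Deviations from the crossover are rescaled so that the two outer
    anchors correspond to log odds of -3 and +3.
    """
    if not full_non < crossover < full_in:
        raise ValueError("anchors must satisfy full_non < crossover < full_in")
    if value >= crossover:
        log_odds = 3.0 * (value - crossover) / (full_in - crossover)
    else:
        log_odds = 3.0 * (value - crossover) / (crossover - full_non)
    # Clamp extreme values so math.exp cannot overflow; this also
    # truncates irrelevant variation far beyond the outer anchors.
    log_odds = max(-30.0, min(30.0, log_odds))
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical anchors for an index variable.
anchors = {"full_non": 2500.0, "crossover": 5000.0, "full_in": 20000.0}
scores = {v: round(calibrate(v, **anchors), 3) for v in (1000.0, 2500.0, 5000.0, 20000.0)}
print(scores)
```

Whatever procedure is used, the resulting scores still have to be inspected case by case, as the last bullet above stresses.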
10. Specific to fsQCA (Textbook, pp.87-121)
- It is crucially important to use theoretical and substantive (empirical) knowledge, rather than mechanical criteria, to calibrate degree of membership in sets; assigning fuzzy membership scores is interpretive and involves both theoretical knowledge and case-oriented research, based on available data
- Researchers should develop an explicit rationale for their specifications of full membership (1), full nonmembership (0), and the crossover point (0.5)
- If converting interval or ratio-scale data to fuzzy sets, use the calibration procedure that is built into the software (see Ragin, 2008).
- When examining the truth table spreadsheet showing consistency scores (as in Tables 5.8 and 5.9), remember that instances of the outcome may be included in rows with low consistency; treat these as contradictory configurations and use the procedures for resolving them, as presented in this book
- If you explicitly hypothesize necessary conditions, test for them before conducting truth table analysis; set a high consistency threshold for necessary conditions and eliminate any condition that is found to be necessary from the truth table analysis (i.e., address such conditions separately, as necessary conditions)
- When selecting frequency thresholds, take into account not only total number of cases, but also the nature and quality of the evidence; generally, the larger the total N, the higher the frequency threshold
- When selecting consistency thresholds, choose a threshold as close to 1.0 as is feasible, given the nature of the data; look for gaps in the distribution of consistency scores; avoid using a threshold below 0.75
- Derive all three solutions in each analysis: “complex” (no logical remainders used), “parsimonious” (logical remainders used without evaluating their plausibility), and “intermediate” (logical remainders restricted to those that are most plausible)
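The consistency thresholds mentioned above refer to set-theoretic measures that are simple to compute by hand. The sketch below uses the common definitions based on the sum of the minimum of each case's condition and outcome memberships; the membership scores are made up, and the function names are assumptions of this sketch rather than part of the software.

```python
def _overlap(x, y):
    """Sum over cases of min(condition membership, outcome membership)."""
    if len(x) != len(y):
        raise ValueError("x and y must list the same cases")
    return sum(min(a, b) for a, b in zip(x, y))


def sufficiency_consistency(x, y):
    """Degree to which membership in X is a subset of membership in Y."""
    if sum(x) == 0:
        raise ValueError("no case has any membership in X")
    return _overlap(x, y) / sum(x)


def sufficiency_coverage(x, y):
    """Share of the outcome membership that is accounted for by X."""
    if sum(y) == 0:
        raise ValueError("no case has any membership in Y")
    return _overlap(x, y) / sum(y)


def necessity_consistency(x, y):
    """Degree to which membership in Y is a subset of membership in X."""
    return sufficiency_coverage(x, y)


# Hypothetical fuzzy membership scores for four cases.
x_scores = [0.9, 0.7, 0.2, 0.6]
y_scores = [1.0, 0.8, 0.1, 0.4]

consistency = sufficiency_consistency(x_scores, y_scores)
coverage = sufficiency_coverage(x_scores, y_scores)
print(round(consistency, 3), round(coverage, 3))
```

With these scores the sufficiency consistency is 0.875, which would pass the minimum of 0.75 advised above but is still well short of 1.0.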
11. Technical Arbitrations and Practical Steps Throughout the QCA Procedures (Textbook, pp.123-138)
This selective review of csQCA applications allows us to identify a series of additional, more transversal technical good practices (also applicable to mvQCA and fsQCA):
- For each technical arbitration (case selection, threshold setting, inclusion of logical remainders, etc.), always justify your choice and make choices transparent
- Likewise, it might be useful to conduct a sensitivity analysis by rerunning the analysis with different technical arbitrations
- Don’t be afraid to alter some of your initial arbitrations throughout the process of your research. QCA techniques are best used in an iterative manner
- In many situations, there is not a single “one size fits all” strategy to be applied. Problem-solving strategies are often best used in combination
- Logical remainders (and the resulting simplifying assumptions) are not to be used mechanically—the theoretical implications of including them have to be seriously considered
- If contradictory simplifying assumptions are produced through the respective minimizations of the [1] and [0] outcome configurations, they have to be identified and addressed
12. Transparency (Textbook, pp.167-168)
For all QCA techniques, the buzzword is transparency. Even in short-publication format (e.g., conference papers and journal articles), the following elements should be provided in some form:
- The raw data table
- The operationalization (dichotomization, trichotomization, or fuzzy-set calibration) of all variables (conditions and outcome)
- The computer software used (TOSMANA or FSQCA, or other available program). The minimization should not be performed by hand
- The truth table
- The analysis of necessary conditions
- The treatment of contradictory configurations (if any)
- The main iterations leading to the final (contradiction-free) model
- The way logical remainders are being used (if applicable)
- The full minimal formulas, not only as narratives, but also in formal notation. If there are many possible minimal formulas, all should be mentioned, or at least the choice of a specific minimal formula should be well-documented and justified
- The minimal formulas before and after you factor them by hand (if applicable)
- The consistency and coverage measures
- The interpretation of the minimal formulas (which “paths” are more important and why? etc.)
Of course, in short publication format, it might be difficult to find enough room to lay out all these elements. Experience indicates that it can nevertheless be done, in a synthetic way (some good examples: Redding & Viterna, 1999; Vanderborght & Yamasaki, 2004; Hagan & Hansford-Bowles, 2005; Kilburn, 2004; Osa & Corduneanu-Huci, 2003; Chan, 2003). It is also always possible to make available (e.g., on a Web page) some elements that would be too cumbersome for a short publication (e.g., a raw data table that would be too large, qualitative threshold justification for some conditions, a long list of minimal formulas)
13. Words Matter, So Use the Correct Terminology! (Textbook, pp.181-184)
It is crucial to use the correct QCA terminology when writing up a report, publication, etc., in order to:
- Avoid confusing the reader, especially if he or she has been mostly trained in different methods and approaches
- Reinforce the notion that QCA techniques are underpinned by a specific paradigm, with its specific goals, assumptions, and conception of causality (e.g., “conditions” are not “independent variables,” etc.)
- Avoid being criticized on invalid grounds (e.g., a “minimal formula” is not a “general trend,” which could be statistically inferred from a sample to a whole population, etc.)
- Be fully understood in your demonstration
It might be useful, if space allows (in footnotes, for instance), to provide short definitions of the key QCA technical terms you are using. It is also advised to clearly mention the technique(s) you are using (csQCA, mvQCA, fsQCA, fuzzy sets, etc.) in your abstract
