Psychometric Properties of the Areas of Worklife Survey in an Industrial Context in Thailand
Abstract
Objectives:
The Areas of Worklife Survey (AWS) is widely used to assess organizational factors contributing to burnout. However, evidence regarding its construct and criterion validity has been reported primarily in human service settings. This study evaluated the construct validity of the AWS measurement model and the criterion validity of the AWS–burnout relationship among industrial workers in Thailand.
Methods:
In this cross-sectional study, a Thai-language electronic questionnaire was administered to 446 industrial workers between June 2024 and August 2024. Of these, 390 participants (87.4%) completed both the AWS and the Maslach Burnout Inventory–General Survey. Data were analyzed using confirmatory factor analysis within a structural equation modeling framework.
Results:
A modified 6-factor AWS model—excluding 5 items and allowing 2 correlated error terms—demonstrated satisfactory fit (χ²(213)=436.02, p<0.001; chi-square/degrees of freedom [χ²/df]=2.05; comparative fit index [CFI]=0.94; Tucker–Lewis index [TLI]=0.92; root mean square error of approximation [RMSEA]=0.053; standardized root mean square residual [SRMR]=0.053). Convergent validity (composite reliability=0.74–0.87; average variance extracted [AVE]=0.49–0.58) and discriminant validity were acceptable for most dimensions; however, the Fairness dimension (AVE=0.36) and the Reward–Fairness correlation remained problematic. The partial mediation model demonstrated acceptable criterion validity, with all mediation paths—except reward to values—reaching statistical significance.
Conclusions:
The AWS is a valid measure for assessing factors contributing to burnout among Thai industrial workers. Nevertheless, further refinement is necessary to ensure strong dimension-specific validity with minimal modification.
INTRODUCTION
Burnout represents a major global public health concern, with prevalence estimates ranging from 5% to 63% across the general working population in different industries [1,2]. It is defined as a syndrome of chronic, unmanaged occupational stress, characterized by emotional exhaustion, depersonalization (or cynicism), and reduced professional efficacy [3]. The consequences are both individual—encompassing psychological disorders that may progress to long-term physical illness or premature mortality—and organizational, manifesting as absenteeism, turnover, and decreased performance [4]. Although burnout was initially studied in healthcare contexts, it is now recognized across a wide range of occupational and professional groups [5]. While its antecedents extend across organizational, interpersonal, and individual domains, workplace culture and climate are increasingly regarded as primary drivers. Identifying and monitoring organizational risk factors is therefore essential for long-term, effective interventions that go beyond individual-level strategies [6,7]. Multiple variables contribute to burnout, with particular emphasis on the areas of worklife. This framework explains how organizational determinants precipitate burnout by examining the interaction between workers’ expectations or perceptions and their occupational environment [8].
The Areas of Worklife Survey (AWS), developed by Leiter and Maslach [9], consists of 28 items—originally 29—that measure employees’ perceptions of workload, control, reward, community, fairness, and values. Psychometric evaluations of the AWS, including reliability and both construct and criterion validity, have been extensively conducted in North America, Europe, Latin America, and Asia [10-21]. Beyond North America and Europe, validation has been limited and, except for Japan, has largely been restricted to human service settings (e.g., healthcare professionals, teachers, public service workers), where client interaction increases vulnerability to chronic burnout.
The limited evidence base constrains the applicability of these findings to other occupational groups. Despite acceptable overall validity, heterogeneous results in both construct (the 6-factor measurement model and item–dimension relationships) and criterion (structural mediation) analyses suggest potential cultural or methodological influences—or simple random variation [19]. Within Thailand’s industrial workforce, hierarchical norms may affect how issues such as workload or fairness are reported, and cultural expectations may shape acknowledgment of value incongruence. These influences could partly explain the inconsistent construct validity observed, emphasizing the importance of cross-cultural AWS validation [22]. Recent studies have highlighted high burnout prevalence among industrial workers: 86.0% among factory workers and miners in China, and 69.3% in Bulgarian industry—findings that underscore the urgency of addressing burnout. However, organizational risk factors in industrial contexts often differ from those in human service sectors [23,24], and while the AWS is frequently used in assessments, it has not undergone formal validation in industry. Thus, confirmatory factor analysis (CFA) of the AWS in diverse industries and worker populations is imperative.
Accordingly, this study aimed to apply CFA to evaluate the construct and criterion validity of the AWS among industrial workers in Thailand, in line with the frameworks hypothesized by Leiter and Maslach [9].
METHODS
Study Design and Data Collection
This cross-sectional study was conducted among industrial workers in the ethanol and sugar business unit of a company operating across most of Thailand between June 2024 and August 2024. A priori sample size calculation for structural equation modeling (SEM)—assuming a small to medium effect size of 0.2, statistical power of 0.8, nine latent variables, 45 observed variables, and α=0.05—yielded a recommended sample size of 460 [25].
To ensure equal opportunity for participation across the business unit, all 446 eligible workers were invited through a full recruitment strategy. A Thai-language electronic questionnaire was distributed via the organization’s document management system, with the human resources department issuing 2 monthly reminders to all workers through the same internal system. The study achieved an 87.4% response rate (390 of 446 workers), which reduced statistical power from 0.8 to 0.7. The 56 non-participants either did not respond to the invitation or declined participation.
The questionnaire contained 3 sections. Section 1 collected demographic data, including sex, age, work duration, job position, job type, and shift status. Section 2 employed the 28-item AWS to evaluate 6 dimensions—workload (6 items), control (3), reward (3), community (5), fairness (6), and values (5). Each item was rated on a 5-point Likert scale (1–5), with reverse scoring applied according to the manual [8]. Reliability was confirmed with Cronbach’s α=0.7. Section 3 applied the Maslach Burnout Inventory–General Survey (MBI–GS), a 16-item instrument covering 3 dimensions: emotional exhaustion (EE, 5 items), cynicism (CY, 5 items), and professional efficacy (PE, 6 items). Items were rated on a 7-point Likert scale (0=never to 6=every day) [26], with aggregated dimension scores ranging from 0–30 (EE and CY) and 0–36 (PE). High EE and CY scores combined with low PE scores indicate greater burnout. Subscale reliability was high, with Cronbach’s α=0.9 (EE), 0.9 (CY), and 0.8 (PE). Both Sections 2 and 3 were licensed translations, subjected to a forward–backward translation protocol and reviewed for content validity by experts in organizational psychology, psychiatry, and occupational medicine. Instrument fidelity was verified in a May 2024 pilot involving 40 workers in a comparable industry context.
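For readers implementing the scoring described above, reverse-coded Likert items on a bounded scale can be recoded with a one-line transformation. This is a generic sketch of standard Likert reverse-scoring, not code from the study:

```python
def reverse_score(response: int, scale_min: int = 1, scale_max: int = 5) -> int:
    """Reverse-score a Likert response: on a 1-5 scale, 1 <-> 5 and 2 <-> 4,
    while the midpoint (3) maps to itself."""
    return scale_max + scale_min - response
```

The same function works for the 7-point MBI–GS scale by passing `scale_min=0, scale_max=6`, should any items require reversal.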
Statistical Analysis
The statistical analysis included descriptive analysis, assessment of the measurement model (construct validity), and assessment of the structural model (criterion validity).
Descriptive analysis
The first step was to identify and exclude participants with potentially biased responses. Inattentive responses to AWS dimensions were evaluated using the Maximum LongString technique [27], which identifies the longest sequence of uniform responses. The cutoff threshold was set at the third quartile plus 1.5 times the interquartile range, supported by inspection of the response pattern histogram. The retained sample was then summarized using frequencies and percentages for categorical variables and means (standard deviations, SDs) for continuous variables.
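The Maximum LongString screen and its Q3 + 1.5 × IQR cutoff can be sketched in a few lines. The function names below are illustrative, not drawn from the study's analysis scripts:

```python
from statistics import quantiles

def max_longstring(responses):
    """Length of the longest run of identical consecutive responses
    (the Maximum LongString index for one respondent)."""
    longest = run = 1
    for prev, curr in zip(responses, responses[1:]):
        run = run + 1 if curr == prev else 1
        longest = max(longest, run)
    return longest

def flag_inattentive(all_longstrings):
    """Flag respondents whose LongString exceeds Q3 + 1.5 * IQR,
    computed over the whole sample's LongString values."""
    q1, _, q3 = quantiles(all_longstrings, n=4)
    cutoff = q3 + 1.5 * (q3 - q1)
    return [ls > cutoff for ls in all_longstrings]
```

In practice the histogram of LongString values would also be inspected, as in the study, before committing to the boxplot-style cutoff.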
Assessment of the AWS measurement model
AWS item responses were summarized as means (SDs), and normality was assessed through kurtosis (k<7) [28]. CFA within SEM was conducted to evaluate item–dimension relationships and confirm the 6-factor AWS model. Given adequate normality, maximum likelihood estimation was used to assess model fit, with chi-square/degrees of freedom (χ²/df), comparative fit index (CFI), Tucker–Lewis index (TLI), root mean square error of approximation (RMSEA), and standardized root mean square residual (SRMR) as indices [29]. A good fit was defined as χ²/df <2.5, CFI and TLI ≥0.90, and RMSEA and SRMR ≤0.05 [30]. To reduce sampling error, 90% percentile confidence intervals (CIs) were generated using bootstrapping (n=1000) [31].
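The percentile bootstrap used to generate the 90% CIs can be illustrated generically. The helper below is an assumption-laden sketch (function name, defaults, and the mean as the example statistic are all illustrative; the study's estimates came from its SEM software, not this code):

```python
import random
from statistics import mean

def bootstrap_ci(data, stat=mean, n_boot=1000, alpha=0.10, seed=42):
    """Percentile bootstrap CI: resample the data with replacement n_boot
    times, compute the statistic on each resample, and take the alpha/2
    and 1 - alpha/2 quantiles of the resampled statistics."""
    rng = random.Random(seed)
    stats = sorted(stat(rng.choices(data, k=len(data))) for _ in range(n_boot))
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

With alpha=0.10 and n_boot=1000, this yields the 5th and 95th percentiles of the bootstrap distribution, matching the 90% percentile intervals reported in the study.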
Modeling began by comparing the hypothesized 6-factor AWS structure against lower-order alternatives described by Leiter and Maslach [10]. When poor fit was identified, iterative modifications were applied: items with loadings below |0.40| were removed, correlated error terms within the same factor were freed if modification indices (MI)>20, and items with standardized residual covariances above |2| were deleted [32]. To prevent arbitrary changes, error terms were correlated only within the same factor. Item removal was limited to a maximum of 20% (5 of 28 items), with at least 3 items retained per factor [33].
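The iterative modification rules above amount to a set of explicit thresholds. The sketch below encodes the stated decision criteria for item removal (loading below |0.40|, at most 5 removals, at least 3 items per factor); the names and data layout are hypothetical:

```python
# Thresholds stated in the modification procedure
LOADING_MIN = 0.40        # drop items loading below |0.40|
MI_THRESHOLD = 20         # free within-factor error covariances if MI > 20
RESIDUAL_MAX = 2.0        # drop items with standardized residual covariance > |2|
MAX_REMOVALS = 5          # at most 20% of the 28 AWS items
MIN_ITEMS_PER_FACTOR = 3  # each factor keeps at least 3 items

def candidate_removals(loadings, removed_so_far, items_per_factor):
    """Return items eligible for removal under the stated constraints.
    `loadings` maps item -> (factor, standardized loading)."""
    out = []
    for item, (factor, loading) in loadings.items():
        if (abs(loading) < LOADING_MIN
                and len(removed_so_far) < MAX_REMOVALS
                and items_per_factor[factor] > MIN_ITEMS_PER_FACTOR):
            out.append(item)
    return out
```

As in the study, each removal (or freed covariance) would be applied one at a time, with the model re-estimated before the next pass.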
Construct reliability and convergent validity, as well as discriminant validity, were assessed using composite reliability (CR) >0.7, average variance extracted (AVE) >0.5, and inter-construct correlation coefficients <0.7 [34]. CR was considered inadequate if the upper bound of the 90% CI fell below 0.7. Convergent validity was established if CR ≥0.7, the upper 90% CIs of all loadings were ≥0.5, and the upper 90% CIs of AVE were ≥0.5. Discriminant validity required evidence of convergent validity, absence of cross-loadings, AVE exceeding shared variance (AVE–SV), and the lower bound of 90% CIs of inter-construct correlations <0.85.
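The CR, AVE, and Fornell–Larcker comparisons follow standard formulas computed from standardized loadings. A minimal sketch (generic formulas, not the study's SPSS/Excel tooling):

```python
def composite_reliability(loadings):
    """CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances),
    where each error variance is 1 - loading^2 for standardized loadings."""
    s = sum(loadings)
    err = sum(1 - l ** 2 for l in loadings)
    return s ** 2 / (s ** 2 + err)

def average_variance_extracted(loadings):
    """AVE = mean of the squared standardized loadings."""
    return sum(l ** 2 for l in loadings) / len(loadings)

def fornell_larcker_ok(ave_a, ave_b, corr_ab):
    """Discriminant validity holds when sqrt(AVE) of each construct
    exceeds the absolute inter-construct correlation."""
    return ave_a ** 0.5 > abs(corr_ab) and ave_b ** 0.5 > abs(corr_ab)
```

For example, a construct with AVE=0.49 (sqrt(AVE)=0.70) fails the Fornell–Larcker check against a construct it correlates with at 0.75, mirroring the Reward–Fairness problem reported in the Results.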
Although guidelines suggest assessing construct validity only after establishing overall model fit, and discriminant validity only once convergent validity is confirmed, both were evaluated for the original AWS model to provide a baseline for comparison with modified models and to identify factors influencing validity [31].
Assessment of the structural model for the AWS and MBI-GS relationship
The relationship between the 6 AWS dimensions and the 3 burnout components was assessed via SEM, treating them as latent constructs [10]. Following established procedures, only the 3 items per construct with the highest loadings and lowest error covariances were retained, yielding robust correlations with the full scales (r=0.85–0.97). The Bollen–Stine bootstrap method was applied to correct for potential non-normality and small-sample effects, and bootstrap standardized estimates with 90% CIs were reported [17]. Data analyses were performed using SPSS version 28.0 (IBM Corp., Armonk, NY, USA), and Microsoft Excel with the Stats Tools Package add-in (Microsoft Corp., Redmond, WA, USA).
Ethics Statement
This study was approved by the Institutional Review Board of the Faculty of Medicine, Chulalongkorn University (Med Chula IRB No. 0005/67). Following ethics committee approval, all participants provided written informed consent after receiving documents outlining the study’s objectives, methodology, data collection procedures, and confidentiality assurances.
RESULTS
In the assessment of potentially biased responses, the median (quartiles 1 and 3) of identical response sequences was 5 (4 and 6), with a cut-off value of 9. Twenty-three cases (5.9%) were identified as potentially biased and excluded, leaving 367 participants for analysis. Among these, 274 (74.7%) were male. The mean age was 37.4±6.9 years, and mean tenure was 10.9±5.9 years. The sample included 317 (86.4%) industrial workers and 50 (13.6%) managers or supervisors, with 204 (55.6%) on day shifts and 163 (44.4%) on shift work.
Measurement Model (Construct Validity)
Table 1 summarizes AWS item means. The highest scores (>4.0) were observed for AWS28 (quality within the organization), AWS16 (cooperation within a group), and AWS6 (control over one’s tasks), while the lowest scores (<2.5) were recorded for AWS1 (time pressure), AWS18 (group closeness), and AWS13 (lack of recognition). All item kurtosis values were <7. CFA supported the hypothesized 6-factor structure (χ²(335)=868.44, p<0.001; χ²/df=2.59; CFI=0.87; TLI=0.85; RMSEA=0.066; SRMR=0.055), which outperformed both the unidimensional model (χ²/df=5.08; CFI=0.65; TLI=0.62; RMSEA=0.106; SRMR=0.83) and the 2-factor model (χ²/df=4.47; CFI=0.70; TLI=0.68; RMSEA=0.097; SRMR=0.804). Nevertheless, absolute fit remained suboptimal [10].
Model modifications proceeded iteratively. Items with loadings <|0.40| (Workload5, Fairness2) were removed. Within-factor error covariances (Reward2/Reward3, Community1/Community2, Fairness5/Fairness6) were freed when MI >20. Subsequently, items Workload4, Control1, and Reward3 were removed due to standardized residuals >|2|, which also eliminated the Reward2/3 covariance. The final model demonstrated robust fit, with all loadings >0.4 (range=0.43–0.91), and none significantly <0.5, though RMSEA and SRMR remained marginally above optimal levels.
The parsimonious 18-item AWS short scale for industrial workers comprises items 1–3 (Workload), 7–9 (Control), 10, 11, and 13 (Reward), 14, 16, and 17 (Community), 19, 21, and 22 (Fairness), and 25, 27, and 28 (Values), with robust standardized loadings (0.50–0.91) in CFA. Model fit was satisfactory across indices, apart from RMSEA slightly exceeding its acceptable threshold (Table 2).
Construct validity metrics showed marked improvement compared with the original model. Internal consistencies were strong (CR=0.73–0.87). Convergent validity was insufficient for Fairness (AVE=0.36<0.50, p<0.05), whereas Reward and Values achieved the required threshold (AVE=0.49, with upper 90% CIs ≥0.50). For dimensions with adequate convergent validity, discriminant validity was supported in all but Reward, whose square root of AVE (0.70) fell short of its correlation with Fairness (0.75). Community demonstrated the strongest construct validity, followed by Workload, across all validity measures.
The AWS short form improved convergent validity relative to both the original and modified models. Discriminant validity deficiencies persisted for Reward and Values against Fairness, as their square roots of AVE (0.70 and 0.71, respectively) were lower than their correlations with Fairness (0.79 and 0.77, respectively) (Table 3).
Structural Model (Criterion Validity)
Table 4 presents pairwise correlations between AWS short form dimensions and MBI–GS burnout components. These were consistent with theoretical expectations, encompassing intra-AWS associations, AWS–burnout relationships, and inter-burnout component correlations.
Table 4. Pearson correlation coefficients for the AWS dimensions and burnout components based on the short models of the AWS and MBI-GS
Figure 1 displays the structural equation model of the hypothesized AWS–MBI framework [9], with 2 correlated error covariances for the Control construct freed. The model achieved acceptable fit (χ²(309)=552.64, p<0.001; χ²/df=1.79; CFI=0.95; TLI=0.95; RMSEA=0.046; SRMR=0.047). All structural paths were statistically significant except the path from Reward to Values (p=0.089). Modification indices did not suggest any further parsimonious structural improvements. However, adding a direct path from Control to Cynicism produced a significant reduction in χ² (Δχ²=11.437, Δdf=1, p<0.01) without materially altering other fit indices, yielding a standardized coefficient of –0.231 (p=0.002). This adjustment rendered the indirect mediation effect of Values on Cynicism non-significant (β=–0.049, p=0.473). No additional direct mediation paths were significant.
DISCUSSION
This study aimed to assess the validity of the AWS beyond human service contexts and found the original measurement model deficient when applied to Thai industrial workers. To improve model fit and strengthen construct validity, 5 items were removed (Workload4, Workload5, Control1, Fairness5, and Fairness6), and 2 intra-factor error covariances were freed (Reward2/Reward3 and Community1/Community2).
Although convergent and discriminant validity were satisfactory for most dimensions, the Fairness dimension’s convergent validity and the Reward–Fairness discriminant validity remained inadequate. The hypothesized structural model demonstrated strong criterion validity, with all mediation pathways except Reward to Values significant and no parsimonious revisions indicated. However, introducing a direct path from Control to Cynicism eliminated the indirect effect of Values on Cynicism.
Post hoc principal component analysis (PCA) was conducted to clarify the modified AWS model’s validation outcomes. First, the Fairness dimension’s weak convergent validity arose from only 2 of its 5 items loading adequately on the construct, with AWS21 (Fairness3) showing a loading >|0.50|. Second, Workload’s convergent validity improved after removing AWS4 (Workload4) and AWS5 (Workload5), as AWS4 cross-loaded on Fairness (|loading| >0.40), while AWS5 displayed anomalous correlations (r=–0.127 to –0.216) and aligned more closely with Values than Workload. The remaining Workload items exhibited strong loadings (0.584–0.801). Importantly, no items from other dimensions intruded on Workload, supporting its discriminant validity. Third, Community demonstrated satisfactory convergent and discriminant validity, with all items loading appropriately, except AWS18 (Community5), which cross-loaded onto Reward with loadings <0.40. Fourth, Control’s convergent validity improved after removing AWS6 (Control1), as this item loaded primarily on Workload. Fifth, both Control and Values showed robust convergent and discriminant validity, with item loadings above 0.50 (Control: 0.664–0.795; Values: 0.584–0.793). Sixth, Reward’s discriminant validity was undermined by anomalous cross-loadings of Fairness items (AWS22/Fairness4; AWS23/Fairness5; AWS24/Fairness6; AWS19/Fairness1) onto Reward with loadings >0.40 (Supplemental Material 1).
The presence of improper factor-loading patterns in AWS items—and the resulting need for extensive model modification with weakened validity—has been documented previously. In North American and European PCAs, loadings generally aligned with the hypothesized 6-factor structure, with only minor deviations. By contrast, PCAs conducted outside these regions revealed marked inconsistencies. Apart from frequent cross-loadings in 1 Spanish study [15], non-loadings and misallocated loadings were more common in Latin American and Asian contexts. Among deviations observed both in our PCA and in Latin American/Asian studies, the “free time disconnection” item (AWS6/AWS5) often failed to load on its designated factor or was reassigned to an unanticipated domain (Japanese [20]; Vietnamese [21]). Reverse-coded items also failed to cluster with their dimension-specific counterparts in Peruvian [18] and Japanese validations. Cross-loadings between Reward and Fairness items, as observed in our study, were similarly reported in Peruvian and Japanese contexts (Supplemental Material 2).
These pronounced deviations were associated with more extensive AWS model modifications in Latin American and Asian contexts (up to 7 items removed) than North American and European contexts, where deletions ranged from 0 to 2 items (Supplemental Material 3). The Peruvian study incorporated a secondary method factor without deletions, while the Japanese study deleted 2 items and reassigned Reward3/Reward4 to Fairness. Nevertheless, neither approach achieved acceptable model fit, and their impacts on validity remain unclear.
Discrepancies in AWS validation between North American/European and Latin American/Asian contexts likely reflect combined influences of item composition, semantic divergence, cultural variation, item polarity, and biases such as acquiescence or careless responding. First, social science instruments like the AWS are prone to cross-loadings among closely related constructs (e.g., workload–control; reward–fairness [35]). Second, because of the significant structural differences between English and Thai, our adaptation emphasized conceptual rather than linguistic equivalence to preserve construct integrity. However, loss of implicit meaning may have obscured intended nuances [36]. Such factors may have blurred otherwise distinct AWS constructs, creating interdependence where responses to items in 1 dimension influenced responses in another. For example, higher responses to Workload items 1–4 were negatively correlated with responses to Control1 (r²=0.254–0.350), causing Control1 to load on Workload rather than Control. Similarly, responses to Reward1 and 2 may have been shaped by Control1–4, while Fairness1 and 4–6 appeared influenced by Reward patterns. Third, AWS item Workload5 (“I leave my work behind when I go home at the end of the workday”) may lack validity in Asian contexts, where cultural norms emphasize duty and acceptance of extended working hours, in contrast to rights-based Western paradigms. Fourth, mixed-worded scales can weaken internal consistency and disrupt dimensionality [37]. Whereas Western respondents typically interpret positive and reverse-coded items as opposites, non-Western respondents may misinterpret them, producing anomalous loading patterns and weak loadings for reversed Fairness items (AWS5 and 6; r²=0.033–0.246). This finding underscores the need to reconsider mixed-word formats in cross-cultural validations. 
Fifth, acquiescence bias—a tendency to agree regardless of item content—is widespread in Asian settings and may have reduced both reliability and dimensional clarity. This is reflected in the higher AWS dimension scores among Asian samples compared with North American/European ones [14]. Although careless responding could yield similar effects, it was mitigated in this study through the Maximum LongString technique [27]. The recurring nature of these issues across non-Western cohorts suggests they stem from cultural response styles rather than invalidating the AWS for Thai industrial workers. These challenges could be mitigated by strengthening forward–backward translation protocols, adopting interrogative formats for mixed-word AWS items, and allowing secondary or cross-loadings in factor analysis [38], given the limited utility of method factors [18,19]. Further research is needed to determine the most effective strategy.
Regarding the criterion validity of the AWS–Burnout partial mediation model, our findings were broadly consistent with those of Leiter and Maslach [9]’s original report and most subsequent studies (Supplemental Material 4). A notable exception was the non-significant pathway from Reward to Values (p=0.089), which parallels findings in Canadian and German analyses. In contrast, the Vietnamese study produced divergent results, with some hypothesized pathways absent and additional, unanticipated pathways reaching significance.
The findings of this study yield several recommendations. The AWS is a valid instrument for assessing work-life factors associated with burnout in the Thai industrial context, with the modified 6-factor model demonstrating satisfactory fit and supporting its construct validity. Strengthening the psychometric properties of the AWS is critical for accurately identifying burnout drivers, thereby enabling more targeted interventions and more efficient allocation of resources beyond generic well-being initiatives. A validated AWS can thus support more effective strategies to mitigate burnout and enhance organizational health [39]. However, further refinement—particularly of the Fairness dimension—is needed to ensure robust validity with minimal modification. Strengthening both construct and criterion validity would increase the AWS’s value as a tool for workplace health surveillance, facilitating earlier detection of burnout risks, more focused organizational adjustments, and reliable evaluation of intervention outcomes.
This study has limitations. Its cross-sectional design limits the assessment of criterion validity and precludes causal inference regarding burnout. Nonetheless, Leiter and Maslach [10]’s 3-year longitudinal study demonstrated strong temporal correlations between AWS dimensions and burnout components, largely consistent with cross-sectional findings. Because the questionnaire relied on self-reported data, response bias is a potential concern. Selection bias—particularly the healthy worker effect—may also have influenced results, as individuals most severely affected by burnout or adverse working conditions may have already left their jobs [40]. Generalizability is further limited by the single-company sample of industrial workers, making it difficult to extend the findings to other occupational settings. Despite these limitations, the study has notable strengths, including its comprehensive management of uncertainty across all analytical phases, which mitigates inferential error [31], and its dual reporting of convergent and discriminant validity for each AWS dimension, providing a clearer and more precise depiction of construct validity than earlier studies.
In conclusion, the AWS is a valid tool for assessing organizational factors contributing to burnout among Thai industrial workers. Nonetheless, further refinement is required to achieve strong dimension-specific validity with minimal modification.
Supplemental Materials
Notes
Conflict of Interest
The authors have no conflicts of interest associated with the material presented in this paper.
Funding
This study was supported by the Ratchadaphiseksomphot Fund by the Faculty of Medicine, Chulalongkorn University (GA67/077).
Acknowledgements
The authors thank the executive and human resources department of an ethanol and sugar business unit of a company located in 4 regions in Thailand for their support in coordinating data collection. We also thank all participants, the industrial workers, for their time and cooperation.
Author Contributions
Conceptualization: Ratanachina J, Hongsiri I, Chuthong W, Jiamjarasrangsi W. Data curation: Ratanachina J, Hongsiri I, Chuthong W. Formal analysis: Ratanachina J, Jiamjarasrangsi W. Funding acquisition: Jiamjarasrangsi W. Methodology: Ratanachina J, Jiamjarasrangsi W. Project administration: Ratanachina J, Hongsiri I, Chuthong W. Writing – original draft: Ratanachina J, Jiamjarasrangsi W. Writing – review & editing: Ratanachina J, Hongsiri I, Chuthong W, Jiamjarasrangsi W.
