### Research methods

#### DEA-BCC model

Data envelopment analysis (DEA) is a linear programming method that measures the relative efficiency of multiple decision-making units by comparing their inputs and outputs against an estimated efficiency frontier. The DEA-BCC model relaxes the constant returns to scale assumption of the Charnes–Cooper–Rhodes (CCR) model. Under variable returns to scale, it decomposes the static comprehensive efficiency of each decision-making unit into pure technical efficiency and scale efficiency^{32}. This study uses the output-oriented BCC model to measure tourism efficiency and to analyze the current utilization of tourism factors.
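As an illustration of how an output-oriented BCC score can be obtained, the following is a minimal sketch that solves one linear program per decision-making unit with `scipy.optimize.linprog`. The function name `dea_output_oriented`, the toy data, and the choice of solver are assumptions made for this example and are not taken from the study; strictly positive inputs and outputs are also assumed.

```python
import numpy as np
from scipy.optimize import linprog


def dea_output_oriented(X, Y, vrs=True):
    """Output-oriented DEA efficiency scores (1 = on the frontier).

    X: (n_dmus, n_inputs) inputs, Y: (n_dmus, n_outputs) outputs.
    vrs=True adds the BCC convexity constraint sum(lambda) = 1;
    vrs=False drops it, giving the constant-returns (CCR) score.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n, m = X.shape
    s = Y.shape[1]
    scores = np.empty(n)
    for o in range(n):
        # Decision variables: [phi, lambda_1, ..., lambda_n], all >= 0.
        c = np.r_[-1.0, np.zeros(n)]  # maximise phi == minimise -phi
        # Inputs: sum_j lambda_j * x_ij <= x_io
        a_in = np.hstack([np.zeros((m, 1)), X.T])
        # Outputs: phi * y_ro - sum_j lambda_j * y_rj <= 0
        a_out = np.hstack([Y[o].reshape(-1, 1), -Y.T])
        a_ub = np.vstack([a_in, a_out])
        b_ub = np.r_[X[o], np.zeros(s)]
        a_eq = np.r_[0.0, np.ones(n)].reshape(1, -1) if vrs else None
        b_eq = np.array([1.0]) if vrs else None
        res = linprog(c, A_ub=a_ub, b_ub=b_ub, A_eq=a_eq, b_eq=b_eq,
                      bounds=(0, None), method="highs")
        if not res.success:
            raise RuntimeError(f"LP failed for DMU {o}: {res.message}")
        scores[o] = 1.0 / res.x[0]  # efficiency = 1 / optimal phi
    return scores


# Hypothetical toy data: one input, one output, four DMUs.
X = [[2.0], [4.0], [6.0], [4.0]]
Y = [[1.0], [3.0], [4.0], [2.0]]
pure_te = dea_output_oriented(X, Y, vrs=True)      # BCC score
overall_te = dea_output_oriented(X, Y, vrs=False)  # CCR score
scale_eff = overall_te / pure_te                   # scale efficiency
print(pure_te, overall_te, scale_eff)
```

Running the same routine with and without the convexity constraint gives the comprehensive and pure technical efficiency scores, and their ratio gives the scale efficiency described above.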

#### Malmquist productivity index model

The DEA-based Malmquist productivity index decomposes the total factor productivity of tourism to reveal the temporal trend of tourism efficiency and the main drivers of its change. It is expressed as^{33}:

$$M_{0} (x_{t}, y_{t}, x_{t+1}, y_{t+1}) = \sqrt{\frac{D_{0}^{t} (x_{t+1}, y_{t+1})}{D_{0}^{t} (x_{t}, y_{t})} \times \frac{D_{0}^{t+1} (x_{t+1}, y_{t+1})}{D_{0}^{t+1} (x_{t}, y_{t})}}$$

(1)

$$M_{0} (x_{t}, y_{t}, x_{t+1}, y_{t+1}) = \frac{D_{0}^{t+1} (x_{t+1}, y_{t+1})}{D_{0}^{t} (x_{t}, y_{t})} \times \sqrt{\frac{D_{0}^{t} (x_{t+1}, y_{t+1})}{D_{0}^{t+1} (x_{t+1}, y_{t+1})} \times \frac{D_{0}^{t} (x_{t}, y_{t})}{D_{0}^{t+1} (x_{t}, y_{t})}}$$

(2)

where \(x_{t}\) and \(x_{t + 1}\) are the input vectors of periods t and t + 1, respectively; \(y_{t}\) and \(y_{t + 1}\) are the output vectors of periods t and t + 1, respectively; \(D_{0}^{t} (x_{t} ,y_{t} )\) and \(D_{0}^{t} (x_{t + 1} ,y_{t + 1} )\) are the distance functions of the decision-making units in periods t and t + 1 with reference to the technological frontier of period t; \(D_{0}^{t + 1} (x_{t} ,y_{t} )\) and \(D_{0}^{t + 1} (x_{t + 1} ,y_{t + 1} )\) are the distance functions of the decision-making units in periods t and t + 1 with reference to the technological frontier of period t + 1; and \(M_{0} (x_{t} ,y_{t} ,x_{t + 1} ,y_{t + 1} )\) is the total factor productivity index (TFPCH). A value greater than 1 indicates that total factor productivity has increased, a value less than 1 indicates that it has decreased, and a value equal to 1 indicates that it is unchanged. The first term on the right side of Eq. (2) represents the technical efficiency change (EFFCH) from t to t + 1, and the second term represents the technical progress change (TECH).
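To make the relationship between the geometric-mean form and the two-factor form concrete, here is a small sketch that computes TFPCH, EFFCH, and TECH from four distance-function values. It assumes EFFCH is the ratio of each period's distance to its own frontier, \(D_{0}^{t + 1} (x_{t + 1} ,y_{t + 1} )/D_{0}^{t} (x_{t} ,y_{t} )\), which is the form under which the product EFFCH × TECH reproduces Eq. (1). The function name, argument names, and numerical values are hypothetical.

```python
import math


def malmquist(d_t_xt, d_t_xt1, d_t1_xt, d_t1_xt1):
    """Malmquist index and its two components.

    Argument naming is d_<frontier period>_<observation period>, e.g.
    d_t_xt1 is D^t(x_{t+1}, y_{t+1}): the period t+1 observation
    evaluated against the period t frontier.
    """
    # Efficiency change: each observation against its own frontier.
    effch = d_t1_xt1 / d_t_xt
    # Technical change: geometric mean of the frontier shift measured
    # at the period t+1 observation and at the period t observation.
    tech = math.sqrt((d_t_xt1 / d_t1_xt1) * (d_t_xt / d_t1_xt))
    tfpch = effch * tech
    return tfpch, effch, tech


# Hypothetical distance values for one decision-making unit.
tfpch, effch, tech = malmquist(d_t_xt=0.8, d_t_xt1=1.0,
                               d_t1_xt=0.64, d_t1_xt1=0.8)

# Cross-check against the geometric-mean form of Eq. (1).
geometric_mean_form = math.sqrt((1.0 / 0.8) * (0.8 / 0.64))
print(tfpch, effch, tech, geometric_mean_form)
```

In this toy case the unit's efficiency relative to its own frontier is unchanged, so the whole productivity gain is attributed to the frontier shift.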

The technical efficiency change can be further divided into the scale efficiency change (SECH) and the pure technical efficiency change (PECH). Therefore, Eq. (2) can be further decomposed into:

$$M_{0} (x_{t}, y_{t}, x_{t+1}, y_{t+1}) = \frac{S_{0}^{t+1} (x_{t+1}, y_{t+1})}{S_{0}^{t} (x_{t}, y_{t})} \times \frac{D_{0}^{t+1} (x_{t+1}, y_{t+1} \mid \mathrm{VRS})}{D_{0}^{t} (x_{t}, y_{t} \mid \mathrm{VRS})} \times \sqrt{\frac{D_{0}^{t} (x_{t+1}, y_{t+1})}{D_{0}^{t+1} (x_{t+1}, y_{t+1})} \times \frac{D_{0}^{t} (x_{t}, y_{t})}{D_{0}^{t+1} (x_{t}, y_{t})}}$$

(3)

where VRS denotes variable returns to scale and CRS denotes constant returns to scale; \(S_{0}^{t} (x_{t} ,y_{t} )\) is the scale efficiency function of period t with the technology frontier of period t as the reference; and \(S_{0}^{t + 1} (x_{t + 1} ,y_{t + 1} )\) is the scale efficiency function of period t + 1 with the technology frontier of period t + 1 as the reference. The first term on the right side of the equation represents the change in scale efficiency (SECH) from t to t + 1, and the second term represents the change in pure technical efficiency (PECH).
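The split of EFFCH into PECH and SECH can be checked numerically with a short sketch. It assumes that scale efficiency is the ratio of the CRS score to the VRS score of the same observation against its own-period frontier, so that values of SECH above 1 indicate improved scale efficiency. The function name and the four efficiency scores are hypothetical.

```python
def decompose_effch(crs_t, vrs_t, crs_t1, vrs_t1):
    """Split efficiency change into pure technical and scale parts.

    crs_* and vrs_* are own-period efficiency scores of one unit under
    constant and variable returns to scale in periods t and t+1.
    """
    scale_t = crs_t / vrs_t      # scale efficiency in period t
    scale_t1 = crs_t1 / vrs_t1   # scale efficiency in period t+1
    pech = vrs_t1 / vrs_t        # pure technical efficiency change
    sech = scale_t1 / scale_t    # scale efficiency change
    effch = crs_t1 / crs_t       # overall technical efficiency change
    return effch, pech, sech


# Hypothetical scores for one decision-making unit.
effch, pech, sech = decompose_effch(crs_t=0.60, vrs_t=0.80,
                                    crs_t1=0.72, vrs_t1=0.90)
print(effch, pech, sech, pech * sech)
```

By construction the product PECH × SECH equals EFFCH, which is a quick consistency check on any reported decomposition.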

#### Coupling coordination degree model

Coupling degree is an index that quantifies the degree of mutual influence and interaction between two or more systems. Drawing on relevant research and the context of this study, the coupling degree model of tourism efficiency and development level is constructed as follows^{34}:

$$C = \left\{ \frac{f(x)\,g(y)}{\left[ \left( f(x) + g(y) \right)/2 \right]^{2}} \right\}^{k}$$

(4)

where C is the coupling degree of tourism efficiency and development level, 0 ≤ C ≤ 1, and a larger C indicates better coupling; f(x) and g(y) are the tourism efficiency index and the tourism development level index, respectively; and k is the adjustment coefficient (generally 2 ≤ k ≤ 5). In this paper, k = 2 because the coupling degree model consists of two subsystems.

The coupling coordination model was used to further assess the quality of the coupling between tourism efficiency and development level, the consistency of their synergistic effect, and their overall performance. It is calculated as follows^{35}:

$$D = \sqrt{C \times T},\;T = \alpha f(x) + \beta g(y)$$

(5)

where D is the coupling coordination degree of tourism efficiency and tourism development level; T is the comprehensive coordination index of the two; and α and β are undetermined coefficients with α + β = 1. Following previous studies^{4}, the two subsystems of tourism efficiency and tourism development level are treated as equally important; therefore, α = β = 0.5.
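The two formulas for C and D can be evaluated together, as in the sketch below. The defaults k = 2 and α = β = 0.5 follow the values stated in the text; the function name and the sample index values are hypothetical, and both indices are assumed to be non-negative and not simultaneously zero.

```python
import math


def coupling_coordination(f, g, k=2, alpha=0.5, beta=0.5):
    """Return (C, T, D) for two subsystem indices f and g."""
    if f < 0 or g < 0 or f + g == 0:
        raise ValueError("indices must be non-negative and not both zero")
    c = (f * g / ((f + g) / 2) ** 2) ** k  # coupling degree
    t = alpha * f + beta * g               # comprehensive coordination index
    d = math.sqrt(c * t)                   # coupling coordination degree
    return c, t, d


# Hypothetical index values for one province in one year.
c_val, t_val, d_val = coupling_coordination(f=0.8, g=0.4)
print(c_val, t_val, d_val)

# When the two indices are equal, the coupling degree reaches 1.
c_eq, t_eq, d_eq = coupling_coordination(f=0.5, g=0.5)
print(c_eq, t_eq, d_eq)
```

The second call shows why D is needed alongside C: two equally low indices are perfectly coupled, but their coordination degree stays modest because T is small.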

### Indicator selection and data sources

Measuring tourism efficiency requires input and output indicators. In classical economics, the most basic factors of production are land, labor, and capital^{36}. Because provincial tourism land data are difficult to obtain, most related studies exclude land from the input indicators^{37}. Tourism employees are the most direct providers of tourism services, and their number is the ideal measure of the labor factor. However, because tourism is a composite industry, most provinces do not publish statistics on this indicator. The labor factor is therefore represented by the number of employees in the tertiary industry, which is readily available and covers almost all direct and indirect tourism-related employment. This proxy overstates the scale of labor input, but it reflects the composite nature of the tourism industry to a certain extent. Capital is an important support for tourism activities, but most provinces lack official statistics on fixed-asset investment in tourism. Therefore, the numbers of tourist attractions rated 3A (or three-star) and above, star-rated hotels, and travel agencies, which reflect the status of tourism resources and tourism services, are used as proxy input indicators for tourism capital. The total number of tourist arrivals and total tourism revenue are selected as the output indicators because they are the primary direct outputs of tourism activities. To keep the data consistent and the results comparable, the same two indicators are used to construct the tourism development level measurement model^{18}:

$$SP_{n} = \sum\limits_{i = 1}^{n} {P_{i} S_{i} }$$

(6)

where \(SP_{n}\) is the tourism development level index, \(P_{i}\) is the weight of indicator i calculated using the entropy value method, \(S_{i}\) is the dimensionless value of indicator i, and n is the number of indicators. To eliminate the influence of price fluctuations, total tourism revenue was deflated using the consumer price index (CPI) of each year, with 2000 as the base period.
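A common way to implement the entropy value method and the weighted sum in Eq. (6) is sketched below. Several details are assumptions for illustration rather than the study's documented procedure: min-max scaling is used to obtain the dimensionless values, 0·ln 0 is treated as 0, the CPI series is assumed to be a fixed-base index with 2000 = 100, and all function names and sample figures are hypothetical. Each indicator is assumed to vary across observations.

```python
import numpy as np


def deflate(nominal, cpi_base2000):
    """Convert nominal values to year-2000 prices (CPI with 2000 = 100)."""
    return np.asarray(nominal, dtype=float) / (
        np.asarray(cpi_base2000, dtype=float) / 100.0)


def min_max(raw):
    """Scale each indicator (column) to [0, 1]."""
    raw = np.asarray(raw, dtype=float)
    lo, hi = raw.min(axis=0), raw.max(axis=0)
    return (raw - lo) / (hi - lo)


def entropy_weights(S):
    """Entropy weights for dimensionless indicators S (rows = observations)."""
    S = np.asarray(S, dtype=float)
    n_obs = S.shape[0]
    p = S / S.sum(axis=0)                        # share of each observation
    safe_p = np.where(p > 0, p, 1.0)             # avoid log(0); term becomes 0
    plogp = np.where(p > 0, p * np.log(safe_p), 0.0)
    e = -plogp.sum(axis=0) / np.log(n_obs)       # entropy of each indicator
    d = 1.0 - e                                  # degree of divergence
    return d / d.sum()


# Hypothetical data: columns are tourist arrivals and real tourism revenue.
raw = np.array([[100.0, 50.0],
                [200.0, 80.0],
                [400.0, 90.0]])
S = min_max(raw)
P = entropy_weights(S)
SP = S @ P  # Eq. (6): weighted sum of dimensionless indicators
print(P, SP)

real_revenue = deflate([110.0], [110.0])  # 110 at CPI 110 -> 100 in 2000 prices
print(real_revenue)
```

Indicators whose values are spread more unevenly across observations have lower entropy and therefore receive larger weights.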

The administrative boundary vector data of the 31 provincial units in China were extracted from the 1:7,000,000 China administrative division map of the National Bureau of Surveying, Mapping, and Geographic Information (Fig. 1). The data in this study were obtained primarily from the China Regional Economic Statistical Yearbook (2000–2020) and from the provincial (municipal) statistical yearbooks, tourism yearbooks, and statistical bulletins on national economic and social development for 2000–2020. Some missing indicator values were estimated and filled in using exponential smoothing.