Applying CHAID to Identify the Accounting-Financial Characteristics of the Most Profitable Real Estate Companies in Spain

The aim of this study is to determine, from an empirical perspective, the accounting and financial features that could condition the financial profitability of real estate companies, in order to identify the performances that guarantee their permanency in the current marketplace, which is characterized by the world economic crisis, especially in Spain, whose housing sector is an important contributor to economic growth. At a theoretical level, the DuPont Model establishes the relationships between a group of accounting ratios and financial profitability. This paper uses a sample of 5,484 Spanish real estate companies to quantify these relationships, to extract the most relevant ones, and to obtain the patterns of the most profitable companies. We use ROE to measure profitability and we analyze various independent variables related to solvency, liquidity, activity, turnover, financial equilibrium and investment structure. The main contribution is methodological, as we have applied statistical tools that do not require initial hypotheses on the distribution of the variables, by using a classification and regression tree data mining technique based on a rule induction algorithm known as CHAID. The study provides quantitative success profiles by means of a set of rules describing the patterns of the most profitable companies.


INTRODUCTION
In the current marketplace, which is characterized by the financial crisis in the developed world, the consequences in Spain have been aggravated by the crisis in the housing sector, which had been an important contributor to Spain's economic growth but is at present yielding lower rates of profitability. The GDP in Spain is expected to contract by 3.3% in 2009 and 0.6% in 2010, down from 1.2% growth in 2008. A modest recovery will only begin during the second half of 2010, although there is a possibility that this will be delayed. This economic contraction has an important influence on the housing sector, as illustrated, for example, by the latest information published by the Bank of Spain, which points out that the real estate assets of Spanish banks and savings banks had risen, at the end of March 2009, to 20.541 million Euros, 2% more than the previous month and 10% more than one year before. The aim of this study is to determine and evaluate, from an empirical perspective, the accounting and financial features that could condition the financial profitability of companies in the real estate sector, identifying the performances that guarantee their permanency.
At a theoretical level, the DuPont Model establishes the relationships between financial profitability and a group of different variables and accounting ratios, such as asset turnover, sales margin or financial leverage. Firstly, the objective of this research is to test this model empirically by analyzing the relationships between profitability and the accounting ratios, and by extracting the most relevant explanatory variables of profitability. Secondly, the paper aims at quantifying those relationships and their explanatory variables with the purpose of obtaining the patterns or profiles (that is to say, the combinations of economic-accounting features) of the most profitable companies in the housing sector.
The sample includes 5,484 Spanish real estate companies. Return on Equity (ROE) is used to measure profitability as the dependent variable. As explanatory variables, the study uses various independent variables related to activity, turnover, financial equilibrium and investment structure, solvency and liquidity, most of them defined in the DuPont Model.
This work begins with a review of the main empirical studies that have analyzed the relationships between financial profitability and different accounting ratios. Next, we outline our methodological proposal to achieve the aims. To that effect, we present the DuPont Model that is used as a reference, describe the sample and the variables, and finally explain the analysis technique that is applied. Subsequently, the main results of the analysis are presented: in the first place by means of a descriptive and exploratory analysis, and later from an explanatory point of view. Finally, the paper discusses the most relevant conclusions of this research.

REVIEW OF EMPIRICAL EVIDENCE
The importance of profitability as an essential factor for the long-term survival of companies has motivated a large number of empirical works evaluating the profitability of Spanish companies, particularly real estate firms, fundamentally from a descriptive point of view. The review of the empirical literature reveals two lines of research: one descriptive in character, the other explanatory in nature.
Among the descriptive papers, we can distinguish two groups: (a) those referring to Spanish firms as a whole, and (b) those analyzing particular branches of the Spanish economy or a specific geographic area. Within the first group, the following works stand out: Maroto (1993), Rodríguez (1989), Bueno et al. (1990), Huergo (1992), Lucas & González (1993), Sánchez (1994), and Gonzalez & Correa (1999). Also, at an institutional level, several organizations issue reports, such as the Research Service of the Mayor Council of Chambers of Commerce of Spain. The council periodically publishes reports about the situation of Spanish companies, such as the study of the profitability of Spanish firms during the period 2000-2004 (Lizcano, 2004) or the financial report for the year 2006 (www.camaras.org). In the same descriptive vein, albeit by sector, some works have studied the profitability of Spanish companies, such as firms in the automobile industry (Rodríguez, 2002). Specifically for the real estate sector, many associations of realtor firms, institutions and banks publish annual reports on the evolution of the sector and its prospects. Also, many authors, such as Bermudez (2008) and Ferruz (2007), have studied the situation and the main characteristics of the housing sector, with analyses of strengths and weaknesses. However, in general, all these descriptive studies use a traditional methodology, focused fundamentally on the univariate analysis of ratios, applied to highly aggregated accounting information obtained from the database of the Statement Central of the Spanish Central Bank. This information suffers from problems of representativeness of the Spanish entrepreneurial environment, made explicit by the prevalence of big companies, which undermines the analyses and conclusions of those studies.
With regard to the explanatory papers, we have found various documents that make use of multivariate statistical techniques from an empirical perspective: Fariñas & Rodriguez (1986), Aguilar (1989), Antón, Cuadrado & Rodriguez (1990), Fernández & García (1991), and González (1997). The analysis of these works suggests that size is the variable that has received the most attention from researchers of profitability. Nevertheless, it is not possible to establish a clear relationship between the two variables, since the conclusions of those studies are heterogeneous. Thus, some papers indicate a positive relationship between size and profitability (Galvé & Salas, 1993; González, 1997). However, other authors show the existence of a negative relationship, confirming the results obtained through the traditional methodology, as shown by Huergo (1992), Fariñas (1992), Maroto (1993, 1998), Salas (1994) and Illueca (1996), who point out that small and medium companies achieve higher financial and economic profitability. On the other hand, the studies of Suárez (1977), Rodriguez (1989) and Galán (1997) suggest that size is not a significant variable to explain the profitability of companies.
The principal limitations of these explanatory studies are fundamentally the consequence of three aspects: (a) the difficulty of obtaining a significant sample of companies that brings consistency to the results, mainly in the real estate sector; (b) the greater complexity involved in applying multivariate statistical techniques and interpreting their results; (c) the absence of normality in the distributions of the ratios, which limits the validity of some statistical techniques and reduces their explanatory capacity. This research tries to contribute, by means of empirical analysis, to improving knowledge of the economic and financial characteristics that determine the profitability of Spanish companies in the housing sector.
Our main contributions are methodological. In the first place, our study focuses on the real estate sector using a sample of companies. We try to overcome the problems of other works by using disaggregated accounting information (for each firm), with appropriate representativeness by size, and by jointly analyzing a sufficient number of variables and ratios that can explain financial profitability. In the second place, the analysis applies statistical tools that do not require initial hypotheses on the distribution of the variables (and are therefore better adjusted to the characteristics of accounting information). We have applied classification and regression tree data mining techniques based on rule induction algorithms such as CHAID.

METHODOLOGICAL PROPOSAL
The following section outlines the methodological scheme we propose to achieve the aims. In this section, we present the DuPont Model, which is used as a reference; we describe the sample and the variables; and finally we explain the analysis techniques that are applied.

Theoretical model: financial profitability
The study of profitability is usually carried out at two levels: economic profitability and financial profitability; the relationship between them is defined by the financial leverage.
a) Economic profitability (ROA = Return on Assets)
Economic profitability (ROA) is a measure of the capacity of the assets to generate value, regardless of how they have been financed. It is usually obtained as follows:
$$\mathrm{ROA} = \frac{\text{Earnings Before Interest and Taxes (EBIT)}}{\text{Total Assets}}$$
ROA may be decomposed into return on sales multiplied by asset turnover:
$$\mathrm{ROA} = \frac{\mathrm{EBIT}}{\text{Sales}} \times \frac{\text{Sales}}{\text{Total Assets}}$$
Return on sales represents the profit obtained for each monetary unit sold, that is, the profitability of the sales. The components of return on sales can be analyzed through its decomposition into cost of goods sold, depreciation and cost of employees.
Asset turnover measures a firm's efficiency at using its assets to generate sales, that is, the amount of sales generated for every monetary unit's worth of assets. It is calculated by dividing sales by total assets.
b) Financial profitability (ROE = Return on Equity)
Financial profitability (ROE) is a measure of a corporation's profitability that reveals how much profit a company generates with the money shareholders have invested. It is defined as:
$$\mathrm{ROE} = \frac{\text{Net Profit}}{\text{Shareholders' Funds}}$$
At a theoretical level, the DuPont Model is a method of performance measurement that was started by the DuPont Corporation in the 1920s. The analysis breaks financial profitability down into various factors that represent the explanatory variables to contrast in this research, including:
• Taxes Effect, determined by dividing net profit (NP) by the profit and loss before taxes:
$$\text{Taxes Effect} = \frac{\mathrm{NP}}{\text{P/L before Taxes}} = \frac{\text{P/L before Taxes} - \text{Taxes}}{\text{P/L before Taxes}} = 1 - t$$
Alternatively, an expanded decomposition of financial profitability can be used, given by the equation that is usually known as the Financial Leverage Equation. This formula (see footnote 5) allows completing the explanatory variables extracted from the first decomposition.
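For reference, a common textbook form of the Financial Leverage Equation can be written as follows. This is a sketch under stated assumptions, not necessarily the authors' exact formulation: the symbols $i$ (average cost of debt), $D$ (debt), $E$ (equity) and $t$ (effective tax rate) are introduced here for illustration, total assets are assumed to equal $D + E$, and extraordinary results are ignored.

```latex
\mathrm{ROE}
  = \frac{(\mathrm{EBIT} - i\,D)(1 - t)}{E}
  = \left[ \mathrm{ROA} + (\mathrm{ROA} - i)\,\frac{D}{E} \right] (1 - t),
\qquad \text{with } \mathrm{ROA} = \frac{\mathrm{EBIT}}{D + E}.
```

Under these assumptions, the term $(\mathrm{ROA} - i)\,D/E$ is positive only when ROA exceeds the cost of debt, which is the condition under which leverage raises ROE.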

Variables
The DuPont Model shows the main theoretical variables that affect financial profitability. Additionally, other ratios and indicators that have traditionally been studied in business analysis are added to the variables of the DuPont Model, completing the group of independent variables to be tested empirically in this research.
As a result, Return on Equity (ROE) is used to measure financial profitability (dependent variable). As explanatory variables, this study uses various independent variables related to different aspects of the entrepreneurial environment: asset structure, liability structure, financial balance, profitability and productivity, turnover and activity, and growth. The definitions of the variables used are given in Tables 1 and 2.
The dependent variable (ROE) has been categorized into quartiles (low, low medium, high medium, high), because our main interest is to focus on the first and fourth quartiles, which represent the worst and the best profitability situations (failure and success profiles). This categorization into quartiles is applied by many authors in studies that use the CHAID technique, such as Santín (2006), Dills (2005) and Gonzalez, Correa and Acosta (2002).
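The quartile categorization described above can be sketched in a few lines of Python. This is only an illustration: the function name, the choice of the "inclusive" quantile method and the ROE figures are assumptions made here, not details taken from the study, which was carried out with the Clementine software.

```python
from bisect import bisect_left
from statistics import quantiles

# Category names as used in the text for the four ROE quartiles.
LABELS = ["low", "low medium", "high medium", "high"]


def categorize_roe(roe_values):
    """Label each ROE value with its quartile, using cut points from the sample."""
    # Q1, median and Q3 of the sample (assumed quantile method: inclusive).
    cuts = quantiles(roe_values, n=4, method="inclusive")
    # bisect_left counts how many cut points lie strictly below the value.
    return [LABELS[bisect_left(cuts, roe)] for roe in roe_values]


# Hypothetical ROE figures (%), not taken from the study.
sample_roe = [11.0, -5.0, 40.0, 2.0, 22.0, 8.0, 15.0, 4.0]
for roe, label in zip(sample_roe, categorize_roe(sample_roe)):
    print(f"ROE {roe:6.1f}% -> {label}")
```

Only the "low" and "high" labels are needed for the failure and success profiles; the two middle categories are kept so that every firm still receives a class.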

Sample characteristics
The data for this research are taken entirely from the SABI database (Iberian Balance Sheet Analysis System), which is offered jointly by INFORMA D&B and Bureau Van Dijk. This database records the financial statements (balance sheet and profit and loss account) of companies in Spain and Portugal, provided by the Trade Registers of every geographical area.
INFORMA D&B was the first European company to supply commercial and financial information over the internet (15/9/96), and it was also the first Spanish commercial and financial information database to achieve AENOR certification, at present updated to the ISO 9001 standard. In particular, the SABI database shows general information and annual accounts for more than 1.2 million Spanish companies as well as more than 350,000 Portuguese firms.
The website of INFORMA D&B includes the approximate price list for an annual subscription to the database, ranging from €6,000 to more than €9,000, depending on the number of companies that are available, the update frequency, the geographical scope and the type of access (DVD or network). Although INFORMA owns this information, there is no problem in using it for research or academic purposes. SABI allows making multi-criteria searches by defining the variables that are required in each case, configuring lists of companies, establishing personal formats, creating particular ratios and adding new financial information to any given report. Through these tools, the information obtained was cleaned to avoid errors and to allow the statistical analysis. The sample comprised 147,299 companies, and the housing sector included 5,484 firms. Companies with negative shareholders' funds, bankruptcy or negative net assets, or incomplete information in some of the variables defined, such as non-disaggregated data, were removed from the sample. Additionally, using the Clementine software (SPSS Inc.), companies with outliers or extreme values (more than 3 or 5 times the average, respectively) were also removed.
Footnote 5: Because Extraordinary P/L was not considered in the formulation of ROA, while ROE is based on net profit, it must now be added (after tax) to the equation of ROE.

Analysis technique: CHAID
By means of the Clementine (SPSS Inc.) software, the CHAID rule induction algorithm (Chi-squared Automatic Interaction Detector) was applied. CHAID is a highly efficient statistical technique for segmentation, or tree growing, that derives a tree of rules that attempts to describe distinct segments within the data in relation to the output variable (ROE). This allowed us to classify companies according to the different values of the accounting ratios and their profitability.
In fact, a great many algorithms are capable of generating rules based on decision trees, including CLS (Hunt et al., 1966), ID3 (Quinlan, 1979), CART (Breiman et al., 1984) and C4.5 (Quinlan, 1993). In the present study, we implemented the algorithm known as CHAID, which is simple to apply and widely used. This classification mechanism, originally proposed by Kass (1980), has been used extensively by many authors in different studies to derive a tree of rules, which has helped the understanding of many phenomena (Santín, 2006; Galguera, 2006; Grobler, 2002; Strambi, 1998).
As a segmentation tool, CHAID presents important benefits. First of all, the technique is not based on any specific probabilistic distribution, but solely on chi-squared goodness-of-fit tests computed from contingency tables. These tests, given an acceptable sample size, almost always function well. In the second place, it makes it possible to determine a variable to be maximized. This is indeed desirable, and not always achievable with other segmentation techniques. Moreover, classification by segments is always straightforward to interpret, as its results provide intuitive rules that are readily understood by non-experts, which, for example, is not the case with Cluster Analysis. And finally, this technique ensures that the segments always have statistical meaning; they are all different, and are the best possible, given the data provided. Accordingly, the classifications made using the rules found are mutually exclusive, and so the decision tree identifies a single response based on a calculation of the probabilities of belonging to a certain class. Last of all, CHAID, unlike other algorithms such as CART (Breiman et al., 1984), is capable of constructing nonbinary trees; that is, it can present more than two branches, or data divisions, for each node, according to the categories to be explained.
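As an illustration of the chi-squared tests on contingency tables that CHAID relies on, the following minimal Python sketch computes the Pearson statistic, its degrees of freedom and the p-value for a table of counts. The function names, the table layout (rows for the categories of an explanatory ratio, columns for the ROE quartiles) and the counts are hypothetical; the study itself used the Clementine software, not this code.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability Pr(chi2_df > x) for an integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Closed forms for df = 1 (a = 0.5) and df = 2 (a = 1), then the recurrence
    # Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    if df % 2:
        a, q = 0.5, math.erfc(math.sqrt(y))
    else:
        a, q = 1.0, math.exp(-y)
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def chi2_independence(table):
    """Pearson chi-squared test of independence for a table given as a list of rows.

    Assumes every row and every column has at least one record.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df, chi2_sf(stat, df)


# Hypothetical counts: two categories of a ratio (rows) by four ROE quartiles (columns).
counts = [[40, 30, 20, 10],
          [10, 20, 30, 40]]
stat, df, p_value = chi2_independence(counts)
print(f"X2 = {stat:.2f}, df = {df}, p = {p_value:.3g}")
```

A small p-value indicates that the distribution of ROE categories differs between the two categories of the ratio, which is the kind of evidence CHAID uses when deciding whether categories should be kept apart.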
Using the significance of a statistical test as a criterion, the CHAID algorithm evaluates all of the values of every potential explanatory variable. Let us examine in three steps the methodological process to be followed when applying the technique (a complete description is given by Kass, 1980; Biggs, 1991; and Goodman, 1979).
1. Binning: the continuous explanatory variables are binned into a set of categories (see footnote 8).
2. Merging categories: each final category of an explanatory variable represents a child node if that variable is used to split the node. For each explanatory variable X, the algorithm finds the pair of categories of X that is least significantly different (indicated by the largest p-value) with respect to the dependent variable Y. The method used to calculate the p-value is the chi-squared test:
$$X^2 = \sum_{j=1}^{J} \sum_{i=1}^{I} \frac{(n_{ij} - m_{ij})^2}{m_{ij}}$$
where $n_{ij} = \sum_n f_n I(x_n = i \wedge y_n = j)$ is the observed cell frequency and $m_{ij}$ is the expected estimated cell frequency for cell $(x_n = i, y_n = j)$ under the null hypothesis of independence. The corresponding p-value is given by $p = \Pr(\chi^2_d > X^2)$, where $\chi^2_d$ follows a chi-squared distribution with $d = (J - 1)(I - 1)$ degrees of freedom. The frequency associated with case n is denoted by $f_n$.
Then, the algorithm merges into a compound category the pair that gives the largest p-value, and calculates the p-value based on the new set of categories of X. This represents one set of categories for X. The process is repeated until only two categories remain. Then, the sets of categories of X generated during each step of the merge sequence are compared, to find the one for which the p-value in the previous step is the smallest. That set is the set of merged categories for X to be used in determining the split at the current node.
3. Splitting nodes: Each variable is evaluated for its association with the dependent variable, based on the adjusted p-value of the statistical test, and the algorithm selects the best predictor to form the first branch in the decision tree, that is, the explanatory variable with the strongest association with the dependent variable (the one for which the chi-squared test has the smallest p-value). If this value is less than or equal to the split threshold $\alpha_{split}$, then that variable is used as the split variable for the current node. Each of the merged categories of the split variable defines a child node of the split. After the split is applied to the current node, the child nodes are examined to see if they warrant splitting by applying the merge/split process to each in turn. This process continues recursively until the tree is fully grown and no further splits can be made.
Footnote 6: To test the possible impact of the number of firms on the results, we are working on a new sample extended in time, and our preliminary analysis suggests similar results, with only small variations in the quantitative levels of the rules obtained.
Footnote 7: It should be noted that SABI offers data on an annual and monthly basis.
Footnote 8: We are aware that there are several methods for binning into a set of categories, for example, the one proposed by Berka (1998), which will be studied in future research, to compare results with those described in this paper.
The main results of the model are described in the following items:

Support
The support for a scored record is the weighted number of records in the data in the scored record's assigned terminal node (t), i.e., the number of records of each rule. Let $N_{w,j}(t) = \sum_{i \in t} w_i f_i\, j(i)$ be the weighted number of records in node t with category j, and $N_{w,j} = \sum_{i} w_i f_i\, j(i)$ the weighted number of records in category j (any node), where $w_i$ and $f_i$ are the weight and frequency associated with record i, and $j(i)$ equals 1 if record $x_i$ is in category j, and 0 otherwise.

Response (or confidence):
The confidence for a scored record is the proportion of weighted records in the data in the scored record's assigned terminal node (t) that belong to a selected category j, modified by the Laplace correction (Margineantu, 2001), with k being the number of categories. It is computed as $(N_{f,j}(t) + 1)/(N_f(t) + k)$. Thus, the level of confidence (%) of each rule (terminal node) shows the proportion of records of each rule that belong to a selected category j; and the level of confidence of a set of rules can also be defined as the proportion of records of this rule set belonging to a given category j.

Index:
The index of each of the rules obtained for a given category j is the ratio between the level of confidence for each rule (terminal node) and the level of confidence of category j in the total sample (i.e., 25%, as the sample is divided into quartiles). Therefore, it is obtained by dividing the proportion of records that present category j in each terminal node (rule) by the proportion of records presenting category j in the total sample (25%). Thus, it represents the increased probability of belonging to the selected category j for the terminal node that contains the records presenting the characteristics defined for each rule. Therefore, by accumulation, the index of a set of rules can be obtained as the ratio between the proportion of records presenting category j in this rule set and the corresponding proportion to be found within the total sample (25%).

Gain:
The gain for each terminal node (rule) can be defined as the number of records in a selected category j, in absolute terms. For a set of rules or terminal nodes, and in percentage terms, the gain summary provides descriptive statistics for the terminal nodes of a tree, and shows the weighted percentage of records in a selected category j.
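The support, confidence, index and gain of a single rule can be sketched as follows. This is a minimal illustration that assumes all weights and frequencies are equal to 1; the function name, the dictionary layout and the counts are hypothetical, and computing the index from the uncorrected proportion (rather than from the Laplace-corrected confidence) is a choice made here for simplicity.

```python
def rule_measures(node_counts, sample_counts, category):
    """Summary measures for one terminal node (rule), unweighted case.

    node_counts: records per category inside the terminal node.
    sample_counts: records per category in the whole sample.
    """
    k = len(sample_counts)                   # number of categories
    support = sum(node_counts.values())      # records covered by the rule
    hits = node_counts.get(category, 0)      # records of the rule in the category
    proportion = hits / support              # uncorrected confidence
    laplace = (hits + 1) / (support + k)     # Laplace-corrected confidence
    base_rate = sample_counts[category] / sum(sample_counts.values())
    return {
        "support": support,
        "confidence": proportion,
        "confidence_laplace": laplace,
        "index": proportion / base_rate,     # lift over the whole sample
        "gain": hits,                        # absolute gain
        "gain_pct": 100.0 * hits / sample_counts[category],
    }


# Hypothetical sample split into quartiles (100 firms each) and one rule.
sample = {"low": 100, "low medium": 100, "high medium": 100, "high": 100}
node = {"low": 1, "low medium": 4, "high medium": 10, "high": 45}
print(rule_measures(node, sample, "high"))
```

With quartiles, the base rate is 25%, so an index of 3 means that firms matching the rule are three times as likely to fall in the selected category as a firm drawn at random from the sample.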

Risk:
It represents the risk of error in predicted values for specific nodes of the tree and for the tree as a whole.
The risk estimate r(t) of a node (i.e., rule) t is computed from $N_{f,j}(t)$, the sum of the frequency weights for records in node t in category j, and $N_f$, the sum of frequency weights for all records in the sample. The risk estimate R(T) for the tree (T) is then calculated as the sum of the risk estimates r(t) for the terminal nodes, $R(T) = \sum_{t \in T'} r(t)$, where $T'$ is the set of terminal nodes in the tree.

Descriptive analysis
Table 3 shows the means of each of the variables defined for the year 2006, comparing the real estate sector with all other activities. In the first place, the analysis of profitability exhibits similar figures for housing companies, with ROE near 12% and ROA between 5% and 6%. Focusing on asset structure, real estate companies were characterized by a higher fixed asset ratio (39.14%) than the mean of all the other activities (33.74%) and, consequently, a lower current asset ratio (60.85% instead of 66.25%), mainly because the debtor and cash ratios were much lower (around 21% less). However, it is important to emphasize that the stock ratio was much higher (nearly 14% more).
Concerning liability structure, the interest rate was similar for the realtor companies and the other ones. Nonetheless, the cost of debt per unit of sales was much higher for the real estate sector (4.62% vs. 1.58%), even though the debt ratio was similar in both groups. The debt structure was different for real estate companies, with a much higher presence of long-term liabilities, resulting in a higher working capital ratio (27.75% instead of 17.05%). As a consequence of that structure of assets and liabilities (with a similar leverage ratio but higher fixed assets and long-term liabilities), solvency and liquidity ratios were much better for the housing sector; e.g., the asset coverage ratio and the liquidity ratio were almost twice those of all sectors as a whole.
On the other hand, return on sales was better in this sector (13.11% vs. 4.75%). This was mainly because of the higher productivity of labor (shown also by a lower cost of employee ratio) and the higher growth of sales. However, asset turnover was lower because the stock ratio was much higher and, therefore, the current asset turnover was much lower. This explains why the figures for ROA were similar to those for all the activities, as mentioned above. The growth rates of fixed assets and total assets were also higher for housing companies, which demonstrates the superior dynamism of the housing sector in comparison to the whole economy.

Exploratory analysis
This section presents the exploratory analysis of correlations between each of the explanatory variables and ROE.
In the first place, the analysis of asset structure shows a negative relationship between the fixed asset ratio and profitability: the lower the fixed asset ratio, the higher the ROE and ROA. On the one hand, the most profitable companies reduce their fixed asset ratios by means of lower tangible asset ratios, which implies that they achieve higher turnover ratios and better levels of efficiency in production. On the other hand, firms increase their current asset ratios not only with higher debtor and cash ratios, but also with higher stock ratios, which is very important because it could suggest that housing companies face a significant problem of oversupply. Summing up, this asset structure allowed them to reach higher asset turnover ratios, explaining the higher ROA, as the DuPont Model points out.
Secondly, there is a positive relationship between leverage and profitability. Moreover, it is shown that the lower the cost of debt, the higher the profitability. The current liability ratio represents a high percentage among the most profitable companies, which means that they are usually financed by trade providers, a form of financing cheaper than bank loans or long-term liabilities, which explains the lower cost of debt. The most profitable companies have larger sales, which allows them to obtain the best rates when negotiating conditions with providers. With respect to the long-term liability ratio, the most profitable companies also presented higher figures, and therefore both ratios explain the high leverage of these companies.
Last, but not least, the rates of activity show that return on sales was higher for the most profitable companies, as expected in the DuPont Model, which, together with higher asset turnover, explains the better figures for ROA. The main variable that determines the high return on sales among the most profitable companies was the cost of employees: the lower the cost of employee ratio, the higher the return on sales. In fact, labor productivity was also higher for the most profitable companies. Additionally, the decline in the depreciation ratio for these companies (caused by the low percentage of fixed assets) contributed to improving return on sales and, therefore, ROA; besides, the growth rate of sales was also higher for those companies.
Consequently, as the Financial Leverage Equation predicts, when ROA exceeds the interest rate, leverage contributes to increasing ROE.
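The leverage effect mentioned above can be illustrated numerically. The sketch below assumes the usual pre-tax textbook identity, ROE = ROA + (ROA - i) x D/E, where i is the average cost of debt and D/E the debt-to-equity ratio; the function name and all figures are hypothetical and are not taken from the sample.

```python
def roe_before_tax(roa, interest_rate, debt_to_equity):
    """Pre-tax ROE under the assumed identity ROE = ROA + (ROA - i) * D/E."""
    return roa + (roa - interest_rate) * debt_to_equity


# Hypothetical cases: ROA above and below a 4% cost of debt.
for roa in (0.06, 0.03):
    for leverage in (0.0, 1.0, 2.0):
        roe = roe_before_tax(roa, 0.04, leverage)
        print(f"ROA={roa:.0%}  D/E={leverage:.1f}  ->  ROE={roe:.1%}")
```

In the first case, more debt raises ROE because each borrowed unit earns more than it costs; in the second case the sign of (ROA - i) is negative and leverage erodes ROE.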
In conclusion, the most profitable companies are characterized by:
• Higher ROA, explained by:
  • Lower fixed asset ratios, which increase fixed asset turnover.
  • Higher return on sales due to savings in labor costs and higher productivity, lower rates of depreciation resulting from lower fixed asset ratios, and lower cost of debt. Additionally, the growth of sales explains the higher ROA.
• Higher leverage ratio, mainly because the current liability ratio is higher in these companies, as they are usually financed by trade providers with long periods of payment.
To sum up, as confirmed by the F-test statistics, the main explanatory variables of the profitability of the real estate companies were the fixed asset turnover, the return on sales and the level of fixed assets and debt (included in the asset coverage ratio), all of them favored by an expansive economic cycle characterized by a high level of sales and high leverage of the companies. This verifies what is expected in the DuPont Model. The analysis of size indicates that the level of assets was not a particularly important explanatory variable of profitability, as there were no significant differences in the figures of total assets between companies with low and high ROE, confirming some previous studies (Rodríguez, 1989; Galán, 1997). However, the figure of sales, as noted above, was the variable which allowed companies to reach those higher ROA and ROE. It is also important to point out that the main weakness of the real estate companies lies in their high stocks and, therefore, lower fixed assets which, together with the high debt, leave the real estate sector in a vulnerable position to face the crisis because of the worse solvency and liquidity ratios.
December 2010, J. econ. finance adm. sci., 15(29), 2010
Explanatory analysis: predictive analysis with CHAID
1. Results and rules obtained with the model. Success and failure profiles.
In the previous section, DuPont Model has been contrasted empirically by analyzing the relationships between the profi tability and the accounting ratios, and extracting the most relevant explanatory variables of the profi tability. This section aims at quantifying those relationships and their explanatory variables with the purpose of obtaining the profi les, that is, the combinations of accounting ratios, of the most profi table companies.
With CHAID modelling, the sample is segmented taking into account the different levels of the explanatory variables, building a classifi cation tree fi nishing in a set of terminal nodes -with routes from the origin node (the whole sample) to each terminal node (t)-, which constitute the profi les or rules for each of the categories defi ned in the variable to be explained (ROE). Thus, there are so many rules as terminal nodes. However, the tree segmentation results in a large number of rules or company profi les, so that for the purposes of the present study only the most important have been selected 10 . Therefore, the rules obtained for ROE=high medium and ROE=low medium has been omitted, since the most interesting and useful to study are the extreme quartiles, indicatives of success and failure profi les. Moreover, only the most important rules for the quartiles studied have been selected, which are those presenting the highest classifi catory and predictive capacities in terms of the level of confi dence.
Accordingly, we keep only the rules obtained for the categories ROE=high and ROE=low and, after ordering them by level of confidence, select the most important rules in each category. The final result is the set of rules for the highest sampling decile in each category, representing around 600 companies for each category studied.
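The selection step described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the procedure actually run by the authors: the `Rule` record, the `select_top_rules` helper, the 10% coverage target used as stopping criterion and all toy values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    rule_id: int
    category: str      # ROE category predicted by the terminal node
    support: int       # number of firms matching the rule
    confidence: float  # share of those firms in the category (0-1)


def select_top_rules(rules, category, sample_size, share=0.10):
    """Keep the highest-confidence rules for `category` until their
    combined support reaches `share` of the sample (assumed criterion)."""
    target = share * sample_size
    ranked = sorted(
        (r for r in rules if r.category == category),
        key=lambda r: r.confidence,
        reverse=True,
    )
    selected, covered = [], 0
    for rule in ranked:
        if covered >= target:
            break
        selected.append(rule)
        covered += rule.support
    return selected


# Toy rules (hypothetical values, not those of Table 4).
toy_rules = [
    Rule(1, "high", 40, 0.70),
    Rule(2, "high", 35, 0.85),
    Rule(3, "low", 50, 0.92),
    Rule(4, "high", 45, 0.60),
    Rule(5, "high medium", 60, 0.45),
    Rule(6, "high", 30, 0.78),
]

top_high = select_top_rules(toy_rules, "high", sample_size=1000)
print([r.rule_id for r in top_high])
```

With these toy values, the rules for ROE=high are ranked by confidence and accumulated until they cover at least 100 of the 1,000 hypothetical firms.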
The rules selected in both cases are shown in Table 4, which gives in brackets the corresponding support (number of firms in the sample with the profile detailed in the rule) and confidence (percentage of these companies which belong to the category studied). The rules (profiles) for the most profitable companies, those with ROE=high, deserve special mention. These rules indicate the ranges within which the variables should lie in order to ensure good levels of ROE with a high probability. We can highlight several profiles of companies (rules) which are likely to obtain ROE=high, because their confidence is higher than the share of ROE=high firms in the whole sample (25%).
As an example, with a support of 109 firms, rule 14 shows that 84.4% of the companies with return on sales over 7.88% and asset turnover over 144.22% obtained ROE=high. Rule 12 shows that when the return on sales is higher than 14.8%, the asset turnover ratio does not need to be as high: with asset turnover merely exceeding 90.94%, 86.7% of the 105 firms achieve high levels of ROE (upper quartile, ROE=high), more than three times the 25% of firms with ROE=high in the whole sample before segmentation. On the other hand, rule 1 shows that 69.7% of the companies with asset turnover between 16.11% and 23.42%, return on sales over 14.8% and debt ratio over 90.65% also obtained ROE=high. Therefore, it is possible to obtain high ROE not only with better asset turnover but also with lower asset turnover ratios, provided that the debt ratio and return on sales are higher. Further combinations are represented by the remaining rules in Table 4.
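To make the notions of support and confidence concrete, the sketch below applies the thresholds of rule 14 quoted above to a handful of firms. The firm records, the field names and the helper `support_and_confidence` are hypothetical and serve only as illustration; the real figures come from the full sample.

```python
def rule_14(firm):
    """Profile of rule 14: return on sales > 7.88% and asset turnover > 144.22%."""
    return firm["return_on_sales"] > 7.88 and firm["asset_turnover"] > 144.22


def support_and_confidence(firms, rule, category="high"):
    """Support: number of firms matching the rule.
    Confidence: share of matching firms that belong to `category`."""
    matched = [f for f in firms if rule(f)]
    support = len(matched)
    if support == 0:
        return 0, 0.0
    hits = sum(1 for f in matched if f["roe_class"] == category)
    return support, hits / support


# Toy firms (hypothetical values, ratios in %).
firms = [
    {"return_on_sales": 9.0, "asset_turnover": 150.0, "roe_class": "high"},
    {"return_on_sales": 12.5, "asset_turnover": 160.0, "roe_class": "high"},
    {"return_on_sales": 8.1, "asset_turnover": 145.0, "roe_class": "low medium"},
    {"return_on_sales": 3.0, "asset_turnover": 200.0, "roe_class": "low"},
    {"return_on_sales": 20.0, "asset_turnover": 50.0, "roe_class": "high medium"},
]

support, confidence = support_and_confidence(firms, rule_14)
print(support, round(confidence, 3))
```

In this toy case three firms match the profile and two of them have ROE=high, so the rule has a support of 3 and a confidence of about 66.7%.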
It would be possible to analyze all the selected rules in the table in the same way, and thus to obtain a series of profiles and recommendations providing real estate companies with quantitative control measures for obtaining high levels of ROE. In summary, there are two groups of high-profitability profiles: the first comprises companies with moderate leverage but good turnover and return on sales ratios (rules 9, 12 and 14); the second comprises companies with worse asset turnover and return on sales, which assume higher financial risk through low asset coverage ratios and high leverage (rules 1, 2, 3 and 8). In the current marketplace, characterized by the financial crisis, the companies in the first group will be better placed to deal with the crisis, although they will not be free of difficulties, especially because, as the exploratory analysis showed, their high asset turnover ratios come mainly from high fixed asset turnover, whereas it would be preferable for them to come from higher stock turnover.
At the other extreme are the profiles of the companies with the lowest levels of ROE. In the same way, six rules for ROE=low can be extracted to identify clearly the profiles of the least profitable companies. For example, rule 2, with a sampling support of 111 companies, indicates a 93.7% probability that firms with an asset turnover ratio below 6.61%, an asset coverage ratio between 110.03% and 202.79%, and a solvency ratio under 1.32% will present low levels of ROE (lower quartile, ROE=low).
In conclusion, the study of these rules shows that asset turnover, return on sales and the asset coverage ratio (also measured indirectly through the debt ratio or the solvency ratio) are the variables which determine profitability, as the exploratory analysis and the DuPont Model showed. In addition, this explanatory analysis quantifies the levels of these variables needed to achieve the highest profitability. It thus identifies the main accounting ratios, and the most suitable values for them, which company managers should monitor in order to ensure good levels of ROE.
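The trade-off between the two groups of profiles can be illustrated with the standard three-factor DuPont identity, ROE = (net income / sales) × (sales / assets) × (assets / equity). The sketch below is an illustration with hypothetical firms, not the authors' computation; it also assumes that the debt ratio is total debt over total assets, so that a debt ratio of 90% corresponds to an equity multiplier of 10.

```python
def dupont_roe(return_on_sales, asset_turnover, equity_multiplier):
    """ROE = (net income / sales) * (sales / assets) * (assets / equity)."""
    return return_on_sales * asset_turnover * equity_multiplier


def equity_multiplier_from_debt_ratio(debt_ratio):
    """Assets / equity, assuming debt_ratio = total debt / total assets."""
    return 1 / (1 - debt_ratio)


# Hypothetical firms, ratios expressed as fractions.
# Firm A: good margin and turnover, moderate leverage (debt ratio 50%).
roe_a = dupont_roe(0.15, 1.00, equity_multiplier_from_debt_ratio(0.50))
# Firm B: same margin, much lower turnover, very high leverage (debt ratio 90%).
roe_b = dupont_roe(0.15, 0.20, equity_multiplier_from_debt_ratio(0.90))
print(round(roe_a, 4), round(roe_b, 4))
```

Both hypothetical firms reach the same ROE of about 30%, but firm B does so only by carrying far more debt, which is the riskier route described by the second group of rules.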

Goodness of the model
To illustrate the goodness of the model, the following misclassification matrix shows the companies correctly and incorrectly classified with all the rules obtained by CHAID segmentation (see Table 5). The total risk R(T), that is, the sum of the risks over the set of terminal nodes (rules), is 40.54%; it measures the percentage of cases classified incorrectly when all the rules generated by the model are used for classification or prediction. This also gives the overall level of confidence provided by the entire tree of rules (59.46%). The error rate is much lower than the initial 75% found with the unsegmented sample (the 75% represents the proportion of cases that do not belong to a specific selected category). Therefore, the model of rules improves explanatory and predictive capacity by reducing this risk from 75% to 40.54%.
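As a rough sketch of how these figures relate, the risk of a tree can be read off a misclassification matrix as the share of cases outside the main diagonal, and the overall confidence is its complement. The function `tree_risk` and the 2x2 toy matrix are hypothetical (they do not reproduce Table 5); only the 40.54% and 75% figures are taken from the text.

```python
def tree_risk(confusion):
    """Misclassification risk R(T): share of cases off the main diagonal.

    `confusion[actual][predicted]` holds firm counts.
    """
    total = sum(sum(row.values()) for row in confusion.values())
    correct = sum(confusion[c].get(c, 0) for c in confusion)
    return 1 - correct / total


# Hypothetical 2x2 example (not Table 5).
toy = {
    "high": {"high": 30, "other": 10},
    "other": {"high": 20, "other": 40},
}
risk = tree_risk(toy)

# Figures reported in the text, in %.
reported_risk = 40.54
baseline_risk = 75.0  # 100% minus the 25% share of any single quartile
overall_confidence = 100 - reported_risk
risk_reduction = baseline_risk - reported_risk
print(round(risk, 2), round(overall_confidence, 2), round(risk_reduction, 2))
```

The reported risk of 40.54% therefore corresponds to the overall confidence of 59.46% and to a reduction of 34.46 percentage points relative to the unsegmented baseline.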
However, the error rate is reduced considerably if we make our predictions using only the rules of interest, namely those for ROE=high and ROE=low, and especially the ones described above and selected within each of these categories. Thus, Table 6 shows that for the seven rules selected for ROE=high, with a sampling support of 548 firms (a decile of the entire sample), the probability of an accurate prediction increases to 79.29% (confidence or response). This is equivalent to an index of 317.14%, i.e. more than three times the 25% of the total sample (the percentage of companies with ROE=high in the unsegmented sample). In other words, 548 companies matched the variable levels stated in those seven rules, and 79.29% of them achieved high ROE. This figure measures the level of confidence in the set of seven rules defined for ROE=high, while the individual level of confidence for each of these rules was shown in Table 4. This set of rules explains 31.69% (gain) of the companies with ROE=high.
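The relationship between confidence, index and gain can be checked with a short calculation. The sketch below assumes that each ROE quartile contains exactly one quarter of the 5,484 firms in the sample; this is an assumption, and the function names are hypothetical. The index obtained from the rounded inputs differs slightly from the reported 317.14%, presumably because of rounding in the published figures.

```python
def index_pct(confidence_pct, base_rate_pct):
    """Index (lift): rule-set confidence relative to the base rate, in %."""
    return 100 * confidence_pct / base_rate_pct


def gain_pct(support, confidence_pct, category_size):
    """Gain: share of all firms in the category captured by the rule set, in %."""
    return support * confidence_pct / category_size


sample_size = 5484               # sample size stated in the paper
category_size = sample_size / 4  # assumption: each ROE quartile holds N/4 firms

idx_high = index_pct(79.29, 25.0)
gain_high = gain_pct(548, 79.29, category_size)
print(round(idx_high, 2), round(gain_high, 2))
```

Under that assumption, the 548 firms covered by the seven rules, with a confidence of 79.29%, account for about 31.69% of all firms with ROE=high, in line with the reported gain.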
Furthermore, Table 6 also gives the level of confidence for the set of six rules obtained for ROE=low, with the corresponding gain and index indicators, for which a similar goodness analysis could be made. In fact, the results are better here, with a confidence over 90% and an index over 360%, which indicates that predictions made with these rules are around 3.6 times more accurate than those obtained from the sample divided into quartiles without segmentation.
Finally, the charts in Figure 1 illustrate the gains, responses and indices for the set of rules obtained with the CHAID modelling. Note that they show the behaviour of those measurements across the different percentiles. In particular, for the tenth percentile of rules, the values are the same as those previously given in Table 6, corresponding to the set of rules selected in that table and studied in this paper.
In particular, the Response Chart indicates the level of confidence in the rules; thus, for example, the top decile of rules selected for ROE=high has a confidence of 79.29%. In the same way, for the top decile of rules selected for ROE=low, this percentage reaches 90.26%. Thus, the higher the chart lies above the 25% benchmark (the confidence in the prediction for the categories in the unsegmented sample), the higher the predictive capacity of the model.
The Index Chart also evaluates the effectiveness of this set of rules, because it measures the extent to which companies with the profile defined by a rule (or a set of rules) are more likely to achieve ROE=high than a company taken from the unsegmented sample. As an example, the index value of 317.14% indicates that companies with profiles defined by the seven rules selected for ROE=high are 3.17 times more likely to achieve ROE=high than companies in the whole sample. For the rules selected for ROE=low, the index is higher still, at 361.05%.
The Gain Chart is interpreted in a similar way: the higher the curve, the better the goodness of the model. For example, the top decile of rules for ROE=low has a confidence of 90.26%, which represents a probability of accurate prediction 3.61 times higher than the initial 25% (corresponding to the unsegmented sample).
Therefore, in all these charts the elevation of the curve above the baseline reflects the substantial improvement in predictive and explanatory capacity achieved by applying the rules obtained with CHAID modelling and, in particular, the rules selected in the first decile (top decile of rules) for each category studied (ROE=high and ROE=low). All the charts allow us to conclude that the set of rules selected makes an important contribution to predicting the financial profitability of real estate companies.
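The three charts discussed above can be sketched as cumulative measures computed over terminal nodes ranked by confidence. The function `cumulative_chart_points` and the four toy nodes below are hypothetical and are not the nodes of the fitted tree; they only show how the response, gain and index curves are built and why all of them end at the baseline when the whole sample is included.

```python
def cumulative_chart_points(nodes, category_size, sample_size):
    """Return (percentile, response, gain, index) tuples, all in %,
    for nodes given as (support, hits) pairs."""
    ranked = sorted(nodes, key=lambda n: n[1] / n[0], reverse=True)
    base_rate = category_size / sample_size
    points, cum_support, cum_hits = [], 0, 0
    for support, hits in ranked:
        cum_support += support
        cum_hits += hits
        response = cum_hits / cum_support
        points.append((
            100 * cum_support / sample_size,  # share of the sample covered
            100 * response,                   # cumulative confidence
            100 * cum_hits / category_size,   # share of the category captured
            100 * response / base_rate,       # lift over the base rate
        ))
    return points


# Hypothetical tree with four terminal nodes: (support, firms with ROE=high).
toy_nodes = [(100, 80), (200, 100), (300, 50), (400, 20)]
points = cumulative_chart_points(toy_nodes, category_size=250, sample_size=1000)
for p in points:
    print([round(v, 1) for v in p])
```

In this toy tree the first node covers 10% of the sample with a response of 80%, a gain of 32% and an index of 320%; once every node is included, the response falls back to the 25% base rate and the index to 100%.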

DISCUSSIONS
To sum up, the main sources of high profitability among real estate companies are three: (a) high fixed asset turnover (resulting from low fixed asset ratios); (b) high return on sales (made possible by high sales and the reduction of employee costs); and (c) a low asset coverage ratio (low fixed assets and high leverage). All of them were favoured by an expansive economic cycle characterized by a high level of sales and high corporate leverage linked to low interest rates. The level of assets was not in itself an important explanatory variable of profitability, as confirmed by previous studies of profitability among firms of different sizes (Rodríguez, 1989; Galán, 1997). Nevertheless, sales as a measure of size appear to be the main implicit explanatory variable of ROE, because they allowed companies to reach higher return on sales and turnover (Galvé & Salas, 1993; González, 1997).
However, in the current marketplace it will be difficult to maintain such high figures for variables such as return on sales, leverage or turnover. Several phenomena endanger the prospects of real estate companies, among them the fall in their economic activity and sales, the lack of financing and the mortgage loan restrictions due to tighter lending criteria. In addition, the sharp growth of unemployment among families and the bankruptcy of companies loom in the future, resulting in a significant increase in the default rate.
The analysis confirms that the main weakness of the housing companies lies in their high stocks and lower fixed assets, which, compounded by high debt, result in risky solvency and liquidity ratios. Together these leave the real estate sector vulnerable to the economic and financial crisis, with a significant collapse in profitability. Firstly, the high asset turnover ratios are not caused by high current asset turnover, as would be desirable to limit the impact of the crisis on ROA, but by high fixed asset turnover, enabled by high sales and the possibility of maintaining reduced solvency ratios (low fixed assets and high debt). In the current environment, those high sales and low solvency ratios will not be sustainable: on the one hand, ROA will fall because turnover is reduced by the drop in sales and the need to increase fixed asset ratios; on the other hand, ROE will fall as high debt ratios are reduced to improve solvency. Secondly, low labor costs, depreciation and finance costs relative to sales have led to a high return on sales, but this was only possible because of the high volume and strong growth of sales. As a result, these companies are very sensitive to drops in sales in periods such as the current crisis, which also explains the current decline of ROA.
This paper contributes an exploratory analysis of the real estate sector and provides a set of rules obtained by the CHAID algorithm, which could help companies to know the levels the different accounting ratios should reach if they want to obtain high ROE. It thus offers profiles of profitability and recommendations to real estate companies regarding the main variables and accounting ratios that may influence ROE, as well as the most suitable values for them.
Taking these values into account, and in order to obtain high levels of ROE, these firms should aim to achieve the profiles described, in particular by reducing stock ratios so as to increase current asset turnover and compensate for the decline in asset turnover caused by the collapse of sales.
In a scenario of general economic contraction, the housing sector must continue its own rigorous adjustment of stocks. The statistics confirm the decrease in production, transactions, mortgage loans and, for the first time, property prices, caused by the extensive real estate stocks available for sale, which are pushing prices down. This will bring a traumatic adjustment of supply, but it will be necessary if real estate companies want to reach sustainable asset turnover ratios. This scenario implies reducing stocks in order to increase asset turnover ratios, moving away from the excessive historical stocks shown in this paper.
Finally, from a methodological point of view, it would be appropriate to apply other algorithms, e.g. the advanced version C5.0 (Chesney, 2009), to compare the stability and predictive power of the model created. This is particularly true because we are aware that the discretization of the continuous explanatory variables could be a highly influential preprocessing step. In this study we have focused on implementing the CHAID method to obtain preliminary results as a starting point for the further methodologies on which we are currently working: not only the C5.0 algorithm, but also Neural Networks and Support Vector Machines (SVM).
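The sensitivity to discretization can be illustrated with a small sketch. CHAID relies on chi-square tests of independence between each categorized predictor and the target, so the way a continuous ratio is binned changes the statistic on which the splits are based. The code below is a simplified illustration on hypothetical data: the helpers `quantile_bins` and `chi_square` are not part of any CHAID implementation, and the category merging and p-value adjustments of the full algorithm are omitted.

```python
def quantile_bins(values, n_bins=4):
    """Assign each value to one of `n_bins` equal-frequency bins (by rank)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


def chi_square(xs, ys):
    """Pearson chi-square statistic for two categorical sequences."""
    n = len(xs)
    x_levels, y_levels = sorted(set(xs)), sorted(set(ys))
    observed = {(x, y): 0 for x in x_levels for y in y_levels}
    for x, y in zip(xs, ys):
        observed[(x, y)] += 1
    row = {x: sum(observed[(x, y)] for y in y_levels) for x in x_levels}
    col = {y: sum(observed[(x, y)] for x in x_levels) for y in y_levels}
    stat = 0.0
    for (x, y), obs in observed.items():
        expected = row[x] * col[y] / n
        stat += (obs - expected) ** 2 / expected
    return stat


# Hypothetical data: asset turnover (%) and ROE class for eight firms.
turnover = [5, 12, 20, 35, 60, 90, 150, 210]
roe = ["low", "low", "high", "low", "low", "high", "high", "high"]

chi2_two_bins = chi_square(quantile_bins(turnover, 2), roe)
chi2_four_bins = chi_square(quantile_bins(turnover, 4), roe)
print(chi2_two_bins, chi2_four_bins)
```

On this toy data the statistic changes when the same variable is cut into two or four bins, and the degrees of freedom change as well, so the apparent strength of the association depends on the binning chosen before the tree is grown.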