Identification of the most important factors of ethnic differences in anthropometric dimensions of Iranian workers using the decision tree

Nazari, Jalil; Mohamady, Akhtar; Azghani, Mahmood Reza; Mazloumi, Adel

Volume 16, Issue 1 (2019) ioh 2019, 16(1): 72-89

Nazari J, Mohamady A, Azghani MR, Mazloumi A. Identification of the most important factors of ethnic differences in anthropometric dimensions of Iranian workers using the decision tree. ioh 2019; 16(1):72-89
URL: http://ioh.iums.ac.ir/article-1-2080-en.html

Jalil Nazari*, Akhtar Mohamady, Mahmood Reza Azghani, Adel Mazloumi

Tabriz University of Medical Sciences, nazari_j@yahoo.com

Abstract:

Background and aims: Anthropometry is the branch of human science concerned with the physical measurement of the human body, especially its size and shape. One application of anthropometric data in ergonomics is the design of workspaces and the development of industrial products. Tools, equipment, and workstations that are designed around the physical dimensions of workers can increase their productivity.
According to statistics, 32% of workers adopt awkward postures, and 15% use improper tools while working. These conditions increase the likelihood of work-related injuries, particularly musculoskeletal disorders. Most countries have therefore made great efforts to build their own anthropometric databases for various groups of citizens.
Because Iran lacks comprehensive databases, designers have often relied on data from Western countries, especially the United States. The anthropometric dimensions of Western populations differ considerably from those of the Iranian population, and Western manufacturers design and develop tools and machines according to their own criteria. The mismatch between these tools and workstations and the body dimensions of Iranian users can cause complications such as fatigue or other physical damage. Iranian researchers have therefore suggested that at least one comprehensive, up-to-date, general database of the country's population is needed. Consequently, building anthropometric databases is necessary.
Several studies have shown that body dimensions are correlated with one another. These correlations can be used to create regression equations for estimating anthropometric dimensions. Identifying, categorizing, and determining the type of relationship between anthropometric dimensions plays a significant role in areas such as treatment, athletic talent identification, and clothing production. However, the relationships between these dimensions are affected by changes in environmental, economic, and social factors. The important physical factors include age, sex, race, body structure, occupation, diet, and physical activity. Among these, race has a critical role in the variation of body size: the differences among races are larger than the variations between nations. Ethnic diversity is therefore a crucial factor that can affect anthropometric data and its areas of application. For example, variation in body dimensions between people of different sexes and races can cause many problems in product design.
The country needs a comprehensive database containing anthropometric data from its different races, and useful information must be extracted from massive data and transformed into knowledge. At the same time, collecting data is time-consuming and expensive, especially in the Iranian population with its many races. For these reasons, the use of modern methods is essential. Data mining is a new method used to extract useful and previously unknown information from raw data. Classification is a data mining function that assigns items in a collection to target categories or classes. The goal of classification is to accurately predict the target class for each case in the data. The decision tree is one of the most popular classification algorithms and one of the easiest to understand and interpret. It can be used for both classification and regression problems, and it can also save part of the time and expense of collecting anthropometric data. Therefore, this study aimed to identify the most important factors of ethnic differences in the anthropometric dimensions of Iranian workers using the decision tree.

Methods: The present research is a methodological study that uses classification systems from the field of data mining. The study sample consisted of raw anthropometric data (37 dimensions) from 3,720 subjects (3,000 men and 720 women) belonging to six races of Iranian workers (Fars, Turks, Kurds, Lars, Baluch, and Arabs).
The decision tree (DT) method was used to identify the most important factors of racial differences in the anthropometric dimensions of Iranian workers. WEKA software (version 3.6.12) was used to analyze the data and run the data mining algorithms. For WEKA, the data were extracted and converted into its standard format, ARFF (Attribute-Relation File Format).
In the current study, preprocessing of the raw data was performed using a classification. Data preprocessing is one of the most critical steps in a data mining process; it deals with the preparation and transformation of the initial dataset. Preprocessing methods are divided into four categories: data cleaning, data integration, data transformation, and data reduction. Data cleaning routines clean the data by filling in missing values, smoothing noisy data, identifying or removing outliers, and resolving inconsistencies. Dirty data can confuse the mining procedure, so removing such data is the more appropriate approach.
For data preparation, missing values were first identified using statistical methods. To replace the missing data, the PLS Filter and Missing Class Values algorithms in WEKA were used. These algorithms first classify the data, compute the mean value for each class, and fill the missing data of each class with that class mean. Distorted data were also eliminated using the Remove Useless algorithm. After data preparation, Principal Components Analysis (PCA), one of the most widely used methods, was applied. PCA is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables (entities each of which takes on various numerical values) into a set of values of linearly uncorrelated variables called principal components. The primary motivation behind PCA is to reduce, or summarize, a large number of variables into a smaller number of derived variables that may be readily visualized in two- or three-dimensional space. The new variables created by PCA can be used in other analyses, but most commonly they serve as a new set of axes on which to plot multivariate data.
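The class-mean replacement of missing values described above can be illustrated with a minimal Python sketch. It is not the WEKA implementation used in the study; the function name `impute_class_means`, the field names, and the toy records are hypothetical and serve only to show the idea.

```python
from collections import defaultdict


def impute_class_means(rows, class_key, feature_keys):
    """Return copies of rows with missing (None) features filled by the class mean."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    # First pass: accumulate per-class sums and counts of the observed values.
    for row in rows:
        for key in feature_keys:
            if row[key] is not None:
                sums[row[class_key]][key] += row[key]
                counts[row[class_key]][key] += 1
    # Second pass: fill each missing value with the mean of the row's class.
    filled = []
    for row in rows:
        new_row = dict(row)
        for key in feature_keys:
            n = counts[row[class_key]][key]
            if new_row[key] is None and n > 0:
                new_row[key] = sums[row[class_key]][key] / n
        filled.append(new_row)
    return filled


# Hypothetical toy records (not data from the study).
workers = [
    {"group": "A", "stature_cm": 170.0, "foot_width_cm": 10.0},
    {"group": "A", "stature_cm": 174.0, "foot_width_cm": None},
    {"group": "A", "stature_cm": None, "foot_width_cm": 9.0},
    {"group": "B", "stature_cm": 165.0, "foot_width_cm": 9.5},
    {"group": "B", "stature_cm": None, "foot_width_cm": 9.5},
]
completed = impute_class_means(workers, "group", ["stature_cm", "foot_width_cm"])
```

In this sketch a value stays missing when its class has no observed value for that feature, which is one possible design choice among several.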
After data preparation, the data were standardized and the Kaiser-Meyer-Olkin (KMO) factor was calculated. The KMO factor is a measure of sampling adequacy that compares the observed correlations with the partial correlations among the original variables. The KMO increases with the number of variables and with the correlation coefficients between them, but it does not depend much on the sample size. Specifically, correlation matrices with KMO < 0.5 are entirely inappropriate, whereas those with KMO below 0.6–0.7 must be treated with caution.
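For reference, the overall KMO measure is commonly written as follows. The symbols are assumptions introduced here and do not appear in the original text: r_ij is the observed correlation between variables i and j, and p_ij is their partial correlation after controlling for all other variables.

```latex
\mathrm{KMO} = \frac{\sum_{i \neq j} r_{ij}^{2}}
                    {\sum_{i \neq j} r_{ij}^{2} + \sum_{i \neq j} p_{ij}^{2}}
```

When the partial correlations are small relative to the observed correlations, the ratio approaches 1, which indicates that the variables share enough common variance for factor extraction.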
The decision tree technique was then applied. A decision tree is a tree-like structure that describes the set of rules leading to a decision, and ease of interpretation is one of its most important features. The technique is used for categorization. It is a graphical method for comparing competing alternatives and assigning values to them by combining uncertainties, costs, and payoffs into specific numerical values, and it usually consists of several nodes, known as input and output nodes. The rules created in the decision tree take the form "if ... then". Each node can also be split into more than two branches. The CHAID algorithm was used; it splits the target into two or more categories, which are called the initial, or parent, nodes.
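The way a CHAID-style tree chooses a splitting variable can be sketched by ranking categorical predictors by their Pearson chi-square statistic against the target. This is a simplified illustration, not the study's implementation: full CHAID also merges similar categories and compares adjusted p-values, and continuous measurements would first have to be binned. The function names, predictor names, and toy data are hypothetical.

```python
from collections import Counter


def chi_square(categories, labels):
    """Pearson chi-square statistic of a category-by-class contingency table."""
    n = len(labels)
    observed = Counter(zip(categories, labels))
    category_totals = Counter(categories)
    label_totals = Counter(labels)
    stat = 0.0
    for cat, cat_n in category_totals.items():
        for label, label_n in label_totals.items():
            expected = cat_n * label_n / n  # count expected under independence
            stat += (observed[(cat, label)] - expected) ** 2 / expected
    return stat


def best_split(predictors, labels):
    """Return the predictor with the largest chi-square statistic, plus all scores."""
    scores = {name: chi_square(values, labels) for name, values in predictors.items()}
    return max(scores, key=scores.get), scores


# Hypothetical toy data: two binned predictors and a two-class target.
labels = ["X", "X", "X", "X", "Y", "Y", "Y", "Y"]
predictors = {
    "shoulder_height_bin": ["low", "low", "low", "low", "high", "high", "high", "high"],
    "foot_width_bin": ["narrow", "wide", "narrow", "wide", "narrow", "wide", "narrow", "wide"],
}
best, scores = best_split(predictors, labels)
```

Ranking by the raw statistic is only comparable across predictors whose tables have the same degrees of freedom, which is why real implementations compare p-values instead.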

Results: For the principal factor analysis, the 37 variables (anthropometric dimensions) were converted to standard values. The KMO value for the total data was 0.947. Because this value (approximately 0.95) was greater than the chosen criterion (0.7), the correlation among the input variables was sufficient to justify the factor analysis. The analysis indicated that, of the 37 anthropometric input variables, only 21 had a coefficient above 0.7 in the seven extracted factors. The remaining 16 variables were of little importance because of their high correlation with the main variables and were excluded from the analysis. In the decision tree, race was selected as the target (dependent) variable, and its relationship with the 21 anthropometric variables was examined. The optimal number of clusters obtained was seven. After the data were entered and their types determined, they were divided into two sets: 70 percent of the source data for training the model and 30 percent for testing it. This default was chosen because a 70-30 ratio is often used in data mining. The results showed that the distinctive factors in categorizing and creating ethnic differences among men were shoulder height in the sitting position, eye height in the sitting position, elbow-fingertip length, bi-acromial breadth, foot width, and head length; among women, they were face breadth, bi-acromial breadth, and elbow-fingertip length.
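The 70/30 division into training and test sets mentioned in the Results can be sketched in a few lines of Python. This is an illustration only: the helper name `holdout_split`, the seed, and the use of integer subject IDs are assumptions, and the sketch shuffles without stratifying by race or sex.

```python
import random


def holdout_split(records, train_fraction=0.7, seed=0):
    """Shuffle a copy of the records and cut it into training and test parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical stand-in for the 3,720 subjects: one integer ID per subject.
subject_ids = list(range(3720))
train_ids, test_ids = holdout_split(subject_ids)
```

With 3,720 subjects this yields 2,604 training records and 1,116 test records.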

Conclusion: The identification of distinctive factors in the classification of racial differences among various ethnic groups is one of the main findings of the present research. According to the results, these factors differed between Iranian women and men. Regardless of race, the distinguishing factors in men were related to anatomical factors (temple), and in women they were related to facial aesthetics (beauty).
The results also confirm the utility of the decision tree method for investigating the interactions between predictor variables, so that the most important determinants of race can be identified by combining different nodes in the overall structure. The study verified that the decision tree is an effective method of identifying important variables for the classification and detection of racial differences in anthropometry.
Furthermore, these findings could be used to design optimal ergonomic workstations for Iranian workers of different races. In addition to applications in manufacturing and design processes, they can be used in other areas, such as forensic medicine for diagnosis and the making of orthopedic products.

Keywords: Anthropometry, Data mining, Decision tree, ethnic

Type of Study: Research | Subject: Ergonomics
Received: 2017/12/10 | Accepted: 2019/01/14 | Published: 2019/06/01