Table of Contents

  • Multivariate data analysis (MDA) has an established and acknowledged position in diverse fundamental and applied sciences, especially engineering but also various biological and biomedical disciplines. For decades MDA has been an integral part of nutritional epidemiology and has, more recently, also found its way into dietary surveys. Nonetheless, the latter field has so far seen only scattered MDA application and very little, if any, of an advanced and exploratory nature. This situation contrasts sharply with the quite extensive resources allocated to such surveys globally, including those carried out in the Nordic countries. Dietary surveys, commonly conducted periodically by the respective national Nordic food regulatory agencies, are typically designed to capture eating habits at comparatively high levels of differentiation and therefore comprise rich data sets. Without appropriate computational technology, however, much of the information embedded in these compilations will remain untapped. Consequently, a more regular implementation of relevant MDA within the dietary survey area should render intricate eating patterns, within and across countries, accessible to inspection and interpretation.

  • The study outlined in this report aimed to disclose pertinent patterns in dietary surveys by means of an array of multivariate data analysis (MDA) techniques. The overall purpose was thus to unveil embedded patterns in selected data material, but also to demonstrate more generally the feasibility of new computational technology in this area. The material selected for this purpose comprises food consumption survey data from Sweden and Denmark. The first of these compilations is known as Riksmaten – barn 2003 and covers children of three age groups (four, eight and eleven years of age), whereas the second is an excerpt, covering preschool children (four to five years of age), of the Danish National Survey of Diet and Physical Activity, compiled over several years until 2008. These sets of food consumption data had previously been subjected to classical statistical analysis but had not, before this exercise, been scrutinised by means of more advanced computational techniques. The analyses described in this report encompass two major fields of MDA, which can be summarised as Unsupervised Learning/Descriptive Modelling, on the one hand, and Supervised Learning/Predictive Modelling, on the other.

  • In Western societies there has for decades been a trend towards higher caloric intake and overweight, and this predicament is increasingly common in developing countries as well.1 Especially in tandem with a sedentary lifestyle, overweight and obesity predispose to an array of cardiovascular disorders, type 2 diabetes and other ailments. Notably, the metabolic syndrome, a constellation of cardiovascular disease risk factors such as insulin resistance, abdominal obesity, hypertension and atherogenic dyslipidemia, shows a strong association with diet. Although many factors, including those beyond dietary habits, seemingly contribute to this unhealthy condition, it is worth mentioning that the global increase in consumption of sugar-sweetened beverages over the past several decades has been firmly associated with an increased risk of developing both the metabolic syndrome and overt type 2 diabetes mellitus. Conversely, high intake of whole grain is connected with a reduced risk of developing the impaired glucose tolerance typical of pre-diabetic conditions. Moreover, dietary habits can also contribute to the risk of developing colorectal malignancy. There is concern among health professionals about a likely connection between high consumption of red and processed meat and colorectal carcinoma incidence, although no mechanistic model has yet been demonstrated. Nutritional deficiencies, typified by those tied to micronutrients such as (pro-)vitamins, essential fatty acids, iodine and selenium, are, naturally, equally important to confront. On the other hand, certain diets, e.g. those rich in vegetables, fruits, beans, whole-grain cereals, olive oil and certain fish species, have proven able to promote cardiovascular and overall health in the population, especially in comparison with typical consumption in many industrialised areas. Notably, adherence to the Mediterranean diet, or an appropriate geographical derivative, seems conducive to good health.
Thus, it is imperative to create a supportive atmosphere for healthy eating, as vividly outlined in a review on the prospects for governmental bodies to promote healthy food and eating environments. Collective determinants of eating behaviour include a broad range of contextual factors which are, however, only partially understood.

  • The overall objective of the study reported here was to apply modern exploratory and prediction-based MDA techniques to data on food consumption habits in order to gain deeper insight into various intra- and inter-population relationships within data sets formerly processed by classical statistical methods only. Thus, we wanted to unveil and decipher embedded multidimensional properties among and within young consumers that cannot be found using conventional univariate analyses.

  • Consumption data were from two distinct national surveys, as specified below:

  • In line with most analyses of selected parts of the data material, e.g. those defined by age or nation, the initial inspection of the unabridged Swedish dietary survey data consisted of top-down Cluster Analysis by means of the OMB-DHC algorithm, with each food group intake expressed as a weight fraction (percent) of the total intake. This provided a multibranching display that did not halt at any pre-defined hierarchical level but proceeded until singlet entities were attained. The upper part of the resulting dendrogram, built on the entire Swedish data set, discloses five main aggregations, also referred to as dietary prototypes in this report, which differ notably in size and complexity (Figure 1). Viewed from the highest hierarchical level, cluster RA I (Riksmaten – barn 2003, All subjects, subpopulation I), also labelled the Cereals cluster, is markedly small and relatively homogeneous. It is also the most remote from the remaining data (note the edge lengths at the top level of Figure 1). The next assembly in line, RA II (the Milk cluster), is the largest of all and also the most heterogeneous, as revealed by its many downstream segregation points. Aggregation RA III is also large but slightly less scattered; the remaining two clusters, RA IV and RA V, are intermediate in both size and heterogeneity (Figure 1). The full set of alternative cluster designations, as dictated by salient food group(s) or other relevant patterns, is as follows (RA I through RA V): i) Cereals, ii) Milk (low fat), iii) Traditional, iv) Soft beverages (sweetened)/Buns & cakes and v) Varied/Water (Figure 1 and Table 1). For clarity, the outstanding food groups of each top-level aggregation are depicted as bar charts, which can be read as concise graphical dietary prototypes (Figure 2). Notably, Cereals and Milk are the defining features of clusters RA I and RA II, respectively.
Cluster RA III shows a rather even distribution across many food groups, but is particularly rich in Fruits, Juice, Milk (low-fat), Soft drinks (light), Rice, Meat & poultry and Desserts. We have thus chosen to tentatively label this prototype Traditional. By contrast, Figure 2 highlights Soft drinks (sweetened) as a major property of prototype RA IV, with Buns & cakes and Snacks as additional salient features of this aggregation (Table 1). Clearly, the Milk (RA II) and Traditional (RA III) dietary prototypes stand out as the two largest populations within the ensemble. The Cereals (RA I) prototype appears to consist mostly of preschool children, whereas the Soft beverages (sweetened)/Buns & cakes (RA IV) and Varied/Water (RA V) prototypes are largely restricted to the elementary school categories (8 or 11 years of age) (Figure 3A and B).
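
The two steps described above, expressing intakes as percent of total intake and then splitting top-down until singlets remain, can be illustrated with a minimal sketch. OMB-DHC is an in-house algorithm whose details are not given here, so this sketch substitutes a plain bisecting 2-means procedure (binary splits rather than multi-furcating ones) and should not be read as the method actually used. The food groups, the intake figures and all function names are hypothetical.

```python
from math import dist

def to_percent(intakes):
    """Express each food-group intake as a percentage of the consumer's total."""
    total = sum(intakes)
    return [100.0 * grams / total for grams in intakes]

def centroid(points):
    """Component-wise mean of a list of equally long vectors."""
    return [sum(column) / len(points) for column in zip(*points)]

def two_means(profiles, ids, iterations=20):
    """Split the consumers in `ids` into two groups with a deterministic 2-means run."""
    # Seed with the first consumer and the consumer farthest from it.
    a = profiles[ids[0]]
    b = max((profiles[i] for i in ids), key=lambda p: dist(p, a))
    left, right = list(ids), []
    for _ in range(iterations):
        left = [i for i in ids if dist(profiles[i], a) <= dist(profiles[i], b)]
        right = [i for i in ids if dist(profiles[i], a) > dist(profiles[i], b)]
        if not left or not right:
            break
        a = centroid([profiles[i] for i in left])
        b = centroid([profiles[i] for i in right])
    return left, right

def divisive_tree(profiles, ids):
    """Split recursively; leaves are lists of consumer ids, nodes are pairs of subtrees."""
    if len(ids) <= 1:
        return ids
    left, right = two_means(profiles, ids)
    if not left or not right:
        return ids  # indistinguishable consumers stay together
    return (divisive_tree(profiles, left), divisive_tree(profiles, right))

# Hypothetical daily intakes in grams: [Cereals, Milk, Soft drinks (sweetened)]
intakes = [
    [300, 100, 0],
    [280, 120, 0],
    [50, 400, 50],
    [60, 380, 60],
    [40, 100, 360],
]
profiles = [to_percent(row) for row in intakes]
tree = divisive_tree(profiles, list(range(len(profiles))))
print(tree)
```

Working on percent profiles rather than grams means that consumers are grouped by the composition of their diet and not by how much they eat in total, which is presumably the reason for the weight-fraction scaling in the text.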

  • The study presented here shows selected findings and interpretations based on MDA of two dietary surveys, one conducted in Sweden and one in Denmark, focused on preschool and elementary school children. In this undertaking both unsupervised and supervised MDA techniques have found application. The focus has been strictly on dietary patterns, which are inherently complex and seemingly need more than a single MDA technique to be satisfactorily deciphered. For example, most dietary patterns feature low consumption of some food groups together with high consumption of others. Cluster Analysis, typically based on the K-means algorithm, and Factor Analysis, commonly in the form of PCA, have found increasing application in the area over the last decade.29 In a dietary survey context, Cluster Analysis gathers consumers into non-overlapping groups based on dietary similarity, whereas PCA identifies linear combinations of foods that are frequently consumed in combination. Thus, these two statistical techniques describe diets from different perspectives. A major technique applied in the study outlined here is an in-house development of the K-means algorithm, featuring divisive, multi-furcating clustering as well as output display of several hierarchical levels. This hierarchical design proved very helpful in identifying and selecting relevant populations as dietary prototypes. From this starting point, two separate extensions, CMDS and HPBCA, were designed and likewise applied to the Danish and Swedish data sets. PCA also found application here, but mostly to support results derived by other methods. Moreover, predictive modelling, based on the widely acknowledged RF algorithm as well as the computationally efficient NSC, was applied to selected excerpts of the entire data set, i.e. Danish and Swedish preschool children.
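
To make the PCA perspective concrete, the following sketch extracts the first principal component of a small intake table as a linear combination of foods. It is an illustration only: it uses power iteration on the covariance matrix with the Python standard library rather than any tool named in this report, and the food groups, the intake figures and the function names are hypothetical.

```python
from math import sqrt

def covariance_matrix(rows):
    """Sample covariance between food groups (columns) across consumers (rows)."""
    n = len(rows)
    means = [sum(column) / n for column in zip(*rows)]
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    k = len(means)
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(k)]
            for i in range(k)]

def first_component(cov, iterations=100):
    """Leading eigenvector and eigenvalue of `cov` by power iteration."""
    k = len(cov)
    # Heuristic start: the unit vector of the food group with the largest variance.
    start = max(range(k), key=lambda i: cov[i][i])
    v = [1.0 if i == start else 0.0 for i in range(k)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variance at all in the data
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * cov[i][j] * v[j] for i in range(k) for j in range(k))
    return v, eigenvalue

# Hypothetical daily intakes in grams, one row per consumer
foods = ["Milk", "Soft drinks (sweetened)", "Buns & cakes"]
intakes = [
    [400, 50, 20],
    [350, 90, 45],
    [300, 160, 55],
    [250, 190, 85],
    [200, 250, 100],
]
cov = covariance_matrix(intakes)
loadings, variance = first_component(cov)
share = variance / sum(cov[i][i] for i in range(len(cov)))

for food, loading in zip(foods, loadings):
    print(f"{food}: {loading:+.2f}")
print(f"share of total variance: {share:.2f}")
```

In this toy table the loadings for sweetened soft drinks and buns & cakes share a sign that is opposite to that of milk, i.e. the component describes foods that rise and fall together across consumers. Cluster Analysis, by contrast, would partition the rows (consumers) rather than summarise the columns (foods).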

  • This work was funded by the Nordic Council of Ministers, Nordic Working Group for Diet, Food & Toxicology (NKMT).