Medical Care



Forecasting a Moving Target: Ensemble Models for ILI Case Count Predictions

Pejman Khadivi∗† Jiangzhuo Chen‡ Patrick Butler∗† Elaine O. Nsoesie‡§¶ Sumiko R. Mekaru§ John S. Brownstein§¶ Madhav V. Marathe∗‡ Naren Ramakrishnan∗†

∗Dept. of Computer Science, Virginia Tech, Blacksburg, VA
†Discovery Analytics Center, Virginia Tech, Blacksburg, VA, USA
‡Network Dynamics and Simulation Science Laboratory, Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, VA
§Childrens Hospital Informatics Program, Boston Childrens Hospital, Boston, MA, USA
¶Department of Pediatrics, Harvard Medical School, Boston

Abstract

Modern epidemiological forecasts of common illnesses, such as the flu, rely on both traditional surveillance sources as well as digital surveillance data. However, most published studies have been retrospective. Concurrently, reports about flu activity generally lag by several weeks and, even when published, are revised for several weeks more. We posit that effectively handling this uncertainty is one of the key challenges for a real-time prediction system in this sphere. In this paper, we present a detailed prospective analysis on the generation of robust quantitative predictions about temporal trends of flu activity, using several surrogate data sources for 15 Latin American countries. We present our findings about the limitations and possible advantages of correcting the uncertainty associated with official flu estimates. We also compare the prediction accuracy of model-level fusion of different surrogate data sources against data-level fusion. Finally, we present a novel matrix factorization approach using neighborhood embedding to predict flu case counts. Comparing our proposed ensemble method against several baseline methods helps us demarcate the importance of different data sources for the countries under consideration.

Introduction

Surveillance reports published by health organizations are one of the primary resources for monitoring influenza like illness (ILI) cases. For years, these reports have been the primary source of information used by healthcare officials for policy making decisions. However, traditional surveillance reports are published with a considerable delay and thus recent research has focused on mining social signals from search engine query volume and social media chatter.

One of the pioneering works in this field is due to Ginsberg et al., where ILI case counts are predicted from the volume of search engine queries. This work inspired significant follow-on work, such as the use of search query data from Baidu (a popular search engine in China) to detect influenza. More real-time ILI detection systems have been proposed by modeling Twitter streams. Apart from such social media sources, there has also been considerable research on exploiting physical indicators such as climate data. The primary advantage of such data sources is that the effects are much more causal and less noisy. Shaman et al. explored this area in detail and found absolute humidity to be a good indicator of influenza outbreaks.

While the aforementioned works have made important strides, there are important areas that have been relatively less studied. First, only a few works have focused on combining multiple data sources to aid in forecasting. In particular, to the best of our knowledge there has been no work that investigates the combination of social indicators and physical indicators to forecast ILI incidence. Second, and more importantly, official estimates as reported by health organizations (e.g., WHO, PAHO) are often lagged by several weeks and, even when reported, are typically revised for several weeks before the case counts are finalized. Real-time prediction systems must be designed to handle the forecasting of such a 'moving target'. Finally, most existing works have been retrospective and not set in the context of a formal data mining validation framework. To overcome these deficiencies, we propose a novel approach to ILI case count forecasting. Our contributions are:

• Our approach integrates both social indicators and physical indicators and thus leverages the selective superiorities of both types of feature sets. We systematize such integration using a novel matrix factorization-based regression approach using neighborhood embedding, thus helping account for non-linear relationships between the surrogates and the official ILI estimates.

• We investigate the efficacy of combining diverse sources using data fusion and model fusion methods. We also discuss their relative strengths.

• We propose different ways of handling uncertainties in the official estimates and factor these uncertainties into our prediction models.

• Finally, we present a detailed and prospective analysis of our proposed methods by comparing predictions from a near-horizon real time prediction system to official estimates of ILI case counts in 15 countries of Latin America.

Related Work

Related work naturally falls into the categories of social media analytics, physical indicators, and event dynamics modeling. These are described next.

Social media analytics: Most relevant works using social media analytics focus on Twitter, specifically by tracking a dictionary of ILI-related keywords in the data stream. Such investigations have often focused on the importance of diversity in keyword lists. In [5], Kanhabua and Nejdl used clustering methods to determine important topics in Twitter data, constructed time series for matched keywords, and used Jaccard's coefficient to characterize the temporal diversity of tweets. They noted that such temporal diversity may be correlated with real-world ILI outbreaks. Other authors studied the dynamics between the change in circulated tweets and the H1N1 virus. Inspired by these works, we curated a custom ILI-related keyword dictionary, which is described in detail in the discussion of our surrogate data sources.

Physical indicators for detecting ILI incidence levels: Tamerius et al. [8] investigated the existence of seasonal cycles of influenza epidemics in different climate regions. For this work, they considered climatic information from 78 globally distributed sites. Using logistic regression they found that strong correlations exist between influenza epidemics and weather conditions, especially when conditions are cold-dry or humid-rainy. Similarly, exciting results were reported by Shaman et al., who discovered absolute humidity to be a key indicator of flu. To uncover these relationships they used non-linear regressors such as Kalman filters, and this was a key inspiration for us in finding a uniform model for the varied data sources.

Figure: Our ILI data pipeline, depicting six different data sources used in this paper to forecast ILI case counts.

Event dynamics modeling: Denecke et al. proposed an event-based approach for early prediction of ILI threats. Their method (M-Eco) considers multiple resources such as Twitter, TV reports, online news articles, and blogs and uses clustering to identify signals for event detection. Network dynamic solutions have also been used to study the behavior of an epidemic in a society.

Problem Formulation

In this section, we formally introduce the problem. Let $\mathcal{P} = \langle P_1, P_2, \ldots, P_T \rangle$ denote the known total weekly ILI case count for the country under consideration, where $P_t$ denotes the case count for time point $t$ and $T$ denotes the time point till which the ILI case count is known. Corresponding to the ILI case count data, let us denote the available surrogate information for the same country by $\mathcal{X} = \langle X_1, X_2, \ldots, X_{T_1} \rangle$, where $T_1$ is the time point till which the surrogate information is available ($T_1 > T$) and $X_t$ denotes the surrogate attributes for time point $t$. The problem we desire to solve is to find a predictive model $f$ for the case count data.

In this paper, in order to better understand the importance of different sources, we assume that the ILI activities in different countries are independent of each other.

Methods: Focusing on the methods, we employ non-linear temporal regressions over the surrogate attributes to forecast the case count using three models: (a) Matrix Factorization Based Regression (MF), (b) Nearest Neighbor Based Regression (NN), and (c) Matrix Factorization Regression using Nearest Neighbor embedding (MFN). For each of the methods, we define two parameters: $\beta$ and $\alpha$. $\alpha$ is the lookahead window length, denoting the distance of the time point for prediction from $T$; $\beta$ is the lookback window length, denoting the number of time points to look back in order to find the regression relation between the case count and the surrogate data.

We define regression vectors $V_t$ and labels $L_t$, $\forall t = 1, \ldots, T$, as below:

$$V_t = \langle P_{t-\beta-\alpha}, X_{t-\beta-\alpha}, P_{t+1-\beta-\alpha}, X_{t+1-\beta-\alpha}, \ldots, P_{t-\alpha}, X_{t-\alpha} \rangle, \qquad L_t = P_t$$

The regression vector for predicting the case count at time point $T'$ ($T + \alpha > T' > T$) is given by

$$V_{T'} = \langle P_{T'-\beta-\alpha}, X_{T'-\beta-\alpha}, P_{T'+1-\beta-\alpha}, X_{T'+1-\beta-\alpha}, \ldots, P_{T'-\alpha}, X_{T'-\alpha} \rangle$$

Under these definitions we describe the models as follows.

3.1.1 Matrix Factorization Based Regression (MF): Matrix factorization is a well accepted technique to predict user preferences from incomplete user ratings/information. Typically a user-preference matrix is factored into a user-factor and a factor-preference matrix. However, such factorizations are incognizant of any temporal continuity. To enforce temporal continuity when predicting for the time point $T'$ ($T + \alpha > T' > T$), we use the regression vectors and labels as defined earlier to define an $m \times n$ prediction matrix $M$, in which the label values occupy the $n$th column.

The prediction matrix is factorized into an $f \times m$ factor-feature matrix $U$ and an $f \times n$ factor-prediction matrix $F$:

$$\hat{M}_{i,j} = b_{i,j} + U_i^{T} F_j$$

where $b_{i,j}$ is the baseline estimate given by

$$b_{i,j} = \bar{M} + b_j$$

Here $\bar{M}$ represents the all-element average and $b_j$ represents the column-wise deviations from the average; $b$ is generally a free parameter, i.e., it is fitted as part of the optimization problem. The matrices $U$ and $F$ are estimated by minimizing the error function

$$b^{*}, F, U = \operatorname{argmin}(\cdots)$$

where $\lambda_1$ is a regularization parameter. An important design criterion in the error function is the fact that we only compute the error between the predicted label values and the actual label values, i.e., the $n$th column of the prediction matrix $M$. The reason behind this choice is the fact that, unlike traditional recommender systems, we are only concerned with the label column and can sacrifice reconstruction accuracies for other columns.

The lookback window $\beta$, the factor size $f$ and the regularization parameter $\lambda_1$ are estimated using cross-validation, after which the final prediction for time point $T'$ is obtained from the fitted model.

3.1.2 Nearest Neighbor Based Regression (NN): For our second class of models, $V_t$ represents the regression attributes and $L_t$ denotes the corresponding labels. Also, let us define the set $N(i) = \{k : V_k \text{ is one of the top } K \text{ nearest neighbors of } V_i\}$, where $K$ indicates the maximum number of nearest neighbors considered. The predicted count $\hat{P}_{T'}$ for time point $T'$ is given as a weighted combination of the labels of the neighbors in $N(T')$. Here $\theta_k$ indicates the weight assigned to the $k$th nearest neighbor. Typically the inverse Euclidean distances to $V_{T'}$ are chosen as the weights.

3.1.3 Matrix Factorization Based Regression using Nearest Neighbor Embedding (MFN): It has been shown that matrix factorization using nearest neighbor constraints can outperform the classical matrix factorization approach as well as traditional nearest neighbor approaches for recommender systems. Drawing inspiration from this result, we modify the method to suit the temporal nature of our problem in similar ways as described for MF above. We again define a similar prediction matrix $M$ and define the matrix decomposition rule following that work. The key difference between our decomposition rule and the original one is that we don't have any term for implicit feedback and, further, only the top $K$ neighbors as found through Euclidean distance are used. The model is fitted as given below:

$$b^{*}, F, U, x^{*} = \operatorname{argmin}(\cdots)$$

Ensemble Approaches

In the last section, we described different strategies to correlate a specific source with the ILI case count of a specific country and predict future ILI counts.
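As a concrete recap of one such single-source strategy, the sketch below builds the regression vectors $V_t$ with labels $L_t = P_t$ and applies the nearest neighbor regressor with inverse Euclidean distance weights. It is a minimal illustration under stated assumptions, not the implementation used in the paper: time points are 0-indexed, the function names are hypothetical, and the weights are normalized to sum to one (the source only says that inverse distances are "typically" chosen as weights).

```python
import numpy as np

def _window_vector(P, X, start, stop):
    """Concatenate <P_s, X_s> for s = start..stop (inclusive)."""
    parts = [np.concatenate(([P[s]], X[s])) for s in range(start, stop + 1)]
    return np.concatenate(parts)

def build_regression_vectors(P, X, alpha, beta):
    """Return (times, V, L) where V[i] covers t-beta-alpha..t-alpha and L[i] = P[t]."""
    P = np.asarray(P, dtype=float)
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:                      # a single surrogate attribute per week
        X = X[:, None]
    times = list(range(alpha + beta, len(P)))
    V = np.vstack([_window_vector(P, X, t - beta - alpha, t - alpha) for t in times])
    L = P[times]
    return np.array(times), V, L

def query_vector(P, X, t_prime, alpha, beta):
    """Regression vector for a future time point t_prime (needs t_prime - alpha <= last known index)."""
    P = np.asarray(P, dtype=float)
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    if t_prime - alpha > len(P) - 1:
        raise ValueError("t_prime is further ahead than the lookahead window allows")
    return _window_vector(P, X, t_prime - beta - alpha, t_prime - alpha)

def nn_predict(V, L, v_query, K=3, eps=1e-9):
    """Weighted average of the labels of the K nearest regression vectors.

    Normalizing by the sum of weights is an assumption of this sketch.
    """
    dist = np.linalg.norm(V - v_query, axis=1)
    nearest = np.argsort(dist)[:K]
    theta = 1.0 / (dist[nearest] + eps)  # inverse Euclidean distance weights
    return float(np.sum(theta * L[nearest]) / np.sum(theta))
```

With weekly counts `P` and surrogate attributes `X`, a forecast for the week after the last known count would be `nn_predict(V, L, query_vector(P, X, len(P), alpha, beta))`, where `V` and `L` come from `build_regression_vectors` and `alpha >= 1`.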
In practice, we desire to work with a multitude of data sources and there are two broad ways to accomplish this objective: (a) data level fusion, where a single regressor is constructed from different data sources to the ILI case count, and (b) model level fusion, where we build one regressor for each data source and subsequently combine the predictions from the models. In this section, we describe these fusion methods. Experimental results with both methods are presented in the results section.

Data level fusion: Here we express the feature vector $\mathcal{X}_t$ as a tuple over all the different data sources and then proceed with any one of the regression methods outlined earlier. For example, while combining Twitter and weather data sources, the feature vector is given by

$$\mathcal{X}_t = \langle \mathcal{T}_t, \mathcal{W}_t \rangle$$

where $\mathcal{T}_t$ and $\mathcal{W}_t$ denote attributes derived from Twitter and weather, respectively.

Model level fusion: In this approach, the models are combined using matrix factorization regression with nearest neighbor embedding by comparing the prediction estimates from each model with the actual estimate (since the ground truth can change as well) and the average ILI case count for the month for the particular country (to help organize a baseline). For this purpose we use the average ILI case count for a particular calendar month $I$ for a given country. Considering $C$ different sources and hence $C$ different models, we also use the prediction for the $t$th time point from the $c$th model. Using these definitions we can now proceed to describe the fusion model. Essentially, the model is similar to the MFN model described earlier; the differences can be found in the way we construct the feature vectors. Similar to the single-source case, we construct an $m' \times n'$ prediction matrix for fusion, whose $t$th row collects these quantities for time point $t$. We then factor this matrix into latent factors ${}^{C}U$, ${}^{C}F$, ${}^{C}b^{*}$ using a decomposition that includes the term $({}^{C}M_{i,k} - \mu_i + {}^{C}b_k)\,{}^{C}x_k$, from which the final prediction for the $T$th data point is obtained. The fitting function is given by

$${}^{C}b^{*}, {}^{C}F, {}^{C}U, {}^{C}x^{*} = \operatorname{argmin}(\cdots)$$

As before, the free parameters are estimated through cross-validation.

Forecasting a Moving Target

One of the key challenges in creating a prospective ILI case count predictor is the fact that the official estimates are often delayed and, furthermore, even when published the estimates are revised over a number of weeks before they finally become stable. In this paper, we concentrate on 15 Latin American countries and consider the official ILI estimates from the Pan American Health Organization (PAHO). We can categorize PAHO count values downloaded on any week into three different types: (a) the unknown PAHO counts, represented by $\ddot{P}$; (b) the known and stable PAHO counts, denoted by $\dot{P}$; and (c) the known and unstable PAHO counts, denoted by $\tilde{P}$. While we desire to predict $\ddot{P}_t$, the uncertainty associated with $\tilde{P}_t$ introduces errors in the predictions. In this section, we study the effects of such unstable data and propose three different models to adjust these unstable values to more accurate ones.

Figure: Average relative error of PAHO count values with respect to stable values. (a) Comparison between Argentina and Colombia. (b) Comparison between different seasons for Argentina.

The figure plots the relative error of an unstable PAHO data series w.r.t. its final estimate, as a function of time. It can be seen that different countries have different stability characteristics: for some countries, PAHO count values stabilize very slowly, whereas for others they stabilize faster (especially as the number of updates for a week increases). The stability behavior of PAHO count values was also found to be dependent on the time of the year. To plot this curve for Argentina, we categorized any week with fewer than 100 cases as belonging to a low season, more than 300 to a high season, and the remaining values to mid season (the thresholds were different for different countries).

At the same time, the PAHO official updates provide an indication of the number of samples used to generate the case count estimate. Preliminary experiments show that this size is correlated with the accuracy of ILI case counts. In other words, in general, larger values of statistical population size result in smaller relative errors for the ILI case count. Thus, using both the number of samples and the lag in uploading the week data, we can use machine learning techniques to revise the officially published PAHO estimates. Preliminary results show that for different seasons and different countries, we encounter different stability patterns. Therefore, any PAHO count adjustment method should be customized for seasons and countries separately.

Let us assume that $\dot{\mathcal{P}}$ is the set of stable PAHO counts for a specific country. Also, assume that the sequence of updates for each stable PAHO count value is available. In other words, for $\dot{P}_i$ we have the following sequence:

$$P_i = \langle P_i^{(1)}, P_i^{(2)}, \ldots, P_i^{(m)}, \ldots \rangle$$

where $P_i^{(m)}$ is the value of $P_i$ after $m$ weeks of update. After recognizing high, low, and mid-season months for the country, we can assign each $\dot{P}_i$ to one of these categories. Then, for category $S$, an adjustment dataset $\mathcal{P}^{S}$ is constructed. Each member of $\mathcal{P}^{S}$ is a tuple with four entries: the first entry denotes the time slot $m$ that the sample belongs to; the second entry is the actual unstable value $P_i^{(m)}$; the third entry is the related stable value $\dot{P}_i$; and the fourth entry is the size of the statistical population.

In the next step, a linear regression algorithm is used to adjust unstable PAHO values. In order to adjust the PAHO values in the $m$th time slot of season $S$, we use the adjustment dataset to learn the coefficients $a_0$, $a_1$, $a_2$, and $a_3$ of a linear equation whose output is the adjusted PAHO count value for the $m$th time slot.

Experimental results show that this adjustment method results in more accurate known PAHO values. Average relative errors of the published unstable PAHO values before and after correction for each country are shown in Figure 3. While in a few cases we do not experience any improvement, in countries such as Argentina and Paraguay we experience significant improvements.

Figure 3: Average relative error of PAHO count values before and after correction for different countries.

Finally, in addition to using the unstable value together with both quantities, we can use only the time difference ($m$) or only the size of the population to correct unstable PAHO values. The effects of these corrections on the overall accuracy of predictions are explored in the results section.

Experimental Setup

Reference Data. In this paper, we focus on 15 Latin American countries, viz. Argentina, Bolivia, Costa Rica, Colombia, Chile, Ecuador, El Salvador, Guatemala, French Guiana, Honduras, Mexico, Nicaragua, Paraguay, Panama and Peru. We collected weekly ILI counts from the official Pan American Health Organization (PAHO) website every day from January 2013 to August 2013. The estimates downloaded every day for each country contain data from January 2010 to the latest available week on the day of collection. This dataset is stored in a database we refer to as the Temporal Data Repository (TDR). The TDR is also timestamped so that for any given day, we can readily retrieve the ILI case counts that were downloaded on that day. This is important as historic data may be updated by PAHO even a number of weeks after the first update. For the purpose of experimental validation we used the data for the period Jan 2010 to December 2012 as the static training set. We considered Wednesday as the reference day within a week. For each Wednesday from Jan 2013 to July 2013, we used the latest available PAHO data in the TDR for that day and predicted 2 weeks from the last available week for which the PAHO data was available. These predictions are next evaluated against the final ILI case count as downloaded on September 1, 2013, and we report the performance of our algorithms in the results section.

Evaluation criteria. We evaluate the prediction accuracy of the different algorithms using a modified version of percentage relative error, computed over the prediction period, where $t_s$ and $t_e$ indicate the starting and the ending time points for which predictions were generated and $N_p$ indicates the number of time points over the same time period (i.e., $N_p = t_e - t_s + 1$). Note that the measure is scaled to have values in $[0, 4]$ and the denominator is designed to not over-penalize small deviations from the true ILI case count (e.g., when the true case count is 0 and the predicted count is 1). It is to be noted that the accuracy metric so defined is non-convex.

Surrogate data sources. Before describing our data sources in detail, we describe our overall methodology for organizing a flu-related dictionary (for tracking in multiple media such as news, tweets, and search queries).

Dictionary creation. The keywords relating to ILI were organized from a seed set of words and expanded using a combination of time-series correlation analysis and pseudo-query expansion. The seed set of keywords (e.g., gripe) was constructed in Spanish, Portuguese, and English using feedback from our in-house subject matter experts.

Pseudo-query expansion. Using the seed set, we crawled the top 20 web sites (according to Google Search) associated with each word in this set. We also crawled some expert sites such as the official CDC website and equivalent websites of the countries under consideration, detailing the causes, symptoms and treatment for influenza. We additionally crawled a few hand-picked websites. We filtered the words from these sites using standard language processing filtering techniques such as stopword removal and Porter stemming. The filtered set of keywords was then ranked according to the absolute frequency of occurrence. The top 500 words for Spanish and English were then selected. For example, words such as enfermedad and pandemia were obtained from this step.

Time-series correlation analysis. Next we used Google Correlate (now a part of Google Trends) to identify keywords most correlated with the ILI case count time-series for each country. Once again these words were found to be a mix of both English and Spanish. As an added step in this process, we also compared time-shifted ILI counts: left-shifted to capture the words searched leading up to the actual flu infection and right-shifted to capture the words commonly searched during the tail of the infection. This entire exercise provided us some interesting terms like ginger, which has been used as a natural herbal remedy in the eastern world. We also found popular flu medications such as Acemuk and Oseltamivir (the latter also sold under the trade name Tamiflu) as highly correlated search queries, particularly for Argentina.

Final filtering. The set of terms obtained from query expansion and correlation analysis was then pruned by hand to obtain a vocabulary of 151 words. We then performed a final correlation check and retained a final set of 114 words.

Google Flu Trends (F): Google Flu Trends (GFT) gives weekly and up-to-date ILI case count estimates using search query volumes. Of the countries under consideration, GFT provides weekly estimates for only 6 of them, viz. Argentina, Bolivia, Chile, Mexico, Peru and Paraguay. These estimates are typically at a different scale than the ILI case counts provided by PAHO and therefore need to be scaled accordingly. We collected this data weekly on Monday from Jan 2013 to Aug 2013. (The data downloaded on a particular day contains the entire time-series from 2004 to the corresponding week.)

Google Search Trends (S): Google Search Trends (GST) is another tool provided by Google. Using this tool we can download an estimate of search query volume as a percentage over its own temporal history, filtered geographically. We download the search query volume time series for the 114 keywords described earlier and convert the percentage measures to absolute values using a static dataset we downloaded in Oct 2012, when Google Search Trends used to provide absolute query volumes.

Twitter (T): Twitter data was collected and geotagged using an in-house geocoder. We lemmatized the tweet contents and used language detection and POS tagging to help differentiate relevant from irrelevant uses of our keywords (e.g., the Spanish word gripe, meaning flu, is part of our flu keyword list as opposed to the undesired and unrelated English word 'gripe'). The resulting analysis yields a weekly occurrence count of our dictionary in tweets.

HealthMap (H): Similar to Twitter, we also collect flu-related news stories using HealthMap, an online global disease alert system capturing outbreak data from over 50,000 electronic sources. Using this service we receive flu-related news as a daily feed which is similarly enriched and filtered to obtain a multivariate time series over lemmatized versions of the keywords. While Twitter is more suitable to ascertain general public response, the HealthMap data provides more detailed information but may capture the trends at a slower rate. Thus each of these sources offers utility in capturing different surrogate signals: Twitter offers leading but noisy indicators whereas HealthMap provides a slightly delayed but more reliable indicator.

Table: ILI case count prediction accuracy for Mexico using OpenTable data as a single source, and by combining it with all other sources using model level fusion on uncorrected ILI case count data.
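The step of turning a stream of timestamped texts (tweets or news items) into weekly dictionary occurrence counts can be sketched as follows. This is a simplified illustration and not the pipeline described above: it assumes plain lowercase token matching instead of lemmatization, language detection and POS tagging, it groups by ISO calendar week, and the function name and record layout are hypothetical.

```python
import re
from collections import Counter, defaultdict
from datetime import date

def weekly_keyword_counts(records, keywords):
    """Count dictionary keyword occurrences per ISO week.

    records  -- iterable of (datetime.date, text) pairs (hypothetical layout)
    keywords -- iterable of dictionary terms, matched case-insensitively
    Returns {(iso_year, iso_week): Counter({keyword: count})}.
    """
    vocab = {k.lower() for k in keywords}
    counts = defaultdict(Counter)
    for day, text in records:
        iso = day.isocalendar()
        week = (iso[0], iso[1])
        # Simple word tokenization; the paper's pipeline lemmatizes instead.
        for token in re.findall(r"\w+", text.lower()):
            if token in vocab:
                counts[week][token] += 1
    return dict(counts)

records = [
    (date(2013, 1, 7), "Tengo gripe y fiebre"),
    (date(2013, 1, 9), "gripe gripe"),
    (date(2013, 1, 14), "no flu here"),
]
counts = weekly_keyword_counts(records, ["gripe", "fiebre", "flu"])
```

Each week's `Counter` can then be laid out as one row of a multivariate time series with one column per keyword.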
OpenTable (O): We also use data on trends of restaurant table reservations, initially studied as a potential early indicator for outbreak surveillance, as another surrogate for ILI detection. This novel data stream is based on the postulate that a higher than average number of restaurants with table availability in a region can serve as an indicator of an event of interest, such as an increase in flu cases. Table availability was monitored using OpenTable, an online restaurant reservation site with 28,000 restaurants at the time of this writing. Daily searches were performed starting from September 2012 for a table for two persons at lunch and dinner: between 12:30-3pm, and between 6-10:30pm. Data was collected for Mexico by city (Cancun, Mexico City, Puebla, Monterrey, and Guadalajara) and for the entire country. The daily proportion of restaurants with available tables (a proportion is used due to changes in the number of restaurants in the system) was aggregated as a weekly time series.

Weather (W): All of the previously described data sources can be termed non-physical indicators, which can work suitably as indirect indicators about the state of the population with respect to flu by exposing different population characteristics. On the other hand, meteorological data can be considered a more direct and physical driver of influenza transmission. It has been shown that absolute humidity can be directly used to predict the onset of influenza epidemics. Here, we collect several other meteorological indicators such as temperature and rainfall in addition to humidity from the Global Data Assimilation System (GDAS). We accessed this data in GRIB format at a resolution of 1° lat/long interval. However, looking at all the lat/long points for a country can often lead to noisy data. As such, we filtered the downloaded data and used the indicators only around the surveillance centers. Finally, we generated a time series by using weekly averages of this data for each country. We collected this data weekly from Jan 2013 to August 2013.

Results

In this section, we present an exhaustive set of experiments evaluating our algorithms over 6 months of predictions from Jan 2013 to August 2013. The final and stable estimates of ILI case counts are considered to be the estimates downloaded from PAHO on Oct 1, 2013. All models considered here were used to forecast 2 weeks beyond the latest available PAHO ILI estimates. Key findings are presented in the table of single-source accuracies. We analyze some important observations from this table next.

Figure 4: Accuracy of different methods for each country.

Can we 'beat' Google Flu Trends with our custom dictionary? The key difference between Google Flu Trends (which can be considered as a base rate) and Google Search Trends is that the former uses a closed dictionary whereas we constructed the dictionary to use with GST. As can be seen in the table, for the majority of the common countries (countries for which data from both GST and GFT is present), regressors running on GST consistently outperform those running on GFT (with Mexico and Peru being the exceptions). Thus we posit that the GST model devised here is a sufficiently close approximation to GFT, with the added advantages of having access to raw level data and being available for more countries than GFT (only 6 of the 15 countries we consider are present in the GFT database).

Which is the optimal regression model? From the same table we can also analyze the three different regressors proposed earlier with respect to overall accuracy. With respect to each individual source, we can see that matrix factorization with nearest neighbor embedding (MFN) performs the best on average over the countries. For some countries such as Panama, when using only GST, MFN performs worse than vanilla MF; nevertheless the average accuracy over all countries for any given data source is best when using MFN.

Which is the best strategy to combine multiple data sources? As shown in Table 2, overall, model level fusion works better than data level fusion.
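To make the distinction between the two fusion strategies concrete, the sketch below contrasts them on toy data. It is only an illustration under loud assumptions: an ordinary least-squares regressor stands in for the MFN regressor used in the paper, the combiner in model level fusion is also least squares rather than matrix factorization with nearest neighbor embedding, and all function names and the synthetic Twitter and weather features are hypothetical.

```python
import numpy as np

def fit_linear(A, y):
    """Least-squares fit with an intercept (stand-in for any regressor)."""
    A1 = np.hstack([A, np.ones((A.shape[0], 1))])
    w, *_ = np.linalg.lstsq(A1, y, rcond=None)
    return w

def predict_linear(w, A):
    return np.hstack([A, np.ones((A.shape[0], 1))]) @ w

def data_level_fusion(sources, y):
    """One regressor over the concatenated feature tuple <T_t, W_t, ...>."""
    return fit_linear(np.hstack(sources), y)

def model_level_fusion(sources, y):
    """One regressor per source, then a combiner over their predictions."""
    models = [fit_linear(S, y) for S in sources]
    preds = np.column_stack(
        [predict_linear(w, S) for w, S in zip(models, sources)]
    )
    combiner = fit_linear(preds, y)
    return models, combiner

# Synthetic example: 30 weeks, 2 Twitter attributes, 1 weather attribute.
rng = np.random.default_rng(0)
twitter = rng.normal(size=(30, 2))
weather = rng.normal(size=(30, 1))
y = 3.0 * twitter[:, 0] - 2.0 * weather[:, 0] + 5.0

w_data = data_level_fusion([twitter, weather], y)
models, combiner = model_level_fusion([twitter, weather], y)
```

Data level fusion sees all attributes jointly, whereas model level fusion lets each source be modeled on its own before the predictions are reconciled, which is one plausible reason the preferred strategy can differ by country.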
For 8 of the 15 countries, model level fusion works appreciably better than data level fusion, while the reverse trend is seen for 4 other countries. This showcases the importance of considering both kinds of fusion depending on the country of interest.

Table: Comparing forecasting accuracy of models using individual sources. Scores in this and other tables are normalized to [0,4] so that 4 is the most accurate.

Table 2: Comparison of prediction accuracy while combining all data sources and using MFN regression.

Table: Comparison of prediction accuracy while using model level fusion on MFN regressors and employing PAHO stabilization.

Table: Discovering importance of sources in model level fusion on MFN regressors by ablating one source at a time.

How effective are we at forecasting a moving PAHO target? As shown in the PAHO stabilization table, our corrected estimates using both the number of samples and the weeks ahead from the upload date are generally better. It is instructive to note that while our correction strategy is able to increase the overall accuracy only by a score of approximately 0.05 over all the countries, for some countries such as Mexico and Argentina (for which the data update is typically noisy) we obtain a substantial improvement of scores. This suggests that the correction strategy may be selectively applied when forecasting for certain countries.

How do physical vs social indicators fare against each other? From the table of single-source accuracies we see that the data source with the best single accuracy happens to be the physical indicator source, i.e., weather data. However, the ablation table conveys a mixed story. Here we conduct an ablation test, wherein we remove one data source at a time from our model level MFN fusion framework and contrast accuracies. While removing the weather data degrades the accuracy score the most, removing the social indicators also degrades the score to varying degrees. Thus we posit that it is important to consider both the physical and social indicators to get a refined signal about the prevalent ILI incidence in the population.

How relevant is restaurant reservation data to forecasting ILI? All the results thus far do not consider the OpenTable reservation data, since this source is available only for Mexico (among the countries studied here). We considered table availability for different time ranges and compared performance using our MFN model. As the OpenTable table demonstrates, we obtain the best performance when considering both lunch and dinner reservation data. Nevertheless, we have observed that including this source as part of the ensemble decreases the overall accuracy by 0.01 over the uncorrected ILI case count data. Thus it is our opinion that although the reservation data could exhibit some signals about prevalent ILI conditions, it is also a surrogate for non-health conditions (e.g., social unrest), which must be factored out to make the data source more useful.

Finally, we present Figure 4, where we compare for each country the accuracies of prediction from the best individual source with those from both data level and model level fusion of the different sources and the model level fusion of MF regressors applied on the corrected PAHO estimates rather than the raw ones. As can be seen, we progressively increase our accuracies, with the corrected PAHO estimates providing the final increase in predictive power to our model level fusion.

Conclusions and Further Work

To forecast ILI over a range of Latin American countries, we have explored a gamut of options pertaining to data sources, fusion possibilities, and corrections to track a moving target. Our results demonstrate that there are significant opportunities to improve forecasting performance and selective superiorities among data sources that can be leveraged. Our future work focuses on reconciling the phenomenological models developed here with true epidemiological models so that we can develop not just near-term forecasts as done here but also identify long-range characteristics of the epidemic.

References

[4] K. Lee, A. Agrawal, and A. Choudhary, "Real-time disease surveillance using twitter data: demonstration on flu and cancer," in Proceedings of the KDD '13, 2013, pp. 1474–1477.
[5] N. Kanhabua and W. Nejdl, "Understanding the diversity of tweets in the time of outbreaks," in Proceedings of WWW '13, 2013, pp. 1335–1342.
[6] C. Chew and G. Eysenbach, "Pandemics in the age of twitter: Content analysis of tweets during the 2009 h1n1 outbreak," PlosOne, vol. 5, no. 11, p. e14118.
[7] R. Sugumaran and J. Voss, "Real-time spatio-temporal analysis of west nile virus using twitter data," in Proceedings of COM.Geo '12, 2012, pp. 1335–1342.
[8] J. D. Tamerius, J. Shaman, W. J. Alonso, K. Bloom-Feshbach, C. K. Uejio, A. Comrie, and C. Viboud, "Environmental predictors of seasonal influenza epidemics across temperate and tropical climates," PLoS pathogens, vol. 9, no. 3, p. e1003194, 2013.
[9] J. Shaman, E. Goldstein, and M. Lipsitch, "Absolute Humidity and Pandemic Versus Epidemic Influenza," American journal of epidemiology, vol. 173, no. 2, pp. 127–135, 2010.
[10] J. Shaman, V. E. Pitzer, C. Viboud, B. T. Grenfell, and M. Lipsitch, "Absolute humidity and the seasonal onset of influenza in the continental United States," PLoS biology, vol. 8, no. 2, p. e1000316, 2010.
as it unfolds. We also aim to explore the inter-country [11] P. Kostkova, "A roadmap to integrated digital public characteristics of ILI profiles in future.
health surveillance: the vision and the challenges," in Proceedings of WWW '13, 2013, pp. 687–694.
Supported by the Intelligence Advanced Research [12] E. O. Nsoesie, J. S. Brownstein, N. Ramakrishnan, and M. Marathe, "A systematic review of studies Projects Activity (IARPA) via Department of Interior on forecasting the dynamics of influenza outbreaks," National Business Center (DoI/NBC) contract number Influenza and other respiratory viruses, 2013.
D12PC000337 and by the Defense Threat Reduction [13] M. Marathe and N. Ramakrishnan, "Recent advances agency (DTRA) via the CNIMS Contract HDTRA1-11- in computational epidemiology," IEEE Intelligent Sys- D-0016-0001. The US Government is authorized to re- tems, vol. 28, no. 4, pp. 0096–101, 2013.
produce and distribute reprints of this work for Govern- [14] J. Canny, "Collaborative filtering with privacy via mental purposes notwithstanding any copyright anno- factor analysis," in Proceedings of SIGIR '02, 2002, pp.
tation thereon. Disclaimer: The views and conclusions contained herein are those of the authors and should [15] Y. Koren, "Factorization meets the neighborhood: a not be interpreted as necessarily representing the official multifaceted collaborative filtering model," in Proceed- policies or endorsements, either expressed or implied, of ings of KDD '08, 2008, pp. 426–434.
[16] J. Ginsberg, M. H. Mohebbi, R. S. Patel, L. Brammer, IARPA, DoI/NBC, DTRA, or the US Government.
M. S. Smolinski, and L. Brilliant, "Detecting influenza epidemics using search engine query data," Nature, vol.
457, no. 7232, pp. 1012–1014, 2008.
[1] Q. Yuan, E. O. Nsoesie, B. Lv, G. Peng, R. Chunara, [17] E. O. Nsoesie, D. L. Buckeridge, and J. S. Brownstein, and J. S. Brownstein, "Monitoring Influenza Epidemics "Who's not coming to dinner? evaluating trends in on- in China with Search Query from Baidu," PlosOne, line restaurant reservations for outbreak surveillance," vol. 8, no. 5, p. e64323, 2013.
Online Journal of Public Health Informatics, vol. 5, [2] J. Ginsberg, M. H. Mohebbi, R. S. Patel, L. Bram- no. 1, 2013.
mer, M. S. Smolinski, and L. Brilliant1, "Influenza epi- [18] W. Yang, S. Elankumaran, and L. C. Marr, "Rela- demics using search engine query data," Nature, vol.
tionship between humidity and influenza A viability in 457, no. 7232, pp. 1012–1014, 2009.
droplets and implications for influenza's seasonality." [3] K. Denecke, P. Dolog, and P. Smrz, "Making use of PloS one, vol. 7, no. 10, p. e46789, 2012.
social media data in public health," in Proceedings ofWWW '12, 2012, pp. 243–246.


The fair labor standards act exemptions and the pharmaceuticals industry: are sales representatives entitled to overtime?

The Fair Labor Standards Act Exemptions and the Pharmaceuticals Industry: Are Sales Representatives Entitled to Overtime?
Steven I. Locke

Recommended Citation: Steven I. Locke (2009) "The Fair Labor Standards Act Exemptions and the Pharmaceuticals Industry: Are Sales Representatives Entitled to Overtime?," Barry Law Review: Vol. 13: Iss. 1, Article 1.

ADVANCES IN NEUROPSYCHIATRY
Neuropsychiatry of the basal ganglia
J Neurol Neurosurg Psychiatry 2002;72:12–21

parts of the basal ganglia closest to limbic structures and that are involved in cognitive and

This review aims to relate recent findings describing the role and neural connectivity of the basal ganglia to the clinical neuropsychiatry of basal ganglia movement