|

Medical Care

##SEVER##

/c/cs.zju.edu.cn1.html

Cs.zju.edu.cn

Optimal Power Allocation in Server Farms
Carnegie Mellon University Carnegie Mellon University Pittsburgh, PA, USA Pittsburgh, PA, USA Hawthorne, NY, USA improve server farm performance, by a factor of typically Server farms today consume more than 1.5% of the total 1.4 and as much as a factor of 5 in some cases.
electricity in the U.S. at a cost of nearly $4.5 billion. Giventhe rising cost of energy, many industries are now seeking Categories and Subject Descriptors solutions for how to best make use of their available power.
C.4 [Performance of Systems]: Modeling techniques An important question which arises in this context is howto distribute available power among servers in a server farmso as to get maximum performance.
By giving more power to a server, one can get higher server Theory, Experimentation, Measurement, Performance frequency (speed). Hence it is commonly believed that, fora given power budget, performance can be maximized by operating servers at their highest power levels. However,it is also conceivable that one might prefer to run servers Servers today consume ten times more power than they at their lowest power levels, which allows more servers to be did ten years ago [3, 21]. Recent articles estimate that a turned on for a given power budget. To fully understand the 300W high performance server requires more than $330 of effect of power allocation on performance in a server farm energy cost per year [24]. Given the large number of servers with a fixed power budget, we introduce a queueing theo- in use today, the worldwide expenditure on enterprise power retic model, which allows us to predict the optimal power and cooling of these servers is estimated to be in excess of allocation in a variety of scenarios. Results are verified via $30 billion [21].
extensive experiments on an IBM BladeCenter.
Power consumption is particularly pronounced in CPU We find that the optimal power allocation varies for differ- intensive server farms composed of tens to thousands of ent scenarios. In particular, it is not always optimal to run servers, all sharing workload and power supply. We consider servers at their maximum power levels. There are scenar- server farms where each incoming job can be routed to any ios where it might be optimal to run servers at their lowest server, i.e., it has no affinity for a particular server.
power levels or at some intermediate power levels. Our anal- Server farms usually have a fixed peak power budget. This ysis shows that the optimal power allocation is non-obvious is because large power consumers operating server farms are and depends on many factors such as the power-to-frequency often billed by power suppliers, in part, based on their peak relationship in the processors, the arrival rate of jobs, the power requirements. The peak power budget of a server maximum server frequency, the lowest attainable server fre- farm also determines its cooling and power delivery infras- quency and the server farm configuration. Furthermore, our tructure costs. Hence, companies are interested in maxi- theoretical model allows us to explore more general settings mizing the performance at a server farm given a fixed peak than we can implement, including arbitrarily large server power budget [4, 8, 9, 21].
farms and different power-to-frequency curves. Importantly, The power allocation problem we consider is: how to dis- we show that the optimal power allocation can significantly tribute available power among servers in a server farm soas to minimize mean response time. Every server running a ∗Research supported by NSF SMA/PDOS Grant CCR- given workload has a minimum level of power consumption, 0615262 and a 2009 IBM Faculty Award.
b, needed to operate the processor at the lowest allowablefrequency and a maximum level of power consumption, c,needed to operate the processor at the highest allowable fre-quency. By varying the power allocated to a server within Permission to make digital or hard copies of all or part of this work for the range of b to c Watts, one can proportionately vary the personal or classroom use is granted without fee provided that copies are server frequency (See Fig. 2). Hence, one might expect that not made or distributed for profit or commercial advantage and that copies running servers at their highest power levels of c Watts, bear this notice and the full citation on the first page. To copy otherwise, to which we refer to as PowMax, is the optimal power allocation republish, to post on servers or to redistribute to lists, requires prior specific scheme to minimize response time. Since we are constrained permission and/or a fee.
SIGMETRICS/Performance'09, June 15–19, 2009, Seattle, WA, USA.
by a power budget, there are only a limited number of servers Copyright 2009 ACM 978-1-60558-511-6/09/06 .$5.00.
that we can operate at the highest power level. The rest of the servers remain turned off. Thus PowMax corresponds to age and Frequency Scaling) and DVFS+DFS. Section 2 dis- having few fast servers. In sharp contrast is PowMin, which cusses these mechanisms in more detail. The functional we define as operating servers at their lowest power levels of form of the power-to-frequency relationship for a server de- b Watts. Since we spend less power on each server, PowMin pends on many factors such as the workload used, maxi- corresponds to having many slow servers. Of course there mum server power, maximum server frequency and the volt- might be scenarios where we neither operate our servers at age and frequency scaling mechanism used (DFS, DVFS the highest power levels nor at the lowest power levels, but or DVFS+DFS). Unfortunately, the functional form of the we operate them at some intermediate power levels. We server level power-to-frequency relationship is only recently refer to such power allocation schemes as PowMed.
beginning to be studied (See the 2008 papers [21, 25]) and is Understanding power allocation in a server farm is intrin- still not well understood. Our first contribution is the inves- sically difficult for many reasons: First, there is no single tigation of how power allocation affects server frequency in a allocation scheme which is optimal in all scenarios. For ex- single server using DFS, DVFS, and DVFS+DFS for various ample, it is commonly believed that PowMax is the optimal workloads. In particular, in Section 3 we derive a functional power allocation scheme [1, 7]. However, as we show later, form for the power-to-frequency relationship based on our PowMin and PowMed can sometimes outperform PowMax measured values (See Figs. 2(a) and (b)).
by almost a factor of 1.5. Second, it turns out that the Our second contribution is the development of a queue- optimal power allocation depends on a very long list of ex- ing theoretic model which predicts the mean response time ternal factors, such as the outside arrival rate, whether an for a server farm as a function of many factors including open or closed workload configuration is used, the power- the power-to-frequency relationship, arrival rate, peak power to-frequency relationship (how power translates to server budget, etc. The queueing model also allows us to determine frequency) inherent in the technology, the minimum power the optimal power allocation for every possible configuration consumption of a server (b Watts), the maximum power that of the above factors (See Section 4).
a server can use (c Watts), and many other factors. It is sim- Our third contribution is the experimental implementa- ply impossible to examine all these factors via experiments.
tion of our schemes, PowMax, PowMin, and PowMed, on To fully understand the effect of power allocation on mean an IBM BladeCenter, and the measurement of their response response time in a server farm with a fixed power budget, time for various workloads and voltage and frequency scal- we introduce a queueing theoretic model, which allows us to ing mechanisms (See Sections 5 and 6). Importantly, our predict the optimal power allocation in a variety of scenarios.
experiments show that using the optimal power allocation We then verify our results via extensive experiments on an scheme can significantly reduce mean response time, some- IBM BladeCenter.
times by as much as a factor of 5. To be more concrete, we Prior work in power management has been motivated by show a subset of our results in Fig. 1, which assumes a CPU the idea of managing power at the global data center level [8, bound workload in an open loop setting. Fig. 1(a) depicts 20] rather than at the more localized single-server level.
one possible scenario using DFS where PowMax is optimal.
While power management in server farms often deals with By contrast, Fig. 1(b) depicts a scenario using DVFS where various issues such as reducing cooling costs, minimizing idle PowMin is optimal for high arrival rates. Lastly, Fig. 1(c) power wastage and minimizing average power consumption, depicts a scenario using DVFS+DFS where PowMed is op- we are more interested in the problem of allocating peak timal for high arrival rates.
power among servers in a server farm to maximize perfor- We experiment with different workloads. While Section 5 mance. Notable prior work dealing with peak power alloca- presents experimental results for a CPU bound workload, tion in a server farm includes Raghavendra et al. [21], Femal LINPACK, Section 6 repeats all the experiments under the et al. [10] and Chase et al. [5] among others. Raghavendra STREAM memory bound workload, the WebBench web work- et al. [21] present a power management solution that co- load, and other workloads. In all cases, experimental results ordinates different individual approaches to simultaneously are in excellent agreement with our theoretical predictions.
minimize average power, peak power and idle power wastage.
Section 7 summarizes our work. Finally, Section 8 discusses Femal et al. [10] allocate peak power so as to maximize future applications of our model to more complex situations throughput in a data center while simultaneously attempt- such as workloads with varying arrival rates, servers with ing to satisfy certain operating constraints such as load- idle (low power) states and power management at the sub- balancing the available power among the servers. Chase et system level, such as the storage subsystem.
al. [5] present an auction-based architecture for improvingthe energy efficiency of data centers while achieving some 2. EXPERIMENTAL FRAMEWORK quality-of-service specifications. We differ from the abovework in that we specifically deal with minimizing mean re- 2.1 Experimental setup sponse time for a given peak power budget and understand- Our experimental setup consists of a server farm with up ing all the factors that affect it.
to fourteen IBM BladeCenter HS21 blade servers featuring Our contributions two 3.0 GHz dual-core Intel Woodcrest Xeon processors and As we have stated, the optimal power allocation scheme 1 GB memory per blade, all residing in a single chassis. We depends on many factors. Perhaps the most important of installed and configured Apache as an application server on these is the specific relationship between the power allocated each of the blade servers to process transactional requests.
to a server and its frequency (speed), henceforth referred To generate HTTP requests for the Apache web servers, we to as the power-to-frequency relationship. There are sev- employ an additional blade server on the same chassis as eral mechanisms within processors that control the power- the workload generator to reduce the effects of network la- to-frequency relationship. These can be categorized into tency. The workload generator uses the web server perfor- DFS (Dynamic Frequency Scaling), DVFS (Dynamic Volt- mance benchmarking tool httperf [19] in the open server PowMax beats PowMin PowMin beats PowMax PowMax beats PowMin PowMax beats PowMin Mean response time (sec) Mean response time (sec) High arrival rate High arrival rate (a) PowMax is best.
(b) PowMin is best (c) PowMed is best at high arrival rates.
at high arrival rates.
Figure 1: Subset of our results, showing that no single power allocation scheme is optimal. Fig.(a) depictsa scenario using DFS where PowMax is optimal. Fig.(b) depicts a scenario using DVFS where PowMin isoptimal at high arrival rates whereas PowMax is optimal at low arrival rates. Fig.(c) depicts a scenario usingDVFS+DFS where PowMed is optimal at high arrival rates whereas PowMax is optimal at low arrival rates. farm configuration and wbox [23] in the closed server farm When allocating power to a server, there is a minimum configuration. We modified and extended httperf and wbox level of power consumption (b) needed to operate the proces- to allow for multiple servers and to specify the routing prob- sor at the lowest allowable frequency and a maximum level ability among the servers. We measure and allocate power to of power consumption (c) needed to operate the processor the servers using IBM's Amester software. Amester, along at the highest allowable frequency (Of course the specific with additional scripts, collects all relevant data for our ex- values of b and c depend on the application that the server is running). Formally, we define the following notation:Baseline power: b Watts The minimum power consumed 2.2 Voltage and frequency scaling mechanisms by a fully-utilized server over the allowable range of proces- Processors today are commonly equipped with mecha- sor frequency.
nisms to reduce power consumption at the expense of re- Speed at baseline: sb Hertz The speed (or frequency) of duced server frequency. Common examples of these mecha- a fully utilized server running at b Watts.
nisms are Intel's SpeedStep Technology and AMD's Cool ‘n' Maximum power: c Watts The maximum power con- Quiet Technology. The power-to-frequency relationship in sumed by a fully-utilized server over the allowable range of such servers depends on the specific voltage and frequency scaling mechanism used. Most mechanisms fall under the Speed at maximum power: sc Hertz The speed (or fre- following three categories: quency) of a fully utilized server running at c Watts.
Dynamic Frequency Scaling (DFS) a.k.a. Clock Throttling or T-states is a technique to manage powerby running the processor at a less-than-maximum clock fre-quency. Under DFS, the Intel's 3.0 GHz Woodcrest Xeon 3. POWER-TO-FREQUENCY processors we use allow for 8 operating points which corre- An integral part of determining the optimal power allo- spond to effective frequencies of 12.5%, 25%, 37.5%, 50%, cation is to understand the power-to-frequency relationship.
62.5%, 75%, 87.5%, and 100% of the maximum server fre- This relationship differs for DFS, DVFS, and DVFS+DFS, and also differs based on the workload in use. Unfortunately, Dynamic Voltage and Frequency Scaling (DVFS) the functional form of the power-to-frequency relationship is a.k.a. P-states is a more efficient power-savings mech- not well studied in the literature. The servers we use sup- anism that reduces server frequency by reducing the pro- port all three voltage and frequency scaling mechanisms and cessor voltage and frequency. Under DVFS, our processors therefore can be used to study the power-to-frequency rela- allow for 4 operating points which correspond to effective tionships. In this section, we present our measurements de- frequencies of 66.6%, 77.7%, 88.9%, and 100% of the maxi- picted in Figs. 2(a) and (b) showing the functional relation- mum server frequency.
ship between power allocated to a server and its frequency DVFS+DFS attempts to leverage both DVFS, and DFS for the LINPACK [13] workload. We generalize the func- by applying DFS on the lowest performance state available tional form of the power-to-frequency relationship to other in DVFS. Under DVFS+DFS, our processors allow for 11 workloads in Section 6 (See Figs. 8 and 10). Throughout we operating points which correspond to effective frequencies assume a homogeneous server farm.
of 8.3%, 16.5%, 25%, 33.3%, 41.6%, 50.0%, 58.3%, 66.6%, We use the tools developed in [17] to limit the maxi- 77.7%, 88.9%, and 100% of the maximum server frequency.
mum power allocated to each server. Limiting the maximumpower allocated to a server is usually referred to as capping 2.3 Power consumption within a single server the power allocated to a server. We run LINPACK jobs back-to-back to ensure that the server is always occupied by power-to-frequency relationship to be cubic, especially the workload, and that the server is running at the specified for the processor. By using a cubic model for DVFS+DFS, power cap value. Hence, the power values we observe will we wish to analyze the optimal power allocation policy be the peak power values for the specified workload. Recall for those settings.
from Section 2 that the voltage and frequency scaling mech-anisms have certain discrete performance points (in terms of Approximating DVFS+DFS using a cubic fit gives the fol- frequency) at which the server can operate. At each of these lowing relationship between the speed of a server and the performance points, the server consumes a certain amount power allocated to it: of power for a given workload. By quickly dithering between s = sb + α0 3 P − b. available performance states, we can ensure that the server never consumes more than the set power cap value. In this Specifically, our experiments show that α0 = 0.39 GHz/ 3 W .
way, we also get the best performance from the server for Also, from our experiments, we found that b = 150W, c = the given power cap value. Note that when we say power, 250W, sb = 1.2 GHz and sc = 3 GHz for DVFS+DFS.
we mean the system-level power which includes the powerconsumed by the processor and all other components within 4. THEORETICAL RESULTS Fig. 2(a) illustrates the power-to-frequency curves obtained The optimal power allocation depends on a large num- for LINPACK using DFS and DVFS. From the figure, we see ber of factors including the power-to-frequency relationship that the power-to-frequency relationship for both DFS and just discussed in Fig. 2, the arrival rate, the minimum and DVFS is almost linear. It may seem surprising that the maximum power consumption levels (b and c respectively), power-to-frequency relationship for DVFS looks like a lin- whether the server farm has an open loop configuration or ear plot. This is opposite to what is widely suggested in a closed loop configuration, etc.
the literature for the processor power-to-frequency relation- In order to investigate the effects of all these factors on ship, which is cubic [11]. The reason why the server power- the optimal power allocation, we develop a queueing the- to-frequency relationship is linear can be explained by two oretic model which predicts the mean response time as a interrelated factors. First, manufacturers usually settle on function of all the above factors (See Section 4.1). We then a limited number of allowed voltage levels (or performance produce theorems that determine the optimal power allo- states), which results in a less-than-ideal relationship be- cation for every possible configuration of the above factors, tween power and frequency in practice. Second, DVFS is including open loop configuration (See Theorems 1 and 2 in not applied on many components at the system level. For Section 4.2) and closed loop configuration (See Theorems 3, example, power consumption in memory remains propor- 4 and 5 in Section 4.3).
tional to the number of references to memory per unit time, 4.1 Queueing model which is only linearly related to the frequency of the pro-cessor. Thus, the power-to-frequency curve for both DFSand DVFS can be approximated as a linear function. Usingthe terminology introduced in Section 2, we approximatethe server speed (or frequency), s GHz, as a function of thepower allocated to it, P Watts, as: s = sb + α(P − b). where the coefficient α (units of GHz per Watt) is the slopeof the power-to-frequency curve. Specifically, our experi-ments show that α = 0.008 GHz/W for DVFS and α = 0.03GHz/W for DFS. Also, from our experiments, we find thatb = 180W and c = 240W for both DVFS and DFS. How-ever, sb = 2.5 GHz for DVFS and sb = 1.2 GHz for DFS.
The maximum speed in both cases is sc = 3 GHz, which issimply the maximum speed of the processor we use. Notethat the specific values of these parameters change depend-ing on the workload in use. Section 6 discusses our measuredparameter values for different workloads.
For DVFS+DFS, we expect the power-to-frequency rela- tionship to be piecewise linear since it is a combination ofDVFS and DFS. Experimentally, we see from Fig. 2(b) thatthe power-to-frequency relationship is in fact piecewise lin-ear (3 GHz - 2.5 GHz and then 2.5 GHz - 1.2 GHz).
Figure 3: Illustration of our k-server farm model Though we could use a piecewise linear fit for DVFS+DFS, we choose to approximate it using a cubic curve for the fol-lowing reasons: Fig. 3 illustrates our queueing model for a server farm 1. Using a cubic fit demonstrates how we can extend our with k servers. We assume that there is a fixed power bud- results to non-linear power-to-frequency relationships.
get P, which can be split among the k servers, allocating i power to server i where i = P . The correspond- 2. As previously mentioned, several papers consider the ing server speeds are denoted by (s1, . . , sk). Each server, Figure 2: Power-to-frequency curves for DFS, DVFS, and DVFS+DFS for the CPU bound LINPACK workload.
Fig.(a) illustrates our measurements for DFS and DVFS. In both these mechanisms, we see that the serverfrequency is linearly related to the power allocated to the server. Fig.(b) illustrates our measurements forDVFS+DFS, where the power-to-frequency curve is better approximated by a cubic relationship. i, receives a fraction qi of the total workload coming in to timal power allocation is non-trivial, computing E[T ] for a the server farm. Corresponding to any vector of power allo- given allocation is easy. Hence we omit showing the mean cation (P1, . . , Pk), there exists an optimal workload allo- response time in each case and refer the reader to the ap- cation vector (q∗1, . . , q∗k). We derive the optimal workload pendix. Due to lack of space, we defer all proofs to the allocation for each power allocation and use that vector, appendix. However, we present the intuition behind the (q∗1, . . , q∗k), both in theory and in the actual experiments.
theorems in each case. Recall from Section 2 that each fully The details of how we obtain the optimal (q∗1, . . , q∗k) are utilized server has a minimum power consumption of b Watts deferred to the appendix.
and maximum power consumption of c Watts.
Our model assumes that the jobs at a server are scheduled To illustrate our results clearly, we shall assume through- using the Processor-Sharing (PS) scheduling discipline. Un- out this section (and the Appendix) that the power budget der PS, when there are n jobs at a server, they each receive P is such that PowMax allows us to run n servers (each at 1/nth of the server capacity. PS is identical to Round-Robin power c) and PowMin allows us to run m servers (each at with quantums (as in Linux), when the quantum size ap- power b). This is equivalent to saying: proaches zero. A job's response time, T , is the time from P = m · b = n · c when the job arrives until it has completed service, includ-ing waiting time. We aim to minimize mean response time, where m and n are less than or equal to k. Obviously, m ≥ n.
We will analyze our server farm model under both an open 4.2 Theorems for open loop configurations loop configuration (See Section 4.2) and a closed loop con- Theorem 1 derives the optimal power allocation in an open figuration (See Section 4.3). An open loop configuration is loop configuration for a linear power-to-frequency relation- one in which jobs arrive from outside the system and leave ship, as is the case for DFS and DVFS. In such cases, the the system after they complete service. We assume that server frequency varies with the power allocated to it as the arrival process is Poisson with average rate λ jobs/sec.
si = sb + α(Pi − b). The theorem says that if the speed Sometimes it will be convenient to, instead, express λ in at baseline, sb, is sufficiently low, then PowMax is optimal.
units of GHz. This conversion is easily achievable since an By contrast, if sb is high, then PowMin is optimal for high average job has size E[S] gigacycles. In the theorems pre- arrival rates and PowMax is optimal for low arrival rates. If sented in the paper, λ is in the units of GHz. However, in the s∗i is the speed of server i when run at power P∗i, then the appendix, when convenient for queueing analysis, we switch stability condition requires that λ < to jobs/sec. Likewise, while it is common for us to express Theorem 1. Given an open k-server farm configuration the speed of the server, s, in GHz, we sometimes switch with a linear power-to-frequency relationship (given by Eq. (1)) to jobs/sec in the appendix when convenient. A closed loop and power budget P, the following power allocation mini- configuration is one in which there are always a fixed number mizes E[T ]: of users N (also referred to as the multi-programming level) If sb ≤ α: who each submit one job to the server. Once a user's job is 1,2,.,n = c, P ∗ n+1,n+2,.,k = 0 completed, he immediately creates another job, keeping the If sb > α: 1,2,.,n = c, P ∗ n+1,n+2,.,k = 0 if λ ≤ λlow number of jobs constant at N .
P∗1,2,.,m = b, P∗m+1,m+2,.,k = 0 if λ > λlow In all of the theorems that follow, we find the optimal where λlow = α · P. power allocation, (P∗1, P∗2, . . , P∗k), for a k-server farm, which Corollary 1. For DFS, PowMax is optimal. For DVFS, minimizes the mean response time, E[T ], given the fixed PowMax is optimal at low arrival rates and PowMin is op- peak power budget P = i . While deriving the op- timal at high arrival rates. Intuition For a linear power-to-frequency relationship, by Eqs. (1) and (2)), the following power allocation mini- we have from Eq. (1) that the speed of a server, si, varies mizes E[T ] for low N , based on the asymptotic approxima- with the power allocated to it, Pi, as si = sb + α(Pi − b).
tions in [14]: From this equation, it follows that the frequency per Wattfor a single server, si , can be written as: P∗1,2,.,n = c, P∗n+1,n+2,.,k = 0 = b − αb + α. Corollary 3. For a closed-loop server farm configura- tion with low N , PowMax is optimal for DFS, DVFS, and Hence, maximizing the frequency per Watt depends on whether sb ≤ αb or sb > αb. If sb ≤ αb, maximizing si is equivalent to maximizing P Intuition When N is sufficiently low, there are very few i, which is achieved by PowMax. Alterna- tively, if s jobs in the system. Hence few fast servers are optimal since b > αb, we want to minimize Pi, which is achieved by PowMin. However, the above argument still does not take there aren't enough jobs to utilize the servers, leaving servers into account the mean arrival rate, λ. If λ is sufficiently low, idle. Thus PowMax is optimal. This is similar to the case of there are very few jobs in the server farm, hence, few fast low arrival rate that we considered for an open loop server servers, or PowMax, is optimal. The corollary follows by farm configuration in Theorems 1 and 2.
simply plugging in the values of s When N is high, the optimal power allocation is non- b, α and b for DFS and DVFS from Section 3.
trivial. From here on, we assume N is large enough to keep Theorem 2 derives the optimal power allocation for non- all servers busy, so that asymptotic bounds apply (See [14]).
linear power-to-frequency relationships, such as the cubic In our experiments, we find that N > 10k suffices.
relationship in the case of DVFS+DFS. In such cases, the Theorem 4 says that for high N , if the speed at baseline, server frequency varies with the power allocated to it as b, is sufficiently low, then PowMax is optimal. By contrast, if sb is high, then PowMin is optimal.
i = sb + α0 3 Pi − b. The theorem says that if the arrival rate is sufficiently low, then PowMax is optimal. However, if Theorem 4. Given a closed k-server farm configuration the arrival rate is high, PowMed is optimal. Although The- with a linear power-to-frequency relationship (given by Eq. (1)), orem 2 specifies a cubic power-to-frequency relationship, we the following power allocation minimizes E[T ] for high N , conjecture that similar results hold for more general power- based on the asymptotic approximations in [14]: to-frequency curves where server frequency varies as the n-th If sb < α: root of the power allocated to the server.
1,2,.,n = c, P∗n+1,n+2,.,k = 0 If sb ≥ α: 1,2,.,m = b, P∗m+1,m+2,.,k = 0 Theorem 2. Given an open k-server farm configuration with a cubic power-to-frequency relationship (given by Eq. (2)) Corollary 4. For DFS, PowMax is optimal for high N . and power budget P, the following power allocation mini- For DVFS, PowMin is optimal for high N . mizes E[T ]: Intuition For a closed queueing system with zero think P∗1,2,.,n = c, P∗n+1,n+2,.,k = 0 if λ ≤ λ0low . time, the mean response time is inversely proportional to P∗1,2,.,l = P , l+1,l+2,.,k = 0 if λ > λ0low % the throughput of the system. Hence, to minimize the mean response time, we must maximize the throughput of the k- low = nlα0 c − b − 3 − b and l = server farm with power budget P. When N is high, under b+ α0P a server farm configuration, all servers are busy. Hence the Corollary 2. For DVFS+DFS, PowMax is optimal at throughput is the sum of the server speeds. It can be easily low arrival rates and PowMed is optimal at high arrival shown that the throughput of the system under PowMin, sb ·m, exceeds the throughput of the system under PowMax,sc · n, when sb ≥ αb. Hence the result.
Intuition When the arrival rate is sufficiently low, there Theorem 5 deals with the case of high N for a non-linear are very few jobs in the system, hence, PowMax is optimal.
power-to-frequency relationship. The theorem says that if However, for higher arrival rates, we allocate to each server the speed at baseline, sb, is sufficiently low, then PowMax is the amount of power that maximizes its frequency per Watt optimal. By contrast, if sb is high, then PowMed is optimal.
ratio. For the cubic power-to-frequency relationship, whichhas a downwards concave curve (See Fig. 2(b)), we find that Theorem 5. Given a closed k-server farm configuration the optimal power allocation value for each server (derived with a cubic power-to-frequency relationship (given by Eq. (2)), via calculus) lies between the maximum c and the minimum the following power allocation minimizes E[T ] for high N , b. Hence, PowMed is optimal.
based on the asymptotic approximations in [14]: 4.3 Theorems for closed loop configurations b < s0: P∗1,2,.,n = c, P∗n+1,n+2,.,k = 0 If sb ≥ s0: P∗1,2,.,l = b + x, P∗l+1,l+2,.,k = 0 We now move on to closed loop configurations, where the , s0 = msc − α0 3 x and x is the non- number of jobs in the system, N , is always constant. We will rely on asymptotic operational laws (See [14]) which negative, real solution of the equation b = 2x + 1 α0 (3x approximate the performance of the system for very high Nand very low N (see Appendix).
Theorem 3 says that for a closed server farm configuration Corollary 5. For DVFS+DFS, for high N , PowMed is with sufficiently low value of N , PowMax is optimal.
optimal if sb is high, else PowMax is optimal. Theorem 3. Given a closed k-server farm configuration Intuition As in the case of Theorem 4, we wish to maximize with a linear or cubic power-to-frequency relationship (given the throughput of the system. When we turn on a new Mean response time (sec) Mean response time (sec) Mean arrival rate (jobs/sec) ® Mean arrival rate (jobs/sec) ® Figure 4: Open loop experimental results for mean response time as a function of the arrival rate using DFSand DVFS for LINPACK. In Fig.(a) PowMax outperforms PowMin for all arrival rates under DFS, by asmuch as a factor of 5. By contrast in Fig.(b), for DVFS, at lower arrival rates, PowMax outperforms PowMinby up to 22%, while at higher arrival rates, PowMin outperforms PowMax by up to 14%. server at b units of power, the increase in throughput of the system is s b GHz. For low values of sb, this increase is small.
Hence, for low values of sb, we wish to turn on as few servers as possible. Thus, PowMax is optimal. However, when s is high, it pays to turn servers on. Once a server is on, theinitial steep increase in frequency per Watt afforded by a cubic power-to-frequency relationship advocates running theserver at more than the minimal power b. The exact optimal PowMed power value (b + x in the Theorem) is close to theknee of the cubic power-to-frequency curve.
5. EXPERIMENTAL RESULTS Mean response time (sec) In this section we test our theoretical results from Sec- tion 4 on an IBM BladeCenter using the experimental setup Mean arrival rate (jobs/sec) ® discussed in Section 2. We shall first present our experimen-tal results for the open server farm configuration and thenmove on to the closed server farm configuration. For the Figure 5: Open loop experimental results for mean experiments in this section, we use the Intel LINPACK [13] response time as a function of the arrival rate using workload, which is CPU bound. We defer experimental re- DVFS+DFS for LINPACK. At lower arrival rates, sults for other workloads to Section 6.
PowMax outperforms PowMed by up to 12%, while at As noted in Section 3, the baseline power level and the higher arrival rates, PowMed outperforms PowMax maximum power level for both DFS and DVFS are b = 180W by up to 20%. Note that PowMin is worse than both and c = 240W respectively. For DVFS+DFS, b = 150W PowMed and PowMax throughout the range of arrival and c = 250W. In each of our experiments, we try to fix the power budget, P, to be an integer multiple of b and c, as inEq. (3).
PowMin is huge; ranging from a factor of 3 at low arrival 5.1 Open server farm configuration rates (load, ρ ≈ 0.2) to as much as a factor of 5 at high Fig. 4(a) plots the mean response time as a function of arrival rates (load, ρ ≈ 0.7). This is because the power-to- the arrival rate for DFS with a power budget of P = 720W.
frequency relationship for DFS is steep (See Fig. 2(a)), hence In this case, PowMax (represented by the dashed line) de- running servers at maximum power levels affords a huge gain notes running 3 servers at c = 240W and turning off all in server frequency. Arrival rates higher than 0.22 jobs/sec other servers. PowMin (represented by the solid line) de- cause our systems to overload under PowMin because sb is notes running 4 servers at b = 180W and turning off all other very low for DFS. Hence, we only go as high as 0.22 jobs/sec.
servers. Clearly, PowMax outperforms PowMin throughout Fig. 4(b) plots the mean response time as a function of the the range of arrival rates. This is in agreement with the arrival rate for DVFS with a power budget of P = 720W.
predictions of Theorem 1. Note from Fig. 4(a) that the im- PowMax (represented by the dashed line) again denotes run- provement in mean response time afforded by PowMax over ning 3 servers at c = 240W and turning off all other servers.
Mean response time (sec) Mean response time (sec) Multi−programming level (N) ® Multi−programming level (N) ® Figure 6: Closed loop experimental results for mean response time as a function of number of jobs (N ) in thesystem using DFS and DVFS for LINPACK. In Fig.(a), for DFS, PowMax outperforms PowMin for all valuesof N , by almost a factor of 2 throughout. By contrast in Fig.(b), for DVFS, at lower values of N , PowMax isslightly better than PowMin, while at higher values of N , PowMin outperforms PowMax by almost 30%. PowMin (represented by the solid line) denotes running 4 servers at b = 180W and turning off all other servers. We see that when the arrival rate is low, PowMax produces lower mean response times than PowMin. In particular, when the arrival rate is 0.5 jobs/sec, PowMax affords a 22% im-provement in mean response time over PowMin. However, at higher arrival rates, PowMin outperforms PowMax, aspredicted by Theorem 1. In particular, when the arrivalrate is 1 job/sec, PowMin affords a 14% improvement inmean response time over PowMax. Under DVFS, we can afford arrival rates up to 1 job/sec before overloading thesystem. To summarize, under DVFS, we see that PowMin Mean response time (sec) can be preferable to PowMax. This is due to the flatness ofthe power-to-frequency curve for DVFS (See Fig. 2(a)), and agrees perfectly with Theorem 1.
Multi−programming level (N) ® Fig. 5 plots the mean response time as a function of the arrival rate for DVFS+DFS with a power budget of P =1000W. In this case, PowMax (represented by the dashed Figure 7: Closed loop experimental results for mean line) denotes running 4 servers at c = 250W and turning response time as a function of number of jobs (N ) off all other servers. PowMed (represented by the solid line) in the system using DVFS+DFS for LINPACK. At denotes running 5 servers at b+c = 200W and turning off lower values of N , PowMed is slightly better than all other servers. We see that when the arrival rate is low, PowMax, while at higher values of N , PowMed out- PowMax produces lower mean response times than PowMed.
performs PowMax, by as much as 40%. Note that However, at higher arrival rates, PowMed outperforms Pow- PowMin is worse than both PowMed and PowMax Max, exactly as predicted by Theorem 2. For the sake of for all values of N . completion, we also plot PowMin (dotted line in Fig. 5).
Note that PowMin is worse than both PowMed and Pow-Max throughout the range of arrival rates. Note that we PowMax (represented by the dashed line) denotes running use the value of b+c = 200W as the optimal power allocated 3 servers at c = 240W and turning off all other servers.
to each server in PowMed for our experiments as this value PowMin (represented by the solid line) denotes running 4 is close to the theoretical optimum predicted by Theorem 2 servers at b = 180W and turning off all other servers. Clearly, (which is around 192W for the range of arrival rates we use) PowMax outperforms PowMin throughout the range of N , and also helps to keep the power budget at 1000W .
by almost a factor of 2 throughout the range. This is inagreement with the predictions of Theorem 3.
5.2 Closed server farm configuration Fig. 6(b) plots the mean response time as a function of We now turn to our experimental results for closed server the multi-programming level for DVFS with a power bud- farm configurations. Fig. 6(a) plots the mean response time get of P = 720W. PowMax (represented by the dashed line) as a function of the multi-programming level (MPL = N ) again denotes running 3 servers at c = 240W and turning for DFS with a power budget of P = 720W. In this case, off all other servers. PowMin (represented by the solid line) Figure 8: Power-to-frequency curves for DFS, DVFS, and DVFS+DFS for the CPU bound DAXPY workload.
Fig.(a) illustrates our measurements for DFS and DVFS. In both these mechanisms, we see that the serverfrequency is linearly related to the power allocated to the server. Fig.(b) illustrates our measurements forDVFS+DFS, where the power-to-frequency curve is better approximated by a cubic relationship. Mean response time (sec) Mean response time (sec) Mean response time (sec) Mean arrival rate (jobs/sec) ® Mean arrival rate (jobs/sec) ® Mean arrival rate (jobs/sec) ® Figure 9: Open loop experimental results for mean response time as a function of the arrival rate using DFS,DVFS, and DVFS+DFS for the CPU bound DAXPY workload. In Fig.(a), for DFS, PowMax outperformsPowMin throughout the range of arrival rates by as much as a factor of 5. In Fig.(b), for DVFS, PowMaxoutperforms PowMin throughout the range of arrival rates by around 30%. In Fig.(c), for DVFS+DFS, PowMaxoutperforms both PowMed and PowMin throughout the range of arrival rates. While at lower arrival ratesPowMax only slightly outperforms PowMed, at higher arrival rates the improvement is around 60%. denotes running 4 servers at b = 180W and turning off all predictions of Theorem 5. In particular, when N = 100, other servers. We see that when N is high, PowMin pro- PowMed affords a 40% improvement in mean response time duces lower mean response times than PowMax. This is in over PowMax. However, when N is low, PowMed produces agreement with the predictions of Theorem 4. In particular, only slightly lower response times than PowMax. Note that when N = 100, PowMin affords a 30% improvement in mean throughout the range of N , PowMin is outperformed by both response time over PowMax. However, when N is low, Pow- PowMax and PowMed.
Max produces slightly lower response times than PowMin.
This is in agreement with Theorem 3.
Fig. 7 plots the mean response time as a function of the 6. OTHER WORKLOADS multi-programming level for DVFS+DFS with a power bud- Thus far we have presented experimental results for a CPU get of P = 1000W. In this case, PowMax (represented by bound workload LINPACK. In this section we present ex- the dashed line) denotes running 4 servers at c = 250W perimental results for other workloads. Our experimental and turning off all other servers. PowMed (represented by results agree with our theoretical predictions even in the the solid line) denotes running 5 servers at b+c = 200W case of non-CPU bound workloads.
and turning off all other servers. PowMin (represented by We fully discuss experimental results for two workloads, the dotted line) denotes running 6 servers at 170W. We DAXPY and STREAM, in this section and summarize our see that when N is high, PowMed produces lower mean re- results for other workloads at the end of the section. Due to sponse times than PowMax. This is in agreement with the lack of space, we only show results for open loop configura- neck for STREAM's performance is the memory subsystem since STREAM is memory bound. Thus, every extra Watt DAXPY [22] is a CPU bound workload which we have sized of power added to the system would mainly be used up by to be L1 cache resident. This means DAXPY uses a lot the memory subsystem and the improvement in processor of processor and L1 cache but rarely uses the server mem- frequency would be minimal.
ory and disk subsystems. Hence, the power-to-frequency Figs. 11(a), (b) and (c) present our power allocation re- relationship for DAXPY is similar to that of CPU bound sults for STREAM under DFS, DVFS, and DVFS+DFS re- LINPACK except that DAXPY's peak power consumption spectively. Due to the downwards concave nature of the tends to be lower than that of LINPACK, since DAXPY power-to-frequency curves for STREAM studied in Fig. 10, does not use a lot of memory or disk.
Theorem 2 says that PowMax should be optimal at low ar- Figs. 8(a) and (b) present our results for the power-to- rival rates and PowMed should be optimal at high arrival frequency relationship for DAXPY. The functional form of rates. However, for the values of α0 in Fig. 10, we find that the power-to-frequency relationship under DFS and DVFS the threshold point, λlow, below which PowMax is optimal, in Fig. 8(a) is clearly linear. However, the power-to-frequency is quite high. Hence, PowMax is optimal in Fig. 11(c). In relationship under DVFS+DFS in Fig. 8(b) is better approx- Figs. 11(a) and (b), PowMax and PowMed produce similar imated by a cubic relationship. These trends are similar to response times.
the power-to-frequency relationship for LINPACK seen in GZIP and BZIP2 are common software applications used for Figs. 9(a), (b) and (c) present our power allocation results data compression in Unix systems. These CPU bound com- for DAXPY under DFS, DVFS, and DVFS+DFS respec- pression applications use sophisticated algorithms to reduce tively. For DFS, in Fig. 9(a), PowMax outperforms PowMin the size of a given file. We use GZIP and BZIP2 to compress throughout the range of arrival rates, by as much as a fac- a file of uncompressed size 150 MB. For GZIP, we find that tor of 5. This is in agreement with Theorem 1. Note that PowMax is optimal for all of DFS, DVFS, and DVFS+DFS.
we use 165W as the power allocated to each server under These results are similar to the results for DAXPY. For PowMin to keep the power budget same for PowMin and BZIP2, the results are similar to those of LINPACK. In PowMax. For DVFS, in Fig. 9(b), PowMax outperforms particular, at low arrival rates, PowMax is optimal. For PowMin throughout the range of arrival rates, by around high arrival rates, PowMax is optimal for DFS, PowMin is 30%. This is in contrast to LINPACK, where PowMin out- optimal for DVFS and PowMed is optimal for DVFS+DFS.
performs PowMax at high arrival rates. The reason why PowMax outperforms PowMin for DAXPY is the lower value WebBench [15] is a benchmark program used to measure of sb = 2.2 GHz for DAXPY as compared to sb = 2.5 GHz web server performance by sending multiple file requests to for LINPACK. Since sb = 0.0137 < α = 0.014 for DAXPY a server. For WebBench, we find the power-to-frequency under DVFS, Theorem 1 rightly predicts PowMax to be op- relationship for DFS, DVFS, and DVFS+DFS to be cubic.
timal. Finally, in Fig. 9(c) for DVFS+DFS, PowMax out- This is similar to the power-to-frequency relationships ob- performs both PowMed and PowMin throughout the range served for STREAM since WebBench is more memory and of arrival rates. Again, this is in contrast to LINPACK, disk intensive. As theory predicts (See Theorem 2), we find where PowMed outperforms PowMax at high arrival rates.
PowMax to be optimal at low arrival rates and PowMed The reason why PowMax outperforms PowMed for DAXPY to be optimal at high arrival rates for DFS, DVFS, and is the higher value of α0 = 0.46 GHz/ 3 W for DAXPY as compared to α0 = 0.39 GHz/ 3 W for LINPACK. This isin agreement with the predictions of Theorem 2 for high values of α0. Intuitively, for a cubic power-to-frequency re- In this paper, we consider the problem of allocating an lationship, we have from Eq. (2): s = sb + α0 3 P − b. As available power budget among servers in a server farm to α0 increases, we get more server frequency for every Watt of minimize mean response time. The amount of power allo- power added to the server. Thus, at high α0, we allocate as cated to a server determines its speed in accordance to some much power as possible to every server, implying PowMax.
power-to-frequency relationship. Hence, we begin by mea- suring the power-to-frequency relationship within a single STREAM [18] is a memory bound workload which does not server. We experimentally find that the power-to-frequency use a lot of processor cycles. Hence, the power consumption relationship within a server for a given workload can be ei- at a given server frequency for STREAM is usually lower ther linear or cubic. Interestingly, we see that the relation- than CPU bound LINPACK and DAXPY.
ship is linear for DFS and DVFS when the workload is CPU Figs. 10(a) and (b) present our results for the power- bound, but cubic when it is more memory bound. By con- to-frequency relationship for STREAM. Surprisingly, the trast, the relationship for DVFS+DFS is always cubic in our functional form of the power-to-frequency relationship under DFS, DVFS, and DVFS+DFS is closer to a cubic relation- Given the power-to-frequency relationship, we can view ship than to a linear one. In particular, the gain in server the problem of finding the optimal power allocation in terms frequency per Watt at higher power allocations is much lower of determining the optimal frequencies of servers in the server than the gain in frequency per Watt at lower power alloca- farm to minimize mean response time. However, there are tions. We argue this observation as follows: At extremely several factors apart from the server frequencies that affect low server frequencies, the bottleneck for STREAM's perfor- the mean response time for a server farm. These include mance is the CPU. Thus, every extra Watt of power added the arrival rate, the maximum speed of a server, the total to the system would be used up by the CPU to improve its power budget, whether the server farm has an open or closed frequency. However, at higher server frequencies, the bottle- configuration, etc. To fully understand the effects of these Figure 10: Power-to-frequency curves for DFS, DVFS, and DVFS+DFS for the memory bound STREAMworkload. Fig.(a) illustrates our measurements for DFS and DVFS, while Fig.(b) illustrates our measurementsfor DVFS+DFS. In all the three mechanisms, the power-to-frequency curves are downwards concave, depictinga cubic relationship between power allocated to a server and its frequency. Mean response time (sec) Mean response time (sec) Mean response time (sec) Mean arrival rate (jobs/sec) ® Mean arrival rate (jobs/sec) ® Mean arrival rate (jobs/sec) ® Figure 11: Open loop experimental results for mean response time as a function of the arrival rate usingDFS, DVFS, and DVFS+DFS for the memory bound STREAM workload. In Figs.(a) and (b), for DFS andDVFS respectively, PowMed and PowMax produce similar response times. In Fig.(c) however, for DVFS+DFS,PowMax outperforms PowMed by as much as 30% at high arrival rates. In all three cases, PowMin is worsethan both PowMed and PowMax. factors on mean response time, we develop a queueing theo- tion accordingly. The theorems in this paper already tell us retic model (See Section 4.1) that allows us to predict mean the optimal power allocation for any given arrival rate. We response time as a function of the above factors. We then are now working on incorporating the effects of switching produce theorems (See Sections 4.2 and 4.3) that determine costs into our model.
the optimal power allocation for every possible configuration Second, while we have considered turning servers on or off, of the above factors.
today's technology [2, 6] allows for servers which are sleep- To verify our theoretical predictions, we conduct extensive ing (HALT state or deep C states). These sleeping servers experiments on an IBM BladeCenter for a range of work- consume less power than servers that are on, and can more loads using DFS, DVFS, and DVFS+DFS (See Section 5 quickly be moved into the on state than servers that are and 6). In every case, we find that the experimental results turned off. We are looking at ways to extend our theorems are in excellent agreement with our theoretical predictions.
to allow for servers with sleep states.
Third, while this paper deals with power management at the server level (measuring and allocating power to the 8. DISCUSSION AND FUTURE WORK server as a whole), our techniques can be extended to deal There are many extensions to this work that we are ex- with individual subsystems within a server, such as power ploring, but are beyond the scope of this paper.
allocation within the storage subsystem. We are looking First of all, the arrival rate into our server farm may vary at extending our implementation to individual components dynamically over time. In order to adjust to a dynamically within a server.
varying arrival rate, we may need to adjust the power alloca- [19] David Mosberger and Tai Jin. httperf—A Tool for [1] Lesswatts.org: Race to idle.
Measuring Web Server Performance. ACM Sigmetrics: Performance Evaluation Review, 26:31–37, 1998.
[20] Vivek Pandey, W. Jiang, Y. Zhou, and R. Bianchini.
[2] Intel: Nehalem.
DMA-Aware Memory Energy Management. HPCA '06: The 12th International Symposium on NGMS001/SF08 NGMS001 100t.pdf.
High-Performance Computer Architecture, pages [3] U.S. Environmental Protection Agency. Epa report on 133–144, 11-15 Feb. 2006.
server and data center energy efficiency. 2007.
[21] Ramya Raghavendra, Parthasarathy Ranganathan, [4] National Electrical Contractors Association. Data Vanish Talwar, Zhikui Wang, and Xiaoyun Zhu. No centers - meeting today's demand. 2007.
"Power" Struggles: Coordinated Multi-Level Power [5] Jeffrey S. Chase, Darrell C. Anderson, Prachi N.
Management for the Data Center. In ASPLOS XIII: Thakar, and Amin M. Vahdat. Managing energy and Proceedings of the 13th international conference on server resources in hosting centers. In In Proceedings Architectural support for programming languages and of the Eighteenth ACM Symposium on Operating operating systems, pages 48–59, 2008.
Systems Principles (SOSP), pages 103–116, 2001.
[22] K. Rajamani, H. Hanson, J. C. Rubio, S. Ghiasi, and [6] Intel Corp. Intel Core2 Duo Mobile Processor F. L. Rawson. Online power and performance Datasheet: Table 20.
estimation for dynamic power management. Research Report RC-24007, July 2006.
32012001.pdf, 2008.
[23] Salvatore Sanfilippo. WBox HTTP testing tool [7] M. Elnozahy, M. Kistler, and R. Rajamony. Energy (Version 4). http://www.hping.org/wbox/, 2007.
conservation policies for web servers. In USITS, 2003.
[24] X Wang and M Chen. Cluster-level Feedback Power [8] Xiaobo Fan, Wolf-Dietrich Weber, and Luiz Andre Control for Performance Optimization. 14th IEEE Barroso. Power provisioning for a warehouse-sized International Symposium on High-Performance computer. pages 13–23, 2007.
Computer Architecture (HPCA 2008), February 2008.
[9] Wes Felter, Karthick Rajamani, Tom Keller, and [25] Zhikui Wang, Xiaoyun Zhu, Cliff McCarthy, Partha Cosmin Rusu. A performance-conserving approach for Ranganathan, and Vanish Talwar. Feedback Control reducing peak power consumption in server systems.
Algorithms for Power Management of Servers. In In ICS '05: Proceedings of the 19th annual Third International Workshop on Feedback Control International Conference on Supercomputing, pages Implementation and Design in Computing Systems 293–302, New York, NY, USA, 2005. ACM.
and Networks (FeBid), Annapolis, MD, June 2008.
[10] Mark E. Femal and Vincent W. Freeh. Boosting Data Center Performance Through Non-Uniform Power Allocation. In ICAC '05: Proceedings of the SecondInternational Conference on Automatic Computing, A. PROOFS OF OPEN LOOP CONFIGURA- pages 250–261, Washington, DC, 2005.
TION THEOREMS (SECTION 4.2) [11] M. S. Floyd, S. Ghiasi, T. W. Keller, K. Rajamani, In this appendix we provide brief sketches of the proofs of F. L. Rawson, J. C. Rubio, and M. S. Ware. System each theorem from the paper. All theorems, again, assume Power Management Support in the IBM POWER6 Microprocessor. IBM Journal of Research andDevelopment, 51:733–746, 2007.
Theorem 1. Given an open k-server farm configuration [12] Anshul Gandhi, Mor Harchol-Balter, Rajarshi Das, with a linear power-to-frequency relationship and power bud- and Charles Lefurgy. Optimal power allocation in get P, the following power allocation setting minimizes E[T ]: server farms. Technical Report CMU-CS-09-113, 2009.
If sb ≤ α: 1,2,.,n = c, P ∗ n+1,n+2,.,k = 0 [13] Intel Corp. Intel Math Kernel Library 10.0 - If sb > α: 1,2,.,n = c, P ∗ n+1,n+2,.,k = 0 if λ ≤ λlow P∗1,2,.,m = b, P∗m+1,m+2,.,k = 0 if λ > λlow where λlow = α · P. Proof. We provide a complete proof for the case of k = 2 [14] Raj Jain. The Art of Computer Systems Performance in Lemma 1. We then use this result to prove the theorem Analysis: techniques for experimental design, for the case of arbitrary k by contradiction.
measurement, simulation, and modeling. pages563–567. Wiley, 1991.
Lemma 1. Given an open 2-server farm configuration with [15] Radim Kolar. Web bench.
linear power-to-speed relationship and power budget P, the following power allocation minimizes E[T]: [16] Kleinrock L. Queueing Systems, Volume 2.
If sb ≤ α: 1 = min {c, P } , P2 = P − P ∗ Wiley-Interscience, New York, 1976.
 P∗1 = min {c, P} , P∗2 = P − P∗1 if λ ≤ λlow [17] Charles Lefurgy, Xiaorui Wang, and Malcolm Ware.
If sb > α: 1 = b, P ∗ 2 = P − b if 2b ≤ P ≤ b + c & λ > λlow Power capping: a prelude to power shifting. Cluster P∗1 = min {c, P} P∗2 = min {c, P − P∗1} otherwise Computing, November 2007.
where λlow = sc+α (c − b)(c + b − P)− sb(sb + α(P − 2b)). [18] J.D. McCalpin. Stream: Sustainable memory bandwidth in high performance computers.
http://www.cs.virginia.edu/stream/.
Proof. [Lemma 1] The proof is trivial for the cases P < 2b and P ≥ 2c. Thus, assume 2b ≤ P < 2c. Let P be split two servers, say servers i and j, such that b ≤ Pi < c and among the two servers as (P1, P2).
b ≤ Pj < c. We invoke Lemma 1 on servers i and j with The goal is to find the optimal power allocation that min- total power P0 = Pi +Pj. Lemma 1 tells us that the optimal imizes E[T ] given the constraint: power allocation for servers i and j is P∗i = min {c, P0} and j = P 0 − P ∗ i . It can also be shown (see [12]) that reducing the mean response time of the 2-server system consisting Assuming a linear power-to-frequency relationship, we have of servers i and j reduces the mean response time of the the following relationship for the speed of server i, s whole system. Thus, we have shown that the mean response function of the power allocated to it, P time of the system under π can reduced. Thus, PowMax is si = sb + α(Pi − b), The case of sb > α is more complex. We want to show where α, s that the servers that are turned on should either all run at b and b are as defined in Sections 2.3 and 3.
We now wish to derive E[T ] for a 2-server farm with speeds power c or all run at power b, depending on λ. We start with an arbitrary power allocation and show that we can 1 and s2. Recall from Section 4.1 that we have a PS schedul- ing discipline and a Poisson arrival process with some mean repeatedly apply Lemma 1 to two servers at a time to end λ. It is well-known [16] that for a single M/G/1/PS queue up in a power allocation where we have some servers at with speed s, Poisson arrival rate λ and general job size, power b, some servers at power c and at most one server at E[T ] is as follows: some intermediate power. At this point, we optimize overall power allocation settings of the above type, and find that E[T ]M/G/1/P S E[T ] is minimized in the particular power allocation where either n servers are at power c (for low λ) or m servers are By exploiting Poisson splitting in our model, we have that at power b (for high λ). For details, see [12].
the mean response time of jobs in the server farm with split- Theorem 2. Given an open k-server farm configuration ting parameter q (where q is the fraction of jobs sent to the with a cubic power-to-frequency relationship (as in Eq. (2)) 1st server) and server speeds s1 and s2 is: and power budget P, the following power allocation setting minimizes E[T ]: E[T ] = s2 − λ · (1 − q) 1,2,.,n = c, P∗n+1,n+2,.,k = 0 if λ ≤ λ0low . given the stability constraints: 1,2,.,l = P l+1,l+2,.,k = 0 if λ > λ0low c − b − 3 − b and l = 1 > λq s2 > λ(1 − q) low = nlα0 b+ α0P From Eq. (6) we see that E[T ] is a function of s1, s2 and q.
However, using Eqs. (4) and (5), we can express s Proof. We follow the same process as in the proof of Thm. 1. Staring with the case of two servers, we note that Eq. (9) is still valid in the case of a cubic power-to-frequency s2 = 2sb + α(P − 2b) − s1 relationship. However, Eq. (8) now takes the form: We now derive the optimal value of q, which we call q∗. To s2 = sb + 3 α03(P − 2b) − (s1 − sb)3 do this, we fix the speeds s1 and s2 in Eq. (6) and set thederivative of E[T ] w.r.t. q to be 0. This yields: where α0, sb and b are as defined in Section 2.3. We can now express E[T ] (from Eq. (10)) in terms of just s1 by substituting for q and s2 from Eqs. (9) and (11). Hence λ( s1 + s2) we can differentiate E[T ] w.r.t. s1 to yield the optimal s1, Using q∗ in the expression for E[T ] from Eq. (6) gives us: which translates to P∗1. We find that, for the 2-server case, PowMax is optimal at low arrival rates, whereas PowMed λ + 2 s is optimal at high arrival rates. Note that PowMed for two E[T ] = servers refers to P∗ if P ≥ 2b and 1 + s2 − λ) = P∗2 = min P∗1 = P, P∗2 = 0 otherwise.
At this point, we can express E[T ] in terms of just s1 by We now return to the proof of Thm. 2. When there are substituting for s2 from Eq. (8). Hence we can differentiate k > 2 servers, we want to show that we should either run n E[T ] w.r.t. s1 to yield the optimal s1, which translates via servers at power c or run l servers at power P , depending Eq. (5) to P∗ 1 . We find that for sb ≤ α, PowMax minimizes on λ. We start with an arbitrary power allocation and show E[T ], where PowMax for 2 servers refers to P∗1 = min {c, P} that we can repeatedly apply the two server result discussed and P∗2 = P − P1. For sb > α, we find that PowMax above to pairs of servers at a time to end up in a power minimizes E[T ] for λ ≤ λlow, whereas PowMin minimizes allocation where we have some v servers at power c and some E[T ] for λ > λlow. PowMin for 2 servers refers to P∗1 = w servers at power P−vc . At this point, we optimize over 2 = P − b if 2b ≤ P ≤ b + c and P ∗ 1 = min {c, P } , P ∗ all power allocation settings of the above type, and find that min {c, P − P∗1} otherwise.
E[T ] is minimized in the particular power allocation whereeither n servers are at power c (for low λ) or l servers are at We now return to the proof of Thm. 1. We first consider power P (for high λ). For details, see [12].
the case sb ≤ α. The proof proceeds by contradiction. We claim that PowMax is optimal. Thus, assume we have apower allocation π = (P B. PROOFS OF CLOSED LOOP CONFIG- 1, . . , Pk), which is not the same as PowMax. Since π is not PowMax, there must exist at least URATION THEOREMS (SECTION 4.3) where α, sb and b are as defined in Section 2.3. Thus we For closed loop configurations, [14] provides well-known asymptotic bounds for the mean response time both in the case where N, the number of jobs in the system, is very high (sb + α(Pi − b)) and in the case where N is very low. These bounds provide excellent approximations for E[T ] . Here, N is considered high if it significantly exceeds the number of servers that are = j(sb − αb) + α on, and N is considered low if it is close to 1. We will use these approximations from [14] to derive the optimal power = j(s allocations in Theorem 3 (low N) and in Theorems 4 and 5 b − αb) + αP Hence, if sb ≥ α, E[T ] is minimized by maximizing j. Thus PowMin is optimal for sb ≥ α. When sb < α, E[T ] is Theorem 3. Given a closed k-server farm configuration minimized by minimizing j. Thus PowMax is optimal for with a linear or cubic power-to-frequency relationship, the sb < α.
following power allocation minimizes E[T ] for low N , based on the asymptotic approximations in [14]: Theorem 5. Given a closed k-server farm configuration with a cubic power-to-frequency relationship, the following P∗1,2,.,n = c, P∗n+1,n+2,.,k = 0 power allocation minimizes E[T ] for high N , based on theasymptotic approximations in [14]: Proof. We start by assuming the number of jobs in the If sb < s0: system, N , is low. From [14] we have that: 1,2,.,n = c, P∗n+1,n+2,.,k = 0 If sb ≥ s0: P∗1,2,.,l = b + x, P∗l+1,l+2,.,k = 0 , s0 = msc − α0 3 x and x is the non- E[T ] ≈ negative solution of the equation b = 2x + 1 α0 (3x Without loss of generality, assume s1 ≥ s2 ≥ . . ≥ sk.
Proof. As in the proof for Theorem 4, minimizing E[T ] This implies 1 ≤ 1 ≤ . . ≤ 1 . Thus, E[T ] given by is equivalent to maximizing i, where j ≤ k is the Eq. (12) is minimized by setting q1 = 1 if s1 > s2, or q1 = number of servers that are turned on. For a cubic power-to- . . = qa = 1 if s frequency relationship given by Eq. (2), we have: 1 = . . = sa for some integer a. If s1 > s2, we want to minimize E[T ] = 1 . Thus PowMax is optimal in this case. For the case of s i = sb + α0 3 1 = . . = sa for some integer a, we want to minimize E[T ] = 1 , for i = 1, 2, . . , a. This where α0, s b and b are as defined in Section 2.3. Thus we is achieved by maximizing s i. Clearly, by setting a = P , s is maximized by P∗i = c for i = 1, 2, . . , a. Hence, for low N , PowMax is optimal.
sb + α0 3 Pi − b Theorem 4. Given a closed k-server farm configuration with a linear power-to-frequency relationship , the following b + α0 power allocation minimizes E[T ] for high N , based on the asymptotic approximations in [14]: If sb < α: Since the sum of Pi's is a constant, the sum of their cube 1,2,.,n = c, P∗n+1,n+2,.,k = 0 If sb ≥ α: roots attains its maximum value when all the Pi's are equal.
1,2,.,m = b, P∗m+1,m+2,.,k = 0 This follows from the generalized mean inequality. Thus, Proof. Assuming that N is high, from [14] we have that: we wish to maximize j sb + α0 3 − b . By looking at E[T ] ≈ N · max 1 , 2 , . . , k the derivative of si, we find the power allocation that maximizes the sum of speeds, and thus minimizes the mean Without loss of generality, assume q1 ≥ q2 ≥ . . ≥ qj , response time. We find that PowMax is optimal for low values of s where j ≤ k denotes the number of servers that are turned b whereas PowMed is optimal for high values of on. Thus, E[T ] = N · q1 from Eq. (13). Since N is a b. Hence the result.
constant, minimizing E[T ] is now equivalent to minimizing q1 , which attains the lowest value of q2 when q1 = q2 .
However q2 itself attains the lowest value of q3 when q2 = q3 and so on. Thus, to minimize E[T ], we must set q1 = q2 = . . = qj = r, say. Since i = 1, we have: E[T ] = N · 1 = N · r = P Thus, minimizing E[T ] is equivalent to maximizing For a linear power-to-frequency relationship given by Eq. (5),we have: si = sb + α(Pi − b),

Source: http://www.cs.zju.edu.cn/people/yedeshi/seminar09/Sigmetrics09power.pdf

Exogenous plant mir168a specifically targets mammalian ldlrap1: evidence of cross-kingdom regulation by microrna

Cell Research (2012) 22:107-126. © 2012 IBCB, SIBS, CAS All rights reserved 1001-0602/12 $ 32.00 Exogenous plant MIR168a specifically targets mammalian LDLRAP1: evidence of cross-kingdom regulation by Lin Zhang1, *, Dongxia Hou1, *, Xi Chen1, *, Donghai Li1, *, Lingyun Zhu1, 2, Yujing Zhang1, Jing Li1, Zhen Bian1, Xiangying Liang1, Xing Cai1, Yuan Yin1, Cheng Wang1, Tianfu Zhang1, Dihan Zhu1, Dianmu Zhang1, Jie Xu1,

Doi:10.1016/j.physa.2004.12.028

Physica A 352 (2005) 113–130 A cell-centered approach to developmental biology Roeland M.H. Merks, James A. Glazier Department of Physics, Biocomplexity Institute, Indiana University, Swain Hall West 159, 727 East 3rd Street, Bloomington, IN 47405-7105, USA Available online 13 January 2005 Explaining embryonic development of multicellular organisms requires insight into complex