Parts of this article draw on the following sources:
万字长文 | Thermal框架源码剖析,
Linux Thermal机制源码分析之框架概述 (不捡风筝的玖伍贰柒的博客, CSDN),
“热散由心静,凉生为室空” - linux温控的那些事儿 (内核工匠的博客, CSDN),
Linux thermal governor之IPA分析 (内核工匠的博客, CSDN).
Many thanks to their authors.
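This article continues from the previous one, Linux Kernel Thermal Framework Explained, Part 13: Thermal Governor (3).
II. Specific thermal control policies
The previous article introduced the fair_share governor and analyzed its source in detail. This article covers the third thermal control policy: step_wise.
3. step_wise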
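step_wise is the governor commonly used for CPU thermal control, and it performs well in practice. When computing the target cooling state, step_wise looks not only at whether the configured trip point has been crossed (that is, whether to throttle) but also at one more input: the trend of the temperature. The fallback in get_tz_trend reports three trends: rising (THERMAL_TREND_RAISING), dropping (THERMAL_TREND_DROPPING), and stable (THERMAL_TREND_STABLE). A driver's get_trend callback may additionally report THERMAL_TREND_RAISE_FULL and THERMAL_TREND_DROP_FULL, which get_target_state also handles. Based on the trip point and the trend, step_wise adjusts the CPU's cooling state one step at a time, which in practice means adjusting its frequency. For example, if the temperature is above the trip point and still rising, step_wise raises the CPU's cooling state by one level (that is, lowers the frequency), then keeps polling the temperature and acts again according to the trend.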
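In one sentence: the step_wise governor computes each cooling device's target_state from cur_state, the temperature trend, and whether throttling is in effect, and limits the temperature rise by driving the cooling device to that state.
The step_wise governor lives in drivers/thermal/gov_step_wise.c. It is considerably more complex than bang_bang and fair_share, at a little over 200 lines. Because the code is long, it is not listed in full here; instead it is shown and analyzed piece by piece.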
(1) Code related to THERMAL_GOVERNOR_DECLARE
First, THERMAL_GOVERNOR_DECLARE. It is a macro defined in drivers/thermal/thermal_core.h:
/* Init section thermal table */
extern struct thermal_governor *__governor_thermal_table[];
extern struct thermal_governor *__governor_thermal_table_end[];

#define THERMAL_TABLE_ENTRY(table, name) \
    static typeof(name) *__thermal_table_entry_##name \
    __used __section("__" #table "_thermal_table") = &name

#define THERMAL_GOVERNOR_DECLARE(name) THERMAL_TABLE_ENTRY(governor, name)
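This code was analyzed in detail in the earlier article Linux Kernel Thermal Framework Explained, Part 4: Thermal Core (3), so the analysis is not repeated here. As a reminder, here is the step_wise governor's definition together with the expansion of its THERMAL_GOVERNOR_DECLARE line: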
static struct thermal_governor thermal_gov_step_wise = {
    .name = "step_wise",
    .throttle = step_wise_throttle,
};

/* Expansion of THERMAL_GOVERNOR_DECLARE(thermal_gov_step_wise); */
static struct thermal_governor *__thermal_table_entry_thermal_gov_step_wise
    __used __section("__governor_thermal_table") = &thermal_gov_step_wise;
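Every thermal governor is placed into the __governor_thermal_table section through THERMAL_GOVERNOR_DECLARE. During thermal core initialization, thermal_register_governors is called to register them on the thermal_governor_list list. After that, a governor is associated with a thermal zone device along the path "thermal_init -> thermal_register_governors -> thermal_set_governor".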
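To make the "table of governors, then lookup by name" idea concrete, here is a minimal user-space sketch. It is not the kernel's implementation: the kernel collects the pointers through a linker section, while this sketch uses an ordinary array, and struct governor, governor_table, find_governor, and the two demo callbacks are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct thermal_governor (two fields only). */
struct governor {
    const char *name;
    int (*throttle)(int zone_id, int trip);
};

/* Demo callbacks; they do nothing and report success. */
static int step_wise_demo(int zone_id, int trip)
{
    (void)zone_id;
    (void)trip;
    return 0;
}

static int fair_share_demo(int zone_id, int trip)
{
    (void)zone_id;
    (void)trip;
    return 0;
}

static struct governor gov_step_wise = { "step_wise", step_wise_demo };
static struct governor gov_fair_share = { "fair_share", fair_share_demo };

/* Plain array standing in for the linker-built governor table. */
static struct governor *governor_table[] = {
    &gov_fair_share,
    &gov_step_wise,
};

/* Walk the table and return the governor with the given name, or NULL. */
struct governor *find_governor(const char *name)
{
    size_t n = sizeof(governor_table) / sizeof(governor_table[0]);

    assert(name != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(governor_table[i]->name, name) == 0)
            return governor_table[i];
    }
    return NULL;
}
```

Calling find_governor("step_wise") returns the step_wise entry, while an unknown name returns NULL.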
(2) handle_non_critical_trips
struct thermal_governor has a member named throttle, which is a function pointer:
int (*throttle)(struct thermal_zone_device *tz, int trip);
For the thermal_gov_step_wise object it points to step_wise_throttle. Before analyzing step_wise_throttle, one question has to be settled: when is this function called?
It is called from handle_non_critical_trips in drivers/thermal/thermal_core.c:
static void handle_non_critical_trips(struct thermal_zone_device *tz, int trip)
{
    tz->governor ? tz->governor->throttle(tz, trip) :
                   def_governor->throttle(tz, trip);
}
And who calls handle_non_critical_trips? It is handle_thermal_trip, also in drivers/thermal/thermal_core.c:
static void handle_thermal_trip(struct thermal_zone_device *tz, int trip)
{
    enum thermal_trip_type type;
    int trip_temp, hyst = 0;

    /* Ignore disabled trip points */
    if (test_bit(trip, &tz->trips_disabled))
        return;

    tz->ops->get_trip_temp(tz, trip, &trip_temp);
    tz->ops->get_trip_type(tz, trip, &type);
    if (tz->ops->get_trip_hyst)
        tz->ops->get_trip_hyst(tz, trip, &hyst);

    if (tz->last_temperature != THERMAL_TEMP_INVALID) {
        if (tz->last_temperature < trip_temp &&
            tz->temperature >= trip_temp)
            thermal_notify_tz_trip_up(tz->id, trip,
                                      tz->temperature);
        if (tz->last_temperature >= trip_temp &&
            tz->temperature < (trip_temp - hyst))
            thermal_notify_tz_trip_down(tz->id, trip,
                                        tz->temperature);
    }

    if (type == THERMAL_TRIP_CRITICAL || type == THERMAL_TRIP_HOT)
        handle_critical_trips(tz, trip, type);
    else
        handle_non_critical_trips(tz, trip);
    /*
     * Alright, we handled this trip successfully.
     * So, start monitoring again.
     */
    monitor_thermal_zone(tz);
}
handle_thermal_trip is analyzed in detail in a dedicated article. Since this article focuses on the step_wise governor, it is not explored further here.
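The trip-up and trip-down notification conditions in handle_thermal_trip are easy to misread, so here is a small stand-alone sketch of just that comparison. The enum crossing type and the classify_crossing function are hypothetical names, the temperatures in the usage note are made-up values, and the THERMAL_TEMP_INVALID check is left out.

```c
#include <assert.h>

enum crossing { CROSS_NONE, CROSS_UP, CROSS_DOWN };

/*
 * Mirror the two comparisons in handle_thermal_trip.
 * All values use the same unit (for example millidegrees Celsius).
 */
enum crossing classify_crossing(int last_temp, int temp, int trip_temp, int hyst)
{
    assert(hyst >= 0);

    /* Was below the trip point, is now at or above it. */
    if (last_temp < trip_temp && temp >= trip_temp)
        return CROSS_UP;

    /* Was at or above the trip point, is now below trip minus hysteresis. */
    if (last_temp >= trip_temp && temp < trip_temp - hyst)
        return CROSS_DOWN;

    return CROSS_NONE;
}
```

With a trip point of 85000 and a hysteresis of 2000, going from 84000 to 85000 yields CROSS_UP, going from 85000 to 84000 yields CROSS_NONE because 84000 is still inside the hysteresis band, and going from 85000 to 82000 yields CROSS_DOWN.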
(3) step_wise_throttle
step_wise_throttle is the core of the step_wise governor. Its code is:
/**
 * step_wise_throttle - throttles devices associated with the given zone
 * @tz: thermal_zone_device
 * @trip: trip point index
 *
 * Throttling Logic: This uses the trend of the thermal zone to throttle.
 * If the thermal zone is 'heating up' this throttles all the cooling
 * devices associated with the zone and its particular trip point, by one
 * step. If the zone is 'cooling down' it brings back the performance of
 * the devices by one step.
 */
static int step_wise_throttle(struct thermal_zone_device *tz, int trip)
{
    struct thermal_instance *instance;

    thermal_zone_trip_update(tz, trip);

    mutex_lock(&tz->lock);

    list_for_each_entry(instance, &tz->thermal_instances, tz_node)
        thermal_cdev_update(instance->cdev);

    mutex_unlock(&tz->lock);

    return 0;
}
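The function comment states the purpose clearly: throttle the devices associated with the given thermal zone. The throttling logic is:
The policy uses the trend of the thermal zone.
If the thermal zone is "heating up", all cooling devices associated with the zone and this particular trip point are throttled by one step;
if the thermal zone is "cooling down", the performance of the devices is restored by one step.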
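The kernel comment is terse. In plainer terms, the step_wise governor selects the cooling state as follows:
When throttling is in effect and the trend is rising, use the next higher cooling state.
When throttling is in effect and the trend is dropping, keep the cooling state unchanged.
When throttling is not in effect and the trend is rising, keep the cooling state unchanged.
When throttling is not in effect (it has been lifted) and the trend is dropping, use the next lower cooling state.
Grouped by trend instead:
When the trend is rising (THERMAL_TREND_RAISING): if throttling is in effect, use the next higher cooling state; otherwise keep the cooling state unchanged.
When the trend is dropping (THERMAL_TREND_DROPPING): if throttling is in effect, keep the cooling state unchanged; otherwise use the next lower cooling state.
When the trend is THERMAL_TREND_RAISE_FULL and throttling is in effect, jump straight to the highest cooling state (instance->upper).
When the trend is THERMAL_TREND_DROP_FULL and the device is not already at instance->lower, jump straight to the lowest cooling state (instance->lower).
Note: the cooling state is confined to [instance->lower, instance->upper]. When the trend is dropping, throttling is not in effect, and cur_state is already at or below instance->lower, target_state becomes THERMAL_NO_TARGET, which deactivates the instance.
As this shows, the step_wise governor moves the cooling state by one level per polling cycle, which makes it a relatively gentle thermal control policy.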
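The stepping behavior described above can be sketched as a small user-space simulation. This is a simplified model under stated assumptions, not the kernel code: it tracks a single instance, derives the trend only from the previous sample, ignores THERMAL_TREND_RAISE_FULL, THERMAL_TREND_DROP_FULL, and the first-time initialization path, and treats a deactivated instance as cooling state 0. NO_TARGET, step_once, and simulate are hypothetical names.

```c
#include <assert.h>
#include <limits.h>

/* Assumption: stands in for the kernel's THERMAL_NO_TARGET sentinel. */
#define NO_TARGET ULONG_MAX

enum trend { TREND_STABLE, TREND_RAISING, TREND_DROPPING };

/* One governor decision for one already-initialized instance. */
unsigned long step_once(unsigned long cur, unsigned long target,
                        unsigned long lower, unsigned long upper,
                        enum trend trend, int throttle)
{
    assert(lower <= upper);

    if (trend == TREND_RAISING && throttle) {
        /* Heating up above the trip point: one step more cooling. */
        target = cur < upper ? cur + 1 : upper;
        if (target < lower)
            target = lower;
    } else if (trend == TREND_DROPPING && !throttle) {
        /* Cooling down below the trip point: one step less cooling. */
        if (cur <= lower) {
            target = NO_TARGET;
        } else {
            target = cur - 1;
            if (target > upper)
                target = upper;
        }
    }
    /* Every other combination keeps the previous target. */
    return target;
}

/* Feed a sequence of polled temperatures; record the state after each poll. */
unsigned long simulate(const int *temps, int n, int trip_temp,
                       unsigned long lower, unsigned long upper,
                       unsigned long *states)
{
    unsigned long cur = 0, target = NO_TARGET;
    int last = n > 0 ? temps[0] : 0;

    for (int i = 0; i < n; i++) {
        enum trend trend = temps[i] > last ? TREND_RAISING :
                           temps[i] < last ? TREND_DROPPING : TREND_STABLE;
        int throttle = temps[i] >= trip_temp;

        target = step_once(cur, target, lower, upper, trend, throttle);
        cur = (target == NO_TARGET) ? 0 : target;
        states[i] = cur;
        last = temps[i];
    }
    return target;
}
```

For a trip point of 85000, cooling states 0 to 3, and the polled temperatures 84000, 86000, 87000, 88000, 89000, 88000, 84000, 83000, 82000, 81000, the recorded states are 0, 1, 2, 3, 3, 3, 2, 1, 0, 0. The state climbs one level per poll while the zone heats up above the trip point, holds while it cools but is still above the trip point, and steps back down one level per poll once it is below the trip point.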
[Figure: flowchart of the step_wise governor]
(4) thermal_zone_trip_update
step_wise_throttle calls thermal_zone_trip_update, which is defined just above step_wise_throttle in the same file:
static void thermal_zone_trip_update(struct thermal_zone_device *tz, int trip)
{
    int trip_temp;
    enum thermal_trip_type trip_type;
    enum thermal_trend trend;
    struct thermal_instance *instance;
    bool throttle = false;
    int old_target;

    tz->ops->get_trip_temp(tz, trip, &trip_temp);
    tz->ops->get_trip_type(tz, trip, &trip_type);

    trend = get_tz_trend(tz, trip);

    if (tz->temperature >= trip_temp) {
        throttle = true;
        trace_thermal_zone_trip(tz, trip, trip_type);
    }

    dev_dbg(&tz->device, "Trip%d[type=%d,temp=%d]:trend=%d,throttle=%d\n",
            trip, trip_type, trip_temp, trend, throttle);

    mutex_lock(&tz->lock);

    list_for_each_entry(instance, &tz->thermal_instances, tz_node) {
        if (instance->trip != trip)
            continue;

        old_target = instance->target;
        instance->target = get_target_state(instance, trend, throttle);
        dev_dbg(&instance->cdev->device, "old_target=%d, target=%d\n",
                old_target, (int)instance->target);

        if (instance->initialized && old_target == instance->target)
            continue;

        /* Activate a passive thermal instance */
        if (old_target == THERMAL_NO_TARGET &&
            instance->target != THERMAL_NO_TARGET)
            update_passive_instance(tz, trip_type, 1);
        /* Deactivate a passive thermal instance */
        else if (old_target != THERMAL_NO_TARGET &&
                 instance->target == THERMAL_NO_TARGET)
            update_passive_instance(tz, trip_type, -1);

        instance->initialized = true;
        mutex_lock(&instance->cdev->lock);
        instance->cdev->updated = false; /* cdev needs update */
        mutex_unlock(&instance->cdev->lock);
    }

    mutex_unlock(&tz->lock);
}
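The activate and deactivate branches in the loop above depend only on whether the target moves into or out of THERMAL_NO_TARGET. A minimal sketch of that transition test follows. NO_TARGET and passive_delta are hypothetical names, and the return value only models the third argument passed to update_passive_instance, not what that function does with it.

```c
#include <limits.h>

/* Assumption: stands in for the kernel's THERMAL_NO_TARGET sentinel. */
#define NO_TARGET ULONG_MAX

/*
 * Returns +1 when an instance becomes active, -1 when it becomes
 * inactive, and 0 when its activity does not change.
 */
int passive_delta(unsigned long old_target, unsigned long new_target)
{
    if (old_target == NO_TARGET && new_target != NO_TARGET)
        return 1;
    if (old_target != NO_TARGET && new_target == NO_TARGET)
        return -1;
    return 0;
}
```

A change from state 1 to state 2 therefore yields 0: the target changed, but the instance was active before and stays active.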
thermal_zone_trip_update does the following:
1) Get the trip temperature
tz->ops->get_trip_temp(tz, trip, &trip_temp) fetches trip_temp for this trip point of the thermal zone.
2) Get the trip type
tz->ops->get_trip_type(tz, trip, &trip_type) fetches trip_type for this trip point of the thermal zone.
3) Get the temperature trend
get_tz_trend(tz, trip) returns the temperature trend of the thermal zone. get_tz_trend is defined in drivers/thermal/thermal_helpers.c:
int get_tz_trend(struct thermal_zone_device *tz, int trip)
{
    enum thermal_trend trend;

    if (tz->emul_temperature || !tz->ops->get_trend ||
        tz->ops->get_trend(tz, trip, &trend)) {
        if (tz->temperature > tz->last_temperature)
            trend = THERMAL_TREND_RAISING;
        else if (tz->temperature < tz->last_temperature)
            trend = THERMAL_TREND_DROPPING;
        else
            trend = THERMAL_TREND_STABLE;
    }

    return trend;
}
EXPORT_SYMBOL(get_tz_trend);
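The fallback branch of get_tz_trend is a plain comparison of the current and previous samples. As a stand-alone sketch, with enum trend and fallback_trend as hypothetical names and made-up temperatures in the usage note:

```c
enum trend { TREND_STABLE, TREND_RAISING, TREND_DROPPING };

/*
 * Trend used when temperature emulation is active, when the driver has
 * no get_trend callback, or when that callback returns an error.
 */
enum trend fallback_trend(int temperature, int last_temperature)
{
    if (temperature > last_temperature)
        return TREND_RAISING;
    if (temperature < last_temperature)
        return TREND_DROPPING;
    return TREND_STABLE;
}
```

For example, a reading of 50000 after 49000 gives TREND_RAISING, and two equal readings give TREND_STABLE.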
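tz->ops->get_trend is supplied by the driver that registered the thermal zone. For ACPI thermal zones, for example, it points to thermal_get_trend in drivers/acpi/thermal.c, and the function pointer is assigned in that same file: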
static struct thermal_zone_device_ops acpi_thermal_zone_ops = {
    .bind = acpi_thermal_bind_cooling_device,
    .unbind = acpi_thermal_unbind_cooling_device,
    .get_temp = thermal_get_temp,
    .get_trip_type = thermal_get_trip_type,
    .get_trip_temp = thermal_get_trip_temp,
    .get_crit_temp = thermal_get_crit_temp,
    .get_trend = thermal_get_trend,
    .hot = acpi_thermal_zone_device_hot,
    .critical = acpi_thermal_zone_device_critical,
};
The code of thermal_get_trend is:
static int thermal_get_trend(struct thermal_zone_device *thermal,
                             int trip, enum thermal_trend *trend)
{
    struct acpi_thermal *tz = thermal->devdata;
    enum thermal_trip_type type;
    int i;

    if (thermal_get_trip_type(thermal, trip, &type))
        return -EINVAL;

    if (type == THERMAL_TRIP_ACTIVE) {
        int trip_temp;
        int temp = deci_kelvin_to_millicelsius_with_offset(
                        tz->temperature, tz->kelvin_offset);
        if (thermal_get_trip_temp(thermal, trip, &trip_temp))
            return -EINVAL;

        if (temp > trip_temp) {
            *trend = THERMAL_TREND_RAISING;
            return 0;
        } else {
            /* Fall back on default trend */
            return -EINVAL;
        }
    }

    /*
     * tz->temperature has already been updated by generic thermal layer,
     * before this callback being invoked
     */
    i = (tz->trips.passive.tc1 * (tz->temperature - tz->last_temperature))
        + (tz->trips.passive.tc2
        * (tz->temperature - tz->trips.passive.temperature));

    if (i > 0)
        *trend = THERMAL_TREND_RAISING;
    else if (i < 0)
        *trend = THERMAL_TREND_DROPPING;
    else
        *trend = THERMAL_TREND_STABLE;
    return 0;
}
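For passive trips, thermal_get_trend weighs two terms: how fast the temperature is changing (scaled by tc1) and how far it is from the passive trip temperature (scaled by tc2). The sketch below reproduces only that arithmetic. passive_trend_score and classify_score are hypothetical names, and the coefficients and temperatures in the usage note are made-up values chosen for illustration.

```c
enum trend { TREND_STABLE, TREND_RAISING, TREND_DROPPING };

/* Same weighted sum as the variable i in thermal_get_trend. */
int passive_trend_score(int tc1, int tc2,
                        int temp, int last_temp, int passive_temp)
{
    return tc1 * (temp - last_temp) + tc2 * (temp - passive_temp);
}

/* Map the sign of the score to a trend. */
enum trend classify_score(int score)
{
    if (score > 0)
        return TREND_RAISING;
    if (score < 0)
        return TREND_DROPPING;
    return TREND_STABLE;
}
```

With tc1 = 2, tc2 = 3, a current temperature of 3530, a previous temperature of 3520, and a passive trip temperature of 3550, the score is 2 * 10 + 3 * (-20) = -40. The trend is therefore reported as dropping even though the temperature rose since the last sample, because the zone is still well below the passive trip temperature.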
4) Compute the target state of each instance
get_target_state(instance, trend, throttle) is called in turn for each instance of the thermal zone that is bound to this trip point, yielding its target state. get_target_state is in the same file (drivers/thermal/gov_step_wise.c):
/*
 * If the temperature is higher than a trip point,
 *    a. if the trend is THERMAL_TREND_RAISING, use higher cooling
 *       state for this trip point
 *    b. if the trend is THERMAL_TREND_DROPPING, do nothing
 *    c. if the trend is THERMAL_TREND_RAISE_FULL, use upper limit
 *       for this trip point
 *    d. if the trend is THERMAL_TREND_DROP_FULL, use lower limit
 *       for this trip point
 * If the temperature is lower than a trip point,
 *    a. if the trend is THERMAL_TREND_RAISING, do nothing
 *    b. if the trend is THERMAL_TREND_DROPPING, use lower cooling
 *       state for this trip point, if the cooling state already
 *       equals lower limit, deactivate the thermal instance
 *    c. if the trend is THERMAL_TREND_RAISE_FULL, do nothing
 *    d. if the trend is THERMAL_TREND_DROP_FULL, use lower limit,
 *       if the cooling state already equals lower limit,
 *       deactivate the thermal instance
 */
static unsigned long get_target_state(struct thermal_instance *instance,
                                      enum thermal_trend trend, bool throttle)
{
    struct thermal_cooling_device *cdev = instance->cdev;
    unsigned long cur_state;
    unsigned long next_target;

    /*
     * We keep this instance the way it is by default.
     * Otherwise, we use the current state of the
     * cdev in use to determine the next_target.
     */
    cdev->ops->get_cur_state(cdev, &cur_state);
    next_target = instance->target;
    dev_dbg(&cdev->device, "cur_state=%ld\n", cur_state);

    if (!instance->initialized) {
        if (throttle) {
            next_target = (cur_state + 1) >= instance->upper ?
                          instance->upper :
                          ((cur_state + 1) < instance->lower ?
                          instance->lower : (cur_state + 1));
        } else {
            next_target = THERMAL_NO_TARGET;
        }

        return next_target;
    }

    switch (trend) {
    case THERMAL_TREND_RAISING:
        if (throttle) {
            next_target = cur_state < instance->upper ?
                          (cur_state + 1) : instance->upper;
            if (next_target < instance->lower)
                next_target = instance->lower;
        }
        break;
    case THERMAL_TREND_RAISE_FULL:
        if (throttle)
            next_target = instance->upper;
        break;
    case THERMAL_TREND_DROPPING:
        if (cur_state <= instance->lower) {
            if (!throttle)
                next_target = THERMAL_NO_TARGET;
        } else {
            if (!throttle) {
                next_target = cur_state - 1;
                if (next_target > instance->upper)
                    next_target = instance->upper;
            }
        }
        break;
    case THERMAL_TREND_DROP_FULL:
        if (cur_state == instance->lower) {
            if (!throttle)
                next_target = THERMAL_NO_TARGET;
        } else
            next_target = instance->lower;
        break;
    default:
        break;
    }

    return next_target;
}
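The nested conditional in the !instance->initialized branch of get_target_state is dense. When throttling is in effect it amounts to "take cur_state + 1 and confine it to [lower, upper]", assuming lower <= upper. Here is a stand-alone sketch of that branch, with first_target as a hypothetical name:

```c
#include <assert.h>

/*
 * First-time target for a throttled instance: one step above the
 * current state, confined to the instance's [lower, upper] range.
 */
unsigned long first_target(unsigned long cur_state,
                           unsigned long lower, unsigned long upper)
{
    unsigned long next = cur_state + 1;

    assert(lower <= upper);

    if (next >= upper)
        return upper;
    if (next < lower)
        return lower;
    return next;
}
```

For a device currently at state 0 with limits [2, 5], the first target is 2 rather than 1, because the instance may not go below its lower limit. For a device at state 7 with the same limits, the first target is 5.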
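Finally, here is the overall flowchart of the step_wise governor code:
[Figure: overall flowchart of the step_wise governor code]
That largely completes the analysis of the step_wise governor policy.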