Fixing kernel error AMD-Vi: Event logged IO_PAGE_FAULT on Ryzen Machine

Oct 29 07:40:01 mypc-PC anacron[3153]: Job `cron.weekly' started
Oct 29 07:40:01 mypc-PC anacron[2414]: Updated timestamp for job `cron.weekly' to 2019-10-29
Oct 29 07:40:01 mypc-PC kernel: [1207844.240253] AMD-Vi: Event logged [IO_PAGE_FAULT device=1b:00.0 domain=0x000d address=0x0000000000000000 flags=0x0000]
Oct 29 07:40:01 mypc-PC kernel: [1207844.240260] AMD-Vi: Event logged [IO_PAGE_FAULT device=1b:00.0 domain=0x000d address=0x0000000000000400 flags=0x0000]
Oct 29 07:40:01 mypc-PC kernel: [1207844.240347] AMD-Vi: Event logged [IO_PAGE_FAULT device=1b:00.0 domain=0x000d address=0x0000000000000000 flags=0x0000]
Oct 29 07:40:01 mypc-PC kernel: [1207844.240349] AMD-Vi: Event logged [IO_PAGE_FAULT device=1b:00.0 domain=0x000d address=0x0000000000000200 flags=0x0000]

My PC was shutting down periodically at 7:40 am JST, so I tried to figure out why.

I checked the log in /var/log/syslog. It contained many repeated errors like the ones shown above.

It seems to be caused by the fstrim command, which is triggered by the cron.weekly job that anacron runs. That is why it happens at regular intervals. When I ran fstrim --all manually, the same errors were raised.
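
To confirm the link between fstrim and the errors, it helps to count the IO_PAGE_FAULT events per device before and after a manual trim. The following is a minimal sketch; count_iommu_faults is a hypothetical helper name, and the log path is assumed to be /var/log/syslog as on my machine.

```shell
#!/bin/sh
# count_iommu_faults (hypothetical helper): print "<count> <device>" for
# every PCI device that appears in IO_PAGE_FAULT events in the given log.
count_iommu_faults() {
    log_file="$1"
    grep -o 'IO_PAGE_FAULT device=[0-9a-f:.]*' "$log_file" \
        | sed 's/.*device=//' \
        | sort \
        | uniq -c \
        | awk '{print $1, $2}'
}

# Usage: compare the counts before and after a manual trim.
#   count_iommu_faults /var/log/syslog
#   sudo fstrim --all --verbose
#   count_iommu_faults /var/log/syslog
# lspci -s 1b:00.0 then shows which device the address belongs to.
```

If the count for the NVMe device jumps right after the trim, fstrim is the trigger.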

Then I found the bug report on Bugzilla.

202665 – NVMe AMD-Vi IO_PAGE_FAULT only with hardware IOMMU and fstrim/discard

The report says it is a kernel bug that occurs on machines with a Ryzen CPU.

My Environment

Ubuntu 16.04 x64
kernel 4.4.0-166.195
CPU : AMD Ryzen 7 2700X Eight-Core Processor
Motherboard : X470 GAMING PLUS (MS-7B79)
M.2 SSD: Silicon Power P34A80

Easy solution

The easiest solution is to set the iommu kernel option to soft, for example by adding iommu=soft to the kernel command line in the GRUB defaults file (/etc/default/grub). I still don't know much about how the software IOMMU works, but it seems to add overhead to DMA mapping because the mapping is done in software instead of by the IOMMU hardware. I would rather keep using the hardware IOMMU.
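
As a rough sketch of that workaround, the option can be appended to GRUB_CMDLINE_LINUX_DEFAULT. This assumes a standard Ubuntu /etc/default/grub where that variable is a single double-quoted line; add_iommu_soft is a hypothetical helper name, not part of GRUB.

```shell
#!/bin/sh
# add_iommu_soft (hypothetical helper): append iommu=soft to
# GRUB_CMDLINE_LINUX_DEFAULT in the given GRUB defaults file.
add_iommu_soft() {
    grub_file="$1"
    # Do nothing if the option is already present on that line.
    if grep -q '^GRUB_CMDLINE_LINUX_DEFAULT=.*iommu=soft' "$grub_file"; then
        return 0
    fi
    # Insert the option just before the closing quote; keep a .bak copy.
    sed -i.bak 's/^\(GRUB_CMDLINE_LINUX_DEFAULT="[^"]*\)"/\1 iommu=soft"/' "$grub_file"
}

# Usage (as root), then regenerate the GRUB config and reboot:
#   add_iommu_soft /etc/default/grub
#   update-grub
```

After the reboot, cat /proc/cmdline should include iommu=soft.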

Fix the kernel bug and build a custom kernel

This was really painful, but I finally made it work. All I needed was to apply a diff like the one below to the kernel. The fix was mentioned on the bug report page.

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 6cbde30..a8bd71c 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -470,7 +470,7 @@ static blk_status_t nvme_setup_discard(struct nvme_ns *ns, struct request *req,
    struct nvme_dsm_range *range;
    struct bio *bio;

-   range = kmalloc_array(segments, sizeof(*range), GFP_ATOMIC);
+   range = kmalloc_array(256, sizeof(*range), GFP_ATOMIC);
    if (!range)
        return BLK_STS_RESOURCE;

However, if you are using kernel 4.4, there is no kmalloc_array call in the core.c file, so the diff cannot be applied as-is. If you are on Ubuntu 16.04, you need to upgrade your kernel or find your own way.

sudo apt install linux-generic-hwe-16.04

After running the preceding command, you can use kernel 4.15.0, the same kernel that Ubuntu 18.04 uses.
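
After rebooting into the new kernel, a quick check like the following can confirm that the running kernel is at least 4.15. This is only a sketch: kernel_at_least is a hypothetical helper name, and it relies on the version sort (-V) option of sort.

```shell
#!/bin/sh
# kernel_at_least (hypothetical helper): succeed if version $1 is greater
# than or equal to version $2, using version-aware sorting.
kernel_at_least() {
    current="$1"
    minimum="$2"
    lowest=$(printf '%s\n%s\n' "$current" "$minimum" | sort -V | head -n 1)
    [ "$lowest" = "$minimum" ]
}

# Usage:
#   if kernel_at_least "$(uname -r)" 4.15; then
#       echo "kernel is new enough for the patch"
#   else
#       echo "still on an older kernel"
#   fi
```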

I made a Docker image to build kernel 4.15.0 for Ubuntu 16.04. Make sure you use the ubuntu16.04-kernel4.15.0 branch.

Docker Image

GitHub – fx-kirin/docker-ubuntu-kernel-build at ubuntu16.04-kernel4.15.0

I also tried building kernel 4.15 for Ubuntu 18.04. The branch name is ubuntu18.04. The patch file there is not ready to apply the fix, but the branch can build the kernel. If you want to use it, carefully check how the patch is applied in ubuntu16.04-kernel4.15.0.

Docker command

docker build -t kernel-build-16.04-4.15.0 .
docker run -it --rm -v ~/kbuild/ubuntu16.04/kernel4.15.0:/data -v ~/linux-patches/ubuntu16.04/kernel4.15.0/:/patches -e KERNEL_MAJOR=4.15.0 -e BUILD_CLEAN=Yes kernel-build-16.04-4.15.0

Put the patch file fix-nvme_18_04.patch in ~/linux-patches/ubuntu16.04/kernel4.15.0/, and the build will produce the patched kernel. Then copy all the .deb files to your PC and install them with the sudo dpkg -i *.deb command. If that fails, run sudo apt install -f and try again.
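
The patch has to be in the mounted directory before the container starts, so staging it can be scripted. A minimal sketch follows; stage_patch is a hypothetical helper name, and the paths in the usage comment are the ones from the docker command above.

```shell
#!/bin/sh
# stage_patch (hypothetical helper): copy a patch file into the directory
# that is mounted at /patches inside the build container.
stage_patch() {
    patch_file="$1"
    patches_dir="$2"
    if [ ! -f "$patch_file" ]; then
        echo "patch not found: $patch_file" >&2
        return 1
    fi
    mkdir -p "$patches_dir" && cp "$patch_file" "$patches_dir/"
}

# Usage:
#   stage_patch fix-nvme_18_04.patch ~/linux-patches/ubuntu16.04/kernel4.15.0
#   ...run the docker build and docker run commands...
#   cd ~/kbuild/ubuntu16.04/kernel4.15.0 && ls *.deb
```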

Result

After applying this patch, the fstrim command works perfectly. Congrats.

Note

Interestingly, if you want to use a custom version name, you have to change the version name in the changelog file. It is normally in the debian.master directory, but this time it is in debian.hwe because I am using HWE Ubuntu.

Also, the custom version name must not contain a hyphen (-). See here.
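
A small check before editing the changelog can catch this early. The sketch below assumes the rule applies to the custom string you add to the version; valid_custom_version is a hypothetical helper name.

```shell
#!/bin/sh
# valid_custom_version (hypothetical helper): succeed only if the custom
# version string is non-empty and contains no hyphen.
valid_custom_version() {
    case "$1" in
        ""|*-*) return 1 ;;
        *) return 0 ;;
    esac
}

# Usage:
#   valid_custom_version "nvmefix1" && echo "ok"
#   valid_custom_version "nvme-fix" || echo "rejected: contains a hyphen"
```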
