Oct 29 07:40:01 mypc-PC anacron: Job `cron.weekly' started
Oct 29 07:40:01 mypc-PC anacron: Updated timestamp for job `cron.weekly' to 2019-10-29
Oct 29 07:40:01 mypc-PC kernel: [1207844.240253] AMD-Vi: Event logged [IO_PAGE_FAULT device=1b:00.0 domain=0x000d address=0x0000000000000000 flags=0x0000]
Oct 29 07:40:01 mypc-PC kernel: [1207844.240260] AMD-Vi: Event logged [IO_PAGE_FAULT device=1b:00.0 domain=0x000d address=0x0000000000000400 flags=0x0000]
Oct 29 07:40:01 mypc-PC kernel: [1207844.240347] AMD-Vi: Event logged [IO_PAGE_FAULT device=1b:00.0 domain=0x000d address=0x0000000000000000 flags=0x0000]
Oct 29 07:40:01 mypc-PC kernel: [1207844.240349] AMD-Vi: Event logged [IO_PAGE_FAULT device=1b:00.0 domain=0x000d address=0x0000000000000200 flags=0x0000]
My PC kept shutting down at 7:40 am JST. I tried to figure out why.
I checked the log in /var/log/syslog. There were a lot of errors like the ones shown above.
It seems to be caused by the fstrim command, which is triggered by anacron's cron.weekly job. That's why it happens at a regular interval. When I manually ran fstrim --all, the same errors were raised.
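To see how many of these events were logged and by which device, a small helper like the one below can summarize the log. This is only a sketch: the function name count_page_faults is made up for this post, and it assumes the log lines have the same "device=..." format as the excerpt above.

```shell
# count_page_faults LOGFILE
# Prints "<count> <device>" for every PCI device that logged an
# AMD-Vi IO_PAGE_FAULT event in LOGFILE (hypothetical helper).
count_page_faults() {
    grep 'IO_PAGE_FAULT' "$1" \
        | sed -n 's/.*device=\([0-9a-f:.]*\).*/\1/p' \
        | sort | uniq -c | awk '{print $1, $2}'
}

# Example (reading syslog may need root on some setups):
#   count_page_faults /var/log/syslog
```

The device address it prints (1b:00.0 in my log) can be passed to lspci -s to check which device it is.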
Then I found a bug report on Bugzilla.
202665 – NVMe AMD-Vi IO_PAGE_FAULT only with hardware IOMMU and fstrim/discard
According to the report, the errors come from a kernel bug that shows up on machines with a Ryzen CPU. My environment:
Ubuntu 16.04 x64, kernel 4.4.0-166.195
CPU: AMD Ryzen 7 2700X Eight-Core Processor
Motherboard: X470 GAMING PLUS (MS-7B79)
M.2 SSD: Silicon Power P34A80
The easiest solution is to set the iommu kernel parameter to soft, for example by putting iommu=soft in the GRUB defaults. I still don't know much about how the software IOMMU works, but it seems to add overhead to DMA mapping because the mapping is handled in software instead of by the hardware IOMMU. I would rather keep using the hardware IOMMU.
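For reference, this is roughly how the parameter could be added. It is a sketch, not the one true way: add_kernel_param is a helper name I made up, it assumes the usual GRUB_CMDLINE_LINUX_DEFAULT="..." line in /etc/default/grub, and it assumes the parameter contains no characters that are special to sed.

```shell
# add_kernel_param FILE PARAM
# Appends PARAM to the GRUB_CMDLINE_LINUX_DEFAULT line in FILE,
# unless PARAM is already present (hypothetical helper).
add_kernel_param() {
    file="$1"
    param="$2"
    # Nothing to do if the parameter is already on the line.
    if grep -q "^GRUB_CMDLINE_LINUX_DEFAULT=\".*${param}" "$file"; then
        return 0
    fi
    # Rewrite through a temporary file, then replace the original.
    sed "s/^GRUB_CMDLINE_LINUX_DEFAULT=\"\(.*\)\"/GRUB_CMDLINE_LINUX_DEFAULT=\"\1 ${param}\"/" \
        "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# Example: edit a copy first, then install it and regenerate the GRUB config.
#   cp /etc/default/grub /tmp/grub.new
#   add_kernel_param /tmp/grub.new iommu=soft
#   sudo cp /tmp/grub.new /etc/default/grub
#   sudo update-grub
```

Working on a copy keeps the original file untouched until you have looked at the result. The setting takes effect after a reboot.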
Fix the kernel bug and build a custom kernel.
That was really painful, but I finally made it work. All I needed was to apply a diff like the one below to the kernel. The fix was mentioned on the bug report page.
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 6cbde30..a8bd71c 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -470,7 +470,7 @@ static blk_status_t nvme_setup_discard(struct nvme_ns *ns, struct request *req,
 	struct nvme_dsm_range *range;
 	struct bio *bio;
 
-	range = kmalloc_array(segments, sizeof(*range), GFP_ATOMIC);
+	range = kmalloc_array(256, sizeof(*range), GFP_ATOMIC);
 	if (!range)
 		return BLK_STS_RESOURCE;
 
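Before starting a long build, it may be worth checking that the source tree really contains the line this diff replaces. The helper below is a sketch; needs_discard_patch is a name I made up, and it only looks for the exact unpatched line from the diff above.

```shell
# needs_discard_patch SRC_DIR
# Succeeds (exit status 0) when drivers/nvme/host/core.c under SRC_DIR
# still contains the unpatched allocation from the diff (hypothetical helper).
needs_discard_patch() {
    grep -qF 'kmalloc_array(segments, sizeof(*range), GFP_ATOMIC)' \
        "$1/drivers/nvme/host/core.c"
}

# Example, with the path to your own kernel source tree:
#   if needs_discard_patch /path/to/linux; then
#       echo "unpatched line found, the diff should apply"
#   else
#       echo "line not found, this tree needs a different fix"
#   fi
```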
However, if you are using kernel 4.4, there is no kmalloc_array call in the core.c file. So if you are using Ubuntu 16.04, you need to upgrade your kernel or find your own way.
sudo apt install linux-generic-hwe-16.04
After running the preceding command, you can use kernel 4.15.0, which is the kernel used by Ubuntu 18.04.
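After rebooting, you can confirm that the new kernel is actually the one running. A minimal sketch, assuming GNU sort with the -V (version sort) option is available; version_at_least is a helper name I made up.

```shell
# version_at_least HAVE WANT
# Succeeds when version string HAVE sorts at or after WANT
# in version order (hypothetical helper, relies on GNU sort -V).
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example: check the running kernel.
#   if version_at_least "$(uname -r)" 4.15; then
#       echo "kernel is 4.15 or newer"
#   else
#       echo "still on an older kernel"
#   fi
```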
I made a Docker image to build kernel 4.15.0 for Ubuntu 16.04. Make sure the branch is ubuntu16.04-kernel4.15.0.
GitHub – fx-kirin/docker-ubuntu-kernel-build at ubuntu16.04-kernel4.15.0
I also tried to build kernel 4.15 for Ubuntu 18.04. The branch name is ubuntu18.04. The patch file there is not ready to apply the fix, but the branch can build the kernel. So if you want to use it, please check carefully how the patch is applied in
docker build -t kernel-build-16.04-4.15.0 .
docker run -it --rm \
    -v ~/kbuild/ubuntu16.04/kernel4.15.0:/data \
    -v ~/linux-patches/ubuntu16.04/kernel4.15.0/:/patches \
    -e KERNEL_MAJOR=4.15.0 \
    -e BUILD_CLEAN=Yes \
    kernel-build-16.04-4.15.0
Put the patch file in ~/linux-patches/ubuntu16.04/kernel4.15.0/, and then you can build the patched kernel. After that, copy all the .deb files to your PC and install them with sudo dpkg -i *.deb. If that fails, run sudo apt install -f and try again.
After this patch, the fstrim command works perfectly. Congrats.
One interesting thing: if you want to use a custom version name, you have to change the version name in the changelog file, which is normally in the debian.master directory (this time it is debian.hwe because I am using HWE Ubuntu).
The custom version name must not contain a hyphen (-). See here.
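A tiny check like the following can catch a bad name before you edit changelog and start a long build. It is only a sketch of the hyphen rule above: valid_custom_name is a made-up helper, and it checks just the custom name string you plan to use, nothing else about the changelog format.

```shell
# valid_custom_name NAME
# Succeeds when NAME is non-empty and contains no hyphen (hypothetical helper).
valid_custom_name() {
    case "$1" in
        ''|*-*) return 1 ;;
        *)      return 0 ;;
    esac
}

# Example:
#   valid_custom_name "nvmefix1"  || echo "bad name"   # accepted, prints nothing
#   valid_custom_name "nvme-fix"  || echo "bad name"   # rejected
```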