Installing PyTorch on a server

  • Activate the Python virtual environment (every step below runs inside it)

    source bin/activate
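  • Check and note the CUDA version

    nvcc -V

    If running nvcc --version prints

    The program 'nvcc' is currently not installed. To run 'nvcc' please ask your administrator to install the package 'nvidia-cuda-toolkit'

    then list the contents of /usr/local/ instead:

    ls /usr/local/

    If the listing contains a directory such as

    cuda-8.0

    then the installed CUDA version is 8.0.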
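
The two checks above (ask nvcc first, otherwise look for cuda-X.Y directories under /usr/local/) can be sketched as a small script. This is only an illustration: the function names are hypothetical, and it assumes that nvcc prints a "release X.Y" string and that CUDA directories are named cuda-X.Y.

```python
import re
import shutil
import subprocess
from pathlib import Path


def cuda_versions_from_dirs(root="/usr/local"):
    """Return versions found in directory names such as 'cuda-8.0' under root."""
    root_path = Path(root)
    if not root_path.is_dir():
        return []
    versions = []
    for entry in root_path.iterdir():
        match = re.fullmatch(r"cuda-(\d+(?:\.\d+)*)", entry.name)
        if match and entry.is_dir():
            versions.append(match.group(1))
    # Sort numerically so that 10.0 comes after 8.0.
    return sorted(versions, key=lambda v: tuple(int(p) for p in v.split(".")))


def cuda_version_from_nvcc():
    """Return the release reported by `nvcc --version`, or None if nvcc is missing."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(["nvcc", "--version"], capture_output=True, text=True)
    # Assumption: the output contains a phrase like "release 8.0".
    match = re.search(r"release (\d+\.\d+)", result.stdout)
    return match.group(1) if match else None


if __name__ == "__main__":
    print("nvcc reports:", cuda_version_from_nvcc())
    print("directories under /usr/local:", cuda_versions_from_dirs())
```

If several cuda-X.Y directories exist, the script lists all of them; which one is actually in use still has to be decided by hand.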

  • Check and note the Python version

    python -V
  • Upgrade pip to the latest version

    pip install -U pip
  • Check the pip version

    pip -V
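  • Open the PyTorch website http://pytorch.org

    Select Linux and pip, then pick the Python and CUDA versions noted above

    Copy the commands shown on the page into the command line of the virtual environment (enter them one line at a time)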

  • Install any other Python packages you need

    pip install [package name]
  • Test whether the installation succeeded

    python
  • Inside the Python interpreter

    • CUDA test
    import torch
    torch.cuda.is_available()

    If this returns True, CUDA is usable

    • Autograd (differentiation) test
    from torch import Tensor as T
    from torch.autograd import Variable as V
    a=V(T([[1,2],[3,4]]),requires_grad=True)
    b=a*a
    c=b.mean()
    c.backward()
    a.grad

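    If the result is

    Variable containing:
     0.5000  1.0000
     1.5000  2.0000
    [torch.FloatTensor of size 2x2]

    then autograd works correctly (PyTorch 0.4 and later print the same values in the form tensor([[0.5000, 1.0000], [1.5000, 2.0000]]))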
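
The expected gradient can also be verified by hand: c is the mean of the four squared entries, so dc/da_ij = 2 * a_ij / 4 = a_ij / 2. The following sketch confirms those numbers with central finite differences in plain Python, without using torch at all. The helper names are hypothetical and it is only meant as a sanity check on the values above.

```python
def mean_of_squares(matrix):
    """c = mean(a * a), the same scalar that the autograd test builds."""
    values = [x * x for row in matrix for x in row]
    return sum(values) / len(values)


def numerical_gradient(func, matrix, eps=1e-6):
    """Central-difference estimate of d func / d matrix[i][j] for every entry."""
    grad = []
    for i, row in enumerate(matrix):
        grad_row = []
        for j, _ in enumerate(row):
            plus = [list(r) for r in matrix]
            minus = [list(r) for r in matrix]
            plus[i][j] += eps
            minus[i][j] -= eps
            grad_row.append((func(plus) - func(minus)) / (2 * eps))
        grad.append(grad_row)
    return grad


a = [[1.0, 2.0], [3.0, 4.0]]
grad = numerical_gradient(mean_of_squares, a)
# Each entry should be close to a[i][j] / 2.
print([[round(g, 4) for g in row] for row in grad])
```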

Problems encountered during installation

torchvision error: _ZN2at7getTypeERKNS_6TensorE

Running code that imports torchvision fails with


Traceback (most recent call last):
  File "main.py", line 9, in <module>
    from get_data import get_data
  File "/mfs/haoyu/project/pytorch_learn/my_template/code/get_data.py", line 2, in <module>
    from torchvision import datasets, transforms
  File "/home/haoyu/ENV/localENV/anaconda3/lib/python3.7/site-packages/torchvision/__init__.py", line 1, in <module>
    from torchvision import models
  File "/home/haoyu/ENV/localENV/anaconda3/lib/python3.7/site-packages/torchvision/models/__init__.py", line 11, in <module>
    from . import detection
  File "/home/haoyu/ENV/localENV/anaconda3/lib/python3.7/site-packages/torchvision/models/detection/__init__.py", line 1, in <module>
    from .faster_rcnn import *
  File "/home/haoyu/ENV/localENV/anaconda3/lib/python3.7/site-packages/torchvision/models/detection/faster_rcnn.py", line 7, in <module>
    from torchvision.ops import misc as misc_nn_ops
  File "/home/haoyu/ENV/localENV/anaconda3/lib/python3.7/site-packages/torchvision/ops/__init__.py", line 1, in <module>
    from .boxes import nms, box_iou
  File "/home/haoyu/ENV/localENV/anaconda3/lib/python3.7/site-packages/torchvision/ops/boxes.py", line 2, in <module>
    from torchvision import _C
ImportError: /home/haoyu/ENV/localENV/anaconda3/lib/python3.7/site-packages/torchvision/_C.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _ZN2at7getTypeERKNS_6TensorE

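This happens when the installed torchvision does not match the installed torch: the compiled torchvision extension was built against a different torch version, so a symbol it expects is missing. Check the installed versions: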

pip list | grep torch
torch                  1.2.0
torchfile              0.1.0
torchnet               0.0.4
torchvision            0.3.0

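The versions do not match: torch 1.2.0 should be paired with torchvision 0.4.0, so torchvision has to be upgraded. The following methods can be tried in order.

First, try an automatic upgrade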

pip install -U torchvision

If that does not install the matching torchvision version, specify the version explicitly, for example

pip install torchvision==0.4.0
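
To confirm whether the installed pair matches, the output of pip list | grep torch can be checked with a short script. This is a minimal sketch: the function names are hypothetical, and the pairing table contains only the 1.2.0 / 0.4.0 pair mentioned in this text plus 1.1.0 / 0.3.0 as an extra illustration, so it would need extending for other releases.

```python
# torch version -> torchvision version expected alongside it.
# 1.2.0 -> 0.4.0 comes from the text; 1.1.0 -> 0.3.0 is added as an illustration.
EXPECTED_TORCHVISION = {"1.2.0": "0.4.0", "1.1.0": "0.3.0"}


def parse_pip_list(text):
    """Turn 'name  version' lines (as printed by pip list) into a dict."""
    packages = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            packages[parts[0].lower()] = parts[1]
    return packages


def check_pair(packages, table=EXPECTED_TORCHVISION):
    """Return 'ok' or a short description of what is wrong."""
    torch_version = packages.get("torch")
    vision_version = packages.get("torchvision")
    if torch_version is None or vision_version is None:
        return "torch or torchvision is not installed"
    # Drop a local build suffix such as "+cu92" before the lookup.
    expected = table.get(torch_version.split("+")[0])
    if expected is None:
        return "no known pairing for torch " + torch_version
    if vision_version.split("+")[0] == expected:
        return "ok"
    return "mismatch: torch {} expects torchvision {}, found {}".format(
        torch_version, expected, vision_version
    )


sample = """torch                  1.2.0
torchfile              0.1.0
torchnet               0.0.4
torchvision            0.3.0"""
print(check_pair(parse_pip_list(sample)))
```

Run on the listing shown earlier, the check reports a mismatch; after a successful upgrade to torchvision 0.4.0 it should report ok.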

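If the matching torchvision version still cannot be installed, the pip package index has probably not been updated yet, and the package has to be installed from an official wheel directly: go to the PyTorch website, find the page of previous (prebuilt) torch versions for pip, pick the matching CUDA version (for example, the CUDA 10 download page), locate the torchvision wheel of the required version, and copy its download link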

  • Meaning of the wheel filename

    torchvision-0.4.0(torchvision version)-cp37(CPython 3.7)-cp37m(ABI tag)-manylinux1_x86_64(64-bit Linux).whl

pip install [wheel download link]
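
When picking a wheel from the download page, the filename can be split into its tags to confirm it matches the local Python and platform. The sketch below assumes the standard wheel naming convention (distribution, version, optional build tag, Python tag, ABI tag, platform tag, separated by hyphens); the class and function names are hypothetical.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename):
    """Split a wheel filename into its hyphen-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename: " + filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        distribution, version, python_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        # The optional build tag sits between the version and the Python tag.
        distribution, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError("unexpected number of fields in: " + filename)
    return WheelName(distribution, version, build, python_tag, abi_tag, platform_tag)


info = parse_wheel_filename("torchvision-0.4.0-cp37-cp37m-manylinux1_x86_64.whl")
print(info.version, info.python_tag, info.platform_tag)
```

For the example filename, the Python tag cp37 should agree with the version printed by python -V (3.7), and the platform tag should describe the server (64-bit Linux).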