Posted on 2004-2-12 11:48:59
http://freebsd.ntu.edu.tw/bsd/5/19/53.html
- ◇ [hack] File system
- From: Thinker@freebsd.ee.ntu.edu.tw (Thinker), Board: BSD
- Subject: [hack] File system
- Posted from: freebsd.ee.ntu.edu.tw (Thu Feb 5 20:08:49 1998)
- Relayed via: Maxwell!bbs.ee.ntu!freebsd.ntu!fromzero
- TITLE: File system
- KEYWORDS: [file] [vnode] [mount] [file system] [fileops] [vfsops]
- [vnodeop_desc] [vnodeopv_entry_desc] [vnodeopv_desc]
- [descriptor table]
- ABSTRACT:
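- This article covers the file system and fills in the parts that the
- Red Book does not explain clearly. The Red Book is an indispensable
- guide for anyone studying 4.4BSD, but it does not spell out some
- details of the actual implementation. As a result, when you trace
- the source there are missing links that are hard to follow. This
- article fills in those links so that new hackers can connect the
- Red Book with the code.
- * Red Book: The Design and Implementation of the 4.4BSD Operating System,
- Addison Wesley.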
- file:
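- A process accesses a file through a file descriptor.
- * For the path involved, see Red Book section 6.4 and Figure 6.4.
- The file table is a kernel global variable and does not belong to
- any single process. 4.4BSD provides a file system and several IPC
- systems, and file serves as the common interface to all of them.
- By going through the file interface, a process can use the same
- methods and the same code to handle files, sockets and pipes, three
- different systems. file is an object-style interface that provides
- a generic structure and a set of functions. These functions are
- called operators and are recorded as pointers in an operator table;
- file.f_ops (type: "fileops *") points to that table. The file
- system, the socket system and the pipe system each provide their
- own operator table, and file.f_ops selects the appropriate one.
- For example, the file.f_ops of a socket points to the socket
- operator table.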
- +-
- file       system
- ------     -------------------------
- *f_ops     pipeops   (pipe)
-            socketops (socket)
-            vnops     (vfs)
- *f_data    vnode     (vfs)
-            socket    (socket)
-            pipe      (pipe)
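- * pipeops, socketops and vnops are global variables that belong to
- the pipe, socket and vfs domains respectively. All three domains are
- accessed through the file interface, and each domain interprets and
- handles the interface defined by file in its own way. Depending on
- the domain, f_ops (type: fileops *) points to the operator list that
- the domain defines. The kernel calls the function for a given
- operation indirectly, through that operator list.
- (vfs: virtual file system)
- * f_data points to a different structure depending on the domain:
- a vnode in the vfs domain, a socket structure in the socket domain,
- and a pipe structure in the pipe domain. The kernel uses these
- structures to record the related data.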
- +-
- Vnode:
- BSD supports many different file systems, such as ufs, nfs, procfs
- ... etc. The kernel uses vfs to unify them: it provides an
- object-style interface and hides the differences between file
- systems. The vnode plays a central role in vfs.
- * See Red Book section 6.5.
- The vnode member v_op has type "(vop_t **)"; vop_t is defined as:
- +-
- typedef int vop_t __P((void *));
- +-
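- This defines a standard interface. v_op points to a predefined list
- in which every element points to the function for one particular
- operation (the meaning of each position in the list is fixed in
- advance). These functions are called operators. Every file system
- provides its own operators and therefore its own list, and the v_op
- of a vnode points to the operator list of the file system that the
- vnode belongs to. The first and only argument of an operator has
- type "(void *)", so a pointer of any type can be passed in. In
- practice, each operator expects one specific argument type. For
- example, the argument type of VOP_CREATE is: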
- +-
- struct vop_create_args {
-     struct vnodeop_desc *a_desc;
-     struct vnode *a_dvp;
-     struct vnode **a_vpp;
-     struct componentname *a_cnp;
-     struct vattr *a_vap;
- };
- +-
- Whatever the file system, the operator that implements VOP_CREATE
- takes an argument of this type. The vnops of vfs obtain the vnode
- from the file's f_data and then call these operators through the
- vnode to do their work.
- +-
- -> file -> vnode -> operator list -> operator
- +-
- Mount:
- In UNIX, a file system must be mounted on a mount point before the
- system can use it. In the kernel, 4.4BSD represents every mount
- point with a struct mount. * See Red Book section 6.5 and Figure 6.7.
- mount.mnt_op has type (struct vfsops *) and defines operators that
- act on a whole file system rather than on individual files, such as
- vfs_mount, vfs_unmount, vfs_sync ... etc.
- Initialization:
- The vfs initialization code is vfsinit() in kern/vfs_init.c; the
- root file system is mounted by vfs_mountroot() in kern/vfs_conf.c.
- Appendix 1:
- +-
- /*
- * Kernel descriptor table.
- * One entry for each open kernel vnode and socket.
- */
- struct file {
-     LIST_ENTRY(file) f_list;    /* list of active files */
-     short f_flag;               /* see fcntl.h */
- #define DTYPE_VNODE  1          /* file */
- #define DTYPE_SOCKET 2          /* communications endpoint */
- #define DTYPE_PIPE   3          /* pipe */
- #define DTYPE_FIFO   4          /* fifo (named pipe) */
-     short f_type;               /* descriptor type */
-     short f_count;              /* reference count */
-     short f_msgcount;           /* references from message queue */
-     struct ucred *f_cred;       /* credentials associated with descriptor */
-     struct fileops {
-         int (*fo_read) __P((struct file *fp, struct uio *uio,
-                             struct ucred *cred));
-         int (*fo_write) __P((struct file *fp, struct uio *uio,
-                              struct ucred *cred));
-         int (*fo_ioctl) __P((struct file *fp, int com,
-                              caddr_t data, struct proc *p));
-         int (*fo_select) __P((struct file *fp, int which,
-                               struct proc *p));
-         int (*fo_close) __P((struct file *fp, struct proc *p));
-     } *f_ops;
-     off_t f_offset;
-     caddr_t f_data;             /* vnode or socket */
- };
- /*
- * Structure per mounted file system. Each mounted file system has an
- * array of operations and an instance record. The file systems are
- * put on a doubly linked list.
- */
- struct mount {
-     CIRCLEQ_ENTRY(mount) mnt_list;      /* mount list */
-     struct vfsops *mnt_op;              /* operations on fs */
-     struct vfsconf *mnt_vfc;            /* configuration info */
-     struct vnode *mnt_vnodecovered;     /* vnode we mounted on */
-     struct vnodelst mnt_vnodelist;      /* list of vnodes this mount */
-     int mnt_flag;                       /* flags */
-     int mnt_maxsymlinklen;              /* max size of short symlink */
-     struct statfs mnt_stat;             /* cache of filesystem stats */
-     qaddr_t mnt_data;                   /* private data */
-     /* struct vfsconf *mnt_vfc; */      /* configuration info */
-     time_t mnt_time;                    /* last time written*/
- };
- /*
- * used to get configured filesystems information
- */
- #define VFS_MAXNAMELEN 32
- struct vfsconf {
-     void *vfc_vfsops;
-     char vfc_name[VFS_MAXNAMELEN];
-     int vfc_index;
-     int vfc_refcount;
-     int vfc_flags;
- };
- struct vfsops {
-     int (*vfs_mount) __P((struct mount *mp, char *path, caddr_t data,
-                           struct nameidata *ndp, struct proc *p));
-     int (*vfs_start) __P((struct mount *mp, int flags,
-                           struct proc *p));
-     int (*vfs_unmount) __P((struct mount *mp, int mntflags,
-                             struct proc *p));
-     int (*vfs_root) __P((struct mount *mp, struct vnode **vpp));
-     int (*vfs_quotactl) __P((struct mount *mp, int cmds, uid_t uid,
-                              caddr_t arg, struct proc *p));
-     int (*vfs_statfs) __P((struct mount *mp, struct statfs *sbp,
-                            struct proc *p));
-     int (*vfs_sync) __P((struct mount *mp, int waitfor,
-                          struct ucred *cred, struct proc *p));
-     int (*vfs_vget) __P((struct mount *mp, ino_t ino,
-                          struct vnode **vpp));
-     int (*vfs_fhtovp) __P((struct mount *mp, struct fid *fhp,
-                            struct mbuf *nam, struct vnode **vpp,
-                            int *exflagsp, struct ucred **credanonp));
-     int (*vfs_vptofh) __P((struct vnode *vp, struct fid *fhp));
-     int (*vfs_init) __P((void));
- };
- struct vnode {
-     u_long v_flag;                      /* vnode flags (see below) */
-     int v_writecount;                   /* reference count of writers */
-     int v_holdcnt;                      /* page & buffer references */
-     daddr_t v_lastr;                    /* last read (read-ahead) */
-     u_long v_id;                        /* capability identifier */
-     struct mount *v_mount;              /* ptr to vfs we are in */
-     vop_t **v_op;                       /* vnode operations vector */
-     TAILQ_ENTRY(vnode) v_freelist;      /* vnode freelist */
-     LIST_ENTRY(vnode) v_mntvnodes;      /* vnodes for mount point */
-     struct buflists v_cleanblkhd;       /* clean blocklist head */
-     struct buflists v_dirtyblkhd;       /* dirty blocklist head */
-     long v_numoutput;                   /* num of writes in progress */
-     enum vtype v_type;                  /* vnode type */
-     union {
-         struct mount *vu_mountedhere;   /* ptr to mounted vfs (VDIR) */
-         struct socket *vu_socket;       /* unix ipc (VSOCK) */
-         struct specinfo *vu_specinfo;   /* device (VCHR, VBLK) */
-         struct fifoinfo *vu_fifoinfo;   /* fifo (VFIFO) */
-     } v_un;
-     struct nqlease *v_lease;            /* Soft reference to lease */
-     daddr_t v_lastw;                    /* last write (write cluster) */
-     daddr_t v_cstart;                   /* start block of cluster */
-     daddr_t v_lasta;                    /* last allocation */
-     int v_clen;                         /* length of current cluster */
-     int v_ralen;                        /* Read-ahead length */
-     int v_usage;                        /* Vnode usage counter */
-     daddr_t v_maxra;                    /* last readahead block */
-     struct vm_object *v_object;         /* Place to store VM object */
-     enum vtagtype v_tag;                /* type of underlying data */
-     void *v_data;                       /* private data for fs */
- };
- struct vnodeop_desc {
-     int vdesc_offset;           /* offset in vector--first for speed */
-     char *vdesc_name;           /* a readable name for debugging */
-     int vdesc_flags;            /* VDESC_* flags */
-     /*
-      * These ops are used by bypass routines to map and locate arguments.
-      * Creds and procs are not needed in bypass routines, but sometimes
-      * they are useful to (for example) transport layers.
-      * Nameidata is useful because it has a cred in it.
-      */
-     int *vdesc_vp_offsets;      /* list ended by VDESC_NO_OFFSET */
-     int vdesc_vpp_offset;       /* return vpp location */
-     int vdesc_cred_offset;      /* cred location, if any */
-     int vdesc_proc_offset;      /* proc location, if any */
-     int vdesc_componentname_offset; /* if any */
-     /*
-      * Finally, we've got a list of private data (about each operation)
-      * for each transport layer. (Support to manage this list is not
-      * yet part of BSD.)
-      */
-     caddr_t *vdesc_transports;
- };
- /*
- * This structure is used to configure the new vnodeops vector.
- */
- struct vnodeopv_entry_desc {
-     struct vnodeop_desc *opve_op;   /* which operation this is */
-     vop_t *opve_impl;               /* code implementing this operation */
- };
- struct vnodeopv_desc {
-     /* ptr to the ptr to the vector where op should go */
-     vop_t ***opv_desc_vector_p;
-     struct vnodeopv_entry_desc *opv_desc_ops;   /* null terminated list */
- };
- +-
- -----------------------------------------------------------------------------
- Just call me Thinker, thx! Thinker
- Thinker.bbs@yzit.edu.tw
- Thinker@freebsd.ee.ntu.edu.tw
- Origin: freebsd.ee.ntu.edu.tw (140.112.19.123)