LinuxSir.cn, the Linuxsir that travels through time!

How to remove duplicate lines (sed or awk)

Posted on 2004-8-16 09:20:23
I have a file like this:
---------------------------------------------------
my friends, chenhong
my friends, chenhong
my friends, chenhong
my teacher, liyong
my teacher, liyong
my teacher, liyong
my father, wuzhongyi
my father, wuzhongyi
my father, wuzhongyi
my sister, wushiying
my sister, wushiying
my sister, wushiying
---------------------------------------------------
I want to turn the file into the following:
---------------------------------------------------
my friends, chenhong
my teacher, liyong
my father, wuzhongyi
my sister, wushiying
---------------------------------------------------
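How can this be done with sed or awk?
Posted on 2004-8-16 09:34:53
Does it have to be sed or awk? Would uniq do?
Original poster | Posted on 2004-8-16 09:39:47
That works too. How do I use it?
Personally, I think a combination of sed, awk, and grep should be able to solve any text-processing problem.
Heh~~
Posted on 2004-8-16 09:57:12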
uniq:

  uniq file

awk:

  awk '{if ($0!=line) print;line=$0}' file
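One caveat worth noting: both commands above compare each line only with the line immediately before it, so they collapse adjacent duplicates and nothing else. A minimal sketch of that behavior, assuming a throwaway input file named demo.txt (a hypothetical name, not from the thread):

```shell
# demo.txt is a hypothetical sample file, not part of the original post.
printf '%s\n' 'a' 'a' 'b' 'a' > demo.txt

# uniq collapses only adjacent repeats, so the trailing "a" survives.
uniq_out=$(uniq demo.txt)
printf '%s\n' "$uniq_out"

# The awk one-liner from the post behaves the same way on this input.
awk_out=$(awk '{if ($0!=line) print;line=$0}' demo.txt)
printf '%s\n' "$awk_out"

rm -f demo.txt
```

Both commands print a, b, a here: the last "a" is kept because it is not adjacent to the first two.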
Original poster | Posted on 2004-8-16 10:04:36
Could you explain what this line means:
-------------------------------------------------------
awk '{if ($0!=line) print;line=$0}' file
-------------------------------------------------------
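Thanks!
Original poster | Posted on 2004-8-16 10:14:38
My guess is roughly this:
line should be a variable.

if ($0!=line) print;line=$0;
This can be read as follows:
awk also reads one line at a time. line is empty the first time, so it
naturally differs from $0 (which is "my friends, chenhong"), and the line is
printed. line is then assigned the value of $0. awk then reads the next line;
since $0 now equals line (both are "my friends, chenhong"), nothing is
printed. When "my teacher, liyong" is read, $0 differs from line (whose value
is "my friends, chenhong") again, so it is printed.
The rest follows the same pattern.

Is this understanding correct?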
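The walkthrough above can be sketched as a slightly more explicit variant of the one-liner. This is an illustrative rewrite, not the version posted in the thread: the variable name prev and the file name friends.txt are assumptions, and the NR == 1 test is added so that an empty first line is still printed (the original relies on line starting out empty).

```shell
# friends.txt is a hypothetical name for the sample file from the first post.
printf '%s\n' \
  'my friends, chenhong' 'my friends, chenhong' 'my friends, chenhong' \
  'my teacher, liyong' 'my teacher, liyong' 'my teacher, liyong' \
  'my father, wuzhongyi' 'my father, wuzhongyi' 'my father, wuzhongyi' \
  'my sister, wushiying' 'my sister, wushiying' 'my sister, wushiying' \
  > friends.txt

# Print the first line unconditionally; after that, print a line only when
# it differs from the previous one. prev is updated on every line.
deduped=$(awk 'NR == 1 || $0 != prev { print } { prev = $0 }' friends.txt)
printf '%s\n' "$deduped"

rm -f friends.txt
```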
Posted on 2004-8-16 10:17:46
That's right.

It is harder with sed, though. Anyone want to give it a try?
Posted on 2004-8-16 10:56:19
sed -f rmdup.sed yourfile
Here is the rmdup.sed script:

  #n rmdup.sed - ReMove DUPlicate consecutive lines

  # read next line into pattern space (if not the last line)
  $!N

  # check if pattern space consists of two identical lines
  s/^\(.*\)\n\1$/&/
  # if yes, goto label RmLn, which will remove the first line in pattern space
  t RmLn
  # if not, print the first line (and remove it)
  P

  # garbage handling which simply deletes the first line in the pattern space
  : RmLn
  D
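To try the script above end to end, something like the following should work. It is only a sketch: the script body is the one from the post with its comments trimmed, and sample.txt is a hypothetical input file made up for the demonstration.

```shell
# Write the commands from the post to rmdup.sed (comments trimmed).
cat > rmdup.sed <<'EOF'
$!N
s/^\(.*\)\n\1$/&/
t RmLn
P
: RmLn
D
EOF

# sample.txt is a hypothetical input file containing adjacent duplicates.
printf '%s\n' 'my friends, chenhong' 'my friends, chenhong' \
  'my teacher, liyong' 'my teacher, liyong' 'my teacher, liyong' \
  'my father, wuzhongyi' > sample.txt

# Each run of identical adjacent lines is reduced to a single line.
sed_out=$(sed -f rmdup.sed sample.txt)
printf '%s\n' "$sed_out"

rm -f rmdup.sed sample.txt
```

The substitution replaces the match with itself (&), so it changes nothing; its only purpose is to set the flag that t tests, which is how the script decides whether the two lines in the pattern space are identical.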
Original poster | Posted on 2004-8-16 11:31:58
What if file a looks like this:
---------------------------------------------------
my friends, chenhong
my teacher, liyong
my teacher, liyong
my father, wuzhongyi
my sister, wushiying
my sister, wushiying
my friends, chenhong
my teacher, liyong
my father, wuzhongyi
my sister, wushiying
my friends, chenhong
my father, wuzhongyi
---------------------------------------------------
and I want to turn the file into the following:
---------------------------------------------------
my friends, chenhong
my teacher, liyong
my father, wuzhongyi
my sister, wushiying
---------------------------------------------------
What should I do then?
Posted on 2004-8-16 11:35:44
Use `sort' first. There is no EFFICIENT way of sorting in sed/awk.
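Sorting does make the duplicates adjacent, but it also reorders the lines, whereas the desired output above keeps each line at its first occurrence. A sketch of both options follows, assuming a file named a.txt that holds a shortened version of file a. The second option, an awk associative array (named seen here purely for illustration), is a commonly used idiom that was not proposed in the thread; it does no sorting at all.

```shell
# a.txt is a hypothetical, shortened version of "file a" from the post.
printf '%s\n' 'my friends, chenhong' 'my teacher, liyong' 'my teacher, liyong' \
  'my father, wuzhongyi' 'my sister, wushiying' 'my friends, chenhong' \
  'my father, wuzhongyi' > a.txt

# Option 1: sort first, then drop adjacent duplicates.
# The result is in sorted order, not in the original order.
sorted_out=$(LC_ALL=C sort a.txt | uniq)
printf '%s\n' "$sorted_out"

# Option 2: print a line only the first time it is seen.
# seen[$0]++ is 0 (false) on first sight, so !seen[$0]++ is true exactly once.
ordered_out=$(awk '!seen[$0]++' a.txt)
printf '%s\n' "$ordered_out"

rm -f a.txt
```

Option 2 keeps every distinct line in memory, so for very large inputs the sort-based pipeline may still be the safer choice.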