LinuxSir.cn: Linuxsir, across time and space!

A small script written in Perl: I have some problems, please advise

Posted on 2009-7-23 22:27:00
#!/usr/bin/perl -w
print "please enter the source filename(include suffix):";
chomp($file=<STDIN>);
print "please enter the output filename(include suffix):";
chomp($out=<STDIN>);
open(OUT,"$out")||die "can't open the file!\n";
open(FILE,"$file")||die "can't open the file!\n";
$IN=<FILE>;
$i=0;
while($IN){
if(/nematode/gi){
print OUT $IN;
$i++;
}
$FL=<FILE>;
}
print "the total of nematode sequence is $i";
close(FILE);
close(OUT);
I want to use this script to search a sequence file and copy every sequence whose name contains "nematode" into another file. Can this script do that?
Posted on 2009-7-24 00:04:34
No, it can't.
It would be best to post a snippet of the input file; then I can help you fix it.

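Original poster | Posted on 2009-7-24 20:29:29
For example, from the file below I want to find the sequences that contain the string "Calponin protein" and copy them into another file for safekeeping (every sequence starts with a > sign). The input file is too large (over 30 GB), so I can only excerpt a very small number of sequences. Thanks a lot!!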
>gi|159498513|gb|EY195367.1|EY195367 RSAA-aab21e07.g1 R.similis_EST_RSAA Radopholus similis cDNA 5' similar to ref|NP_491282.1| Calponin homology (CH) domain containing protein family member [Caenorhabditis elegans] pir|T29467 hypothetical protein F28H1.2 - Caenorhabditis elegans gb|AAB52338.1| Calponin protein 3 [Caenorhabditis elegans], mRNA sequence
GGCCGGGTCGCGAAGAGCCAGGGAGTGCCCACCGAGGAGACCTTCCAGAGCGTGGACTTGTTCGAGGCAC
GTGACCTCTACTCCGTGTGCATGACCCTGTTGTCGTTGGGCCGAATTATGGAGAAGAAGGGAAAGCCGAA
CCCATTCTCTGGATGAAGAGTAGAAGTGAGTGCAGCAAAAGACGGACAGAGCATGTGCTATCGCTCCCAT
TCGAGAACTCCCCGTTTTGCGAATTTTCTCCCCGTGTGCGACCTTGCAACAATACCGAGGCGTTAACTGT
TTTCCCCTCTCTCTCCTCTCAAACTGTGGGCATTTGAAAATAGCGATGCCGGAATAAATGGCCAATTCCA

>gi|159498512|gb|EY195366.1|EY195366 RSAA-aab21e06.g1 R.similis_EST_RSAA Radopholus similis cDNA 5', mRNA sequence
GGCCGGGATTGGCACGTAATTCAACTGGTCTTTTTGCTGGGCGGCAATTGGCAACTTTATCCGACAGAAG
CATGATAATTGGTGGAGCCGCCCTTTGCCGACGCACTTTTGCAGCCCCACCCCCCCATCGGGACACGGTA
TTCGAGGTAGAATCGGACGAGGACTTCGAGAATGGGGTATACCATGCCGAAAAACCAGTGCTCATGCAGT
TTTACGCCGATTGGTGTGGACCCTGTCAGAATTTGGCACCGAGGTTGATTGCCAAAGTGAATGGACAGGA
TGGGAAGGTATTGCTAGCGAGAGTGAATGTAGAGGGTTCCGCCGGATTTCTCGCGGAACAGTTTGATGTA
AGCTCGATTCCCACTGTGATGTGCTGGCTGCGAGGAGAGGTGGTTGACCGTTTTGAGGGCGACGTGGAAG
ACACGAAGATTGACCAAATCATATCCAAATTGGTGGAATATCAAACTGGAAATGAATGATAAAGGGCTTG
AACAGCCATTTAGTAGACGAAAAAAAAAAAAAAAA

>gi|159498511|gb|EY195365.1|EY195365 RSAA-aab21e05.g1 R.similis_EST_RSAA Radopholus similis cDNA 5', mRNA sequence
GGCCGGGTGCTGCAGGCATGTCGCACCGTCCCGCCACTCATCATCGTCCCTTCCCTGTCCCGCCCCGGCA
CACCCTTCCTGTCCAGACATTCAACTGCGGTGTGGTCGAAGTCTCTCCCAAGTCAATACCCGATGCTCCG
CCGCCATACGAGGAGTTTGTGCGTGTTCCACCACCACCGCCACAAAGGGCACCGCCCATTCTGACGCGGG
AAGAGGATGAGGAGTTGCAGAACAGACTGAACTCGGAGAGAGAGAGGGAACTGAGCGACTGGTGACATTT
GGTTTTGTCGAGTGCTGCAGCTTCGCACCATTTCCCTTTATATACGGGACTTTCTTCATTTCTTTTGTTC
CTGACTTAACAACAATTAATAGACCA

>gi|159498510|gb|EY195364.1|EY195364 RSAA-aab21e04.g1 R.similis_EST_RSAA Radopholus similis cDNA 5' similar to ref|NP_500582.1| GTP-binding protein like (21.7 kD) [Caenorhabditis elegans] sp|Q23445|SAR1_CAEEL GTP-binding protein SAR1 pir|T29706 GTP-binding protein ZK180.4 [similarity] - Caenorhabditis elegans gb|AAB52968.1| Hypothetical protein ZK180.4 [C, mRNA sequence
GGCCGGGGGACGCCATTGTATTTTTGGTCGATGTAGCCGACCTGGAACGTATTCAGGAAGCAAGGGAGGA
ATTGTGGAGTCTGATGCAGGATGAACAGGTGGCAAGTGCACCTGTGCTTGTTTTGGGCAATAAGATCGAC
AAGCCGAATGCTCTCAGCGAAGACCAGCTCAAGTACTACCTCGGCATCCAACAATACTGCACAGGAAAAG
GCCAAGTTGCGCGCTCAGATCTGGCCACTCGTCCTTTGGATGTGTTCATGTGCTCAGTCCTTAGGCGACA
GGGTTACGGCGAAGGATTCTGTTGGCTCTCACAATATCTGGACTGATTGAACGCGCCTCGGAAGTTGAAA
ATTGACACAAAAAGTAAGGACGACTCCAATCGCAACAAATCATTTCATATTATTTCTGTACTACACCTAT
TTTCGATTCATCTTATCTCTTAAACAATGTCAATGTTAAAAATCATCGGTTGCA

>gi|159498509|gb|EY195363.1|EY195363 RSAA-aab21e02.g1 R.similis_EST_RSAA Radopholus similis cDNA 5' similar to gb|AAP59456.1| cathepsin B precursor [Araneus ventricosus], mRNA sequence
GGCCGGGGCTCGGGGGCCATGCCGTTCGCATTATTGGATGGGGCGAGGCTAGCGGTCAGAAATACTGGCT
GGTGGCTAATTCGTGGAACACCGATTGGGGCGAGAAGGGCCTATTCCGCATACGTCGCGGCTCCGATGAA
GAGCGCATCGAAACATTGCAAATTGCATTTGGGACACCAAAGATTTAAAATCGGCGAATTGACTTGTAAA
AGATGGATAGTAAAATATTTCTTTTGCCA

>gi|159498508|gb|EY195362.1|EY195362 RSAA-aab21d12.g1 R.similis_EST_RSAA Radopholus similis cDNA 5', mRNA sequence
GGCCGGGGACATTAAGTGCAATCAATTCGCCACATAATGTGATGTCAAAATTTAAATCTGAAACTTGGAT
CTATTGTAACAACATCGGATGGACCATCAGCTGGTGGGAGCAAGCAACAAGAGATGGAAGACGATACACC
GGCCAAAGAGCAAGAGGAAGAACGCGAACTGGGCGAAGAGGATGGATTTGCGCAAAACCATCACAATAGT
CAATTTTCGTAGCCTCCCAACGCCAACAGCCGCCTTTCTGGCACATTATGTGAAGAGTGATCGTCCATTC
CATGCGCTGTTCCGTCGTCGACCTGTTCCTGCGCACTTGGGGGAGACCAAGGCCAAATCGATGTAATTTA
ATTGAACAAAAAATTAATGCAGTTCACGGCTTGTCTTTCATGCCTTGGATGAACTCTTCATTCATTCGGA
ATCAACCATGGCCACGTTACGTCAACTGGAAGATTTGCCGGAGAATGTGCTGGCTAGACGGAGGCTCCAG
ACAGTTCGAGCCAACGAACTCGTCAATTAGAGAAACCACAAAATATAAAGTTGACATTTTATGAATAAAT
ATATGAAAAAAAAAAAAAAAA

>gi|159498507|gb|EY195361.1|EY195361 RSAA-aab21d11.g1 R.similis_EST_RSAA Radopholus similis cDNA 5', mRNA sequence
GGCCGGGATTTTTTGTAATTATGATTTTAGGTTATAAATAATTAATGAAGTATAAACTATTAATGTTATA
TTTTTTAGATAAATTAATTTTTCTAGTAATTTATTTTTAGATAAATTAATGTTCCAGAAATATCGGCTAG
ACATTATTATTTTTAACTAAAAACTTTTTAAATTTTATTTTAATTTATATAAATTTATATTAATATAGGT
GAAATTTTAATTATAATTATGTTAATAATTTTATAAAATTTAAAATTTTAATTTTTAACTTAGGTTAGAC
ACTAATTAATGAAATTTAATAATTTTCTTTAGTAAATTTTTGA

>gi|159498506|gb|EY195360.1|EY195360 RSAA-aab21d10.g1 R.similis_EST_RSAA Radopholus similis cDNA 5' similar to ref|NP_491217.1| Forkhead associated domain containing protein (35.8 kD) [Caenorhabditis elegans] pir|T25596 hypothetical protein C32E8.5 - Caenorhabditis elegans gb|AAB42323.1| Hypothetical protein C32E8.5 [Caenorhabditis elegans], mRNA sequence
GGCCGGGGAAAGAGCCAGGCGAAGAAGACACGAAAGGAATGGGCCCAAGTGAAGAGGAGAAGGAAAAACC
GTCTTTTGTGCCCAGCGGAAAGTTGGCTAAAGACACCAACACATTCAAAGGAGTCCTCATCAAGTACAAT
GAACCGCCAGAAGCCAAGATTCCCAAGTTGCGTTGGCGCATGTATCCGTTCAAGGGAGAGCAAGACATGC
CTGTGATCTATGTGCACCGTCAGTCAGCCTATCTGGTTGGGCGGGACCGAAAAATTGCCGATTTTCCCGT
GGACCATCCGAGTTGTTCAAAGCAGCACGCAGCACTCCAGTATCGGTCTCTG

Posted on 2009-7-24 20:42:06
print "please enter the source filename(include suffix):";
chomp($file=<STDIN>);
print "please enter the output filename(include suffix):";
chomp($out=<STDIN>);
open(OUT,">$out")||die "can't open the file $out: $!";
open(FILE,"$file")||die "can't open the file $file: $!";
#$IN=<FILE>;
$i=0;
while(<FILE>){
    if (/^>.*Calponin protein/){
#if(//gi){
        print OUT $_;
        $i++;
    }
}
#$FL=<FILE>;
#}
print "the total of nematode sequence is $i\n";
close(FILE);
close(OUT);
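If we build on the OP's code, it would look roughly like this (the output file is opened with >, while(<FILE>) reads each line into $_, and only header lines starting with > are matched). I commented out some of the OP's original statements.
You say you want to find Calponin protein, but your first post used /nematode/?
It looks like the OP hasn't really gotten started yet.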
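To check what a loop like the one above actually writes without running it on the 30 GB file, one option is to feed it a small in-memory sample. This is an illustrative sketch, not code from the thread: it moves the same line-by-line loop into a hypothetical subroutine `copy_header_lines` that takes filehandles, and uses Perl's in-memory `open` on a string. Because `while (<FILE>)` hands the loop one line at a time and only lines starting with `>` can match, only the header line of a matching record is copied; its sequence lines never match, which is what the OP reports in the next post.

```perl
use strict;
use warnings;

# Hypothetical helper: the same line-by-line loop as above, but reading from
# and writing to filehandles passed in, so it can be tried on a small sample.
sub copy_header_lines {
    my ($in, $out, $keyword) = @_;
    my $count = 0;
    while (my $line = <$in>) {
        # Only lines that start with ">" and contain the keyword match;
        # \Q...\E makes the keyword match literally.
        if ($line =~ /^>.*\Q$keyword\E/) {
            print {$out} $line;
            $count++;
        }
    }
    return $count;
}

# A tiny two-record sample in the thread's format (headers shortened).
my $sample = ">gi|1 Calponin protein 3, mRNA sequence\n"
           . "GGCCGGGTCGCGAAG\n"
           . "GTGACCTCTACTCCG\n"
           . ">gi|2 cDNA 5', mRNA sequence\n"
           . "GGCCGGGATTGGCAC\n";

open(my $in, '<', \$sample) or die "can't open sample: $!";
my $result = '';
open(my $out, '>', \$result) or die "can't open buffer: $!";
my $n = copy_header_lines($in, $out, 'Calponin protein');
close($out);

# Only the header line is written; the two sequence lines under it are not.
print "matched $n record(s); output was:\n$result";
```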

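Original poster | Posted on 2009-7-25 09:50:45
Thank you for fixing it for me. I really have only just started teaching myself Perl and will keep coming to you for advice. The program I wrote is probably very crude; if possible, could I trouble you to write one for me? I would be extremely grateful. I've been stuck on this for a long time and my work can't move forward. Thanks!
What I actually need is to find the entries containing "nematode" in one large text file (about 32 GB) and copy them out.
But I only posted a very small part of it, and that part doesn't contain this keyword, so I temporarily switched keywords when asking you to fix the program.
I just ran the program as you modified it, and it copies out >gi|159498513|gb|EY195367.1|EY195367 RSAA-aab21e07.g1 R.similis_EST_RSAA Radopholus similis cDNA 5' similar to ref|NP_491282.1| Calponin homology (CH) domain containing protein family member [Caenorhabditis elegans] pir|T29467 hypothetical protein F28H1.2 - Caenorhabditis elegans gb|AAB52338.1| Calponin protein 3 [Caenorhabditis elegans], mRNA sequence

But the sequence information below it was not copied. What's the reason? Please help, thanks!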
GGCCGGGTCGCGAAGAGCCAGGGAGTGCCCACCGAGGAGACCTTCCAGAGCGTGGACTTGTTCGAGGCAC
GTGACCTCTACTCCGTGTGCATGACCCTGTTGTCGTTGGGCCGAATTATGGAGAAGAAGGGAAAGCCGAA
CCCATTCTCTGGATGAAGAGTAGAAGTGAGTGCAGCAAAAGACGGACAGAGCATGTGCTATCGCTCCCAT
TCGAGAACTCCCCGTTTTGCGAATTTTCTCCCCGTGTGCGACCTTGCAACAATACCGAGGCGTTAACTGT
TTTCCCCTCTCTCTCCTCTCAAACTGTGGGCATTTGAAAATAGCGATGCCGGAATAAATGGCCAATTCCA
None of these lines were copied.
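Since the real keyword (`nematode`) differs from the one in the posted sample (`Calponin protein`), one way to avoid editing the script for every search is to pass the keyword in and build the pattern once. A minimal sketch, assuming the keyword comes from the command line; `make_header_matcher` is a hypothetical name. `quotemeta` makes characters such as `(`, `|` or `.` match literally, and the optional `/i` flag mirrors the case-insensitive `/nematode/gi` in the first post (`/g` is not needed just to test whether a line matches).

```perl
use strict;
use warnings;

# Hypothetical helper: return a compiled regex matching a header line
# (one that starts with ">") which contains $keyword anywhere after the ">".
sub make_header_matcher {
    my ($keyword, $ignore_case) = @_;
    my $escaped = quotemeta($keyword);   # so "(", "|", "." etc. match literally
    return $ignore_case ? qr/^>.*$escaped/i : qr/^>.*$escaped/;
}

# Assumed usage: the keyword is given on the command line,
# e.g.  perl filter.pl nematode   (falls back to the sample's keyword)
my $keyword = @ARGV ? $ARGV[0] : 'Calponin protein';
my $matcher = make_header_matcher($keyword, 1);

# A shortened header in the thread's format, for illustration only.
my $header = ">gi|159498513|gb|EY195367.1|EY195367 Calponin protein 3, mRNA sequence\n";
print $header =~ $matcher ? "header matches\n" : "no match\n";
```

The compiled pattern can then replace the hard-coded `/^>.*Calponin protein/` in any of the loops in this thread.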

Posted on 2009-7-25 10:04:38
foumer wrote (post 2009375): "I just ran the program as you modified it... But the sequence information below it was not copied. What's the reason?"

Your material is very specialized; I didn't realize that by "sequence information" you meant these lines.
Try this:
print "please enter the source filename(include suffix):";
chomp($file=<STDIN>);
print "please enter the output filename(include suffix):";
chomp($out=<STDIN>);
open(OUT,">$out")||die "can't open the file $out: $!"; # the output file needs a leading >
open(FILE,"$file")||die "can't open the file $file: $!";
#$IN=<FILE>;
$i=0;
while(<FILE>){
    if (/^>.*Calponin protein/){ # match lines that start with > and contain "Calponin protein"
#if(//gi){
        print OUT $_;
        while (defined($sequence=<FILE>) && $sequence!~/^\s*$/){ # print the sequence lines until a blank line (or end of file)
            print OUT $sequence;
        }
        print OUT "\n"; # this entry is finished; add a blank line
        $i++;
    }
}
#$FL=<FILE>;
#}
print "the total of nematode sequence is $i\n";
close(FILE);
close(OUT);

Original poster | Posted on 2009-7-25 10:17:57
What I want is everything from the > sign onward, that is, the header line and the sequence lines under it,
for example: >gi|159498513|gb|EY195367.1|EY195367 RSAA-aab21e07.g1 R.similis_EST_RSAA Radopholus similis cDNA 5' similar to ref|NP_491282.1| Calponin homology (CH) domain containing protein family member [Caenorhabditis elegans] pir|T29467 hypothetical protein F28H1.2 - Caenorhabditis elegans gb|AAB52338.1| Calponin protein 3 [Caenorhabditis elegans], mRNA sequence
GGCCGGGTCGCGAAGAGCCAGGGAGTGCCCACCGAGGAGACCTTCCAGAGCGTGGACTTGTTCGAGGCAC
GTGACCTCTACTCCGTGTGCATGACCCTGTTGTCGTTGGGCCGAATTATGGAGAAGAAGGGAAAGCCGAA
CCCATTCTCTGGATGAAGAGTAGAAGTGAGTGCAGCAAAAGACGGACAGAGCATGTGCTATCGCTCCCAT
TCGAGAACTCCCCGTTTTGCGAATTTTCTCCCCGTGTGCGACCTTGCAACAATACCGAGGCGTTAACTGT
TTTCCCCTCTCTCTCCTCTCAAACTGTGGGCATTTGAAAATAGCGATGCCGGAATAAATGGCCAATTCCA

With the program as you modified it, only this part gets copied: >gi|159498513|gb|EY195367.1|EY195367 RSAA-aab21e07.g1 R.similis_EST_RSAA Radopholus similis cDNA 5' similar to ref|NP_491282.1| Calponin homology (CH) domain containing protein family member [Caenorhabditis elegans] pir|T29467 hypothetical protein F28H1.2 - Caenorhabditis elegans gb|AAB52338.1| Calponin protein 3 [Caenorhabditis elegans], mRNA sequence. The GGCCGGGTCGCGAAG...... lines below it are all missing, and I need that part too. Thanks a lot!

Posted on 2009-7-25 10:25:03
I've added the code to my previous post.

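Original poster | Posted on 2009-8-19 16:24:45
cheeselee:
Hello,
Sorry to trouble you again. The Perl script you modified for me last time helped me a lot, but now there is another problem: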
>gi|159498513|gb|EY195367.1|EY195367 RSAA-aab21e07.g1 R.similis_EST_RSAA Radopholus similis cDNA 5' similar to ref|NP_491282.1| Calponin homology (CH) domain containing protein family member [Caenorhabditis elegans] pir|T29467 hypothetical protein F28H1.2 - Caenorhabditis elegans gb|AAB52338.1| Calponin protein 3 [Caenorhabditis elegans], mRNA sequence
GGCCGGGTCGCGAAGAGCCAGGGAGTGCCCACCGAGGAGACCTTCCAGAGCGTGGACTTGTTCGAGGCAC
GTGACCTCTACTCCGTGTGCATGACCCTGTTGTCGTTGGGCCGAATTATGGAGAAGAAGGGAAAGCCGAA
CCCATTCTCTGGATGAAGAGTAGAAGTGAGTGCAGCAAAAGACGGACAGAGCATGTGCTATCGCTCCCAT
TCGAGAACTCCCCGTTTTGCGAATTTTCTCCCCGTGTGCGACCTTGCAACAATACCGAGGCGTTAACTGT
TTTCCCCTCTCTCTCCTCTCAAACTGTGGGCATTTGAAAATAGCGATGCCGGAATAAATGGCCAATTCCA
>gi|159498512|gb|EY195366.1|EY195366 RSAA-aab21e06.g1 R.similis_EST_RSAA Radopholus similis cDNA 5', mRNA sequence
GGCCGGGATTGGCACGTAATTCAACTGGTCTTTTTGCTGGGCGGCAATTGGCAACTTTATCCGACAGAAG
CATGATAATTGGTGGAGCCGCCCTTTGCCGACGCACTTTTGCAGCCCCACCCCCCCATCGGGACACGGTA
TTCGAGGTAGAATCGGACGAGGACTTCGAGAATGGGGTATACCATGCCGAAAAACCAGTGCTCATGCAGT
TTTACGCCGATTGGTGTGGACCCTGTCAGAATTTGGCACCGAGGTTGATTGCCAAAGTGAATGGACAGGA
TGGGAAGGTATTGCTAGCGAGAGTGAATGTAGAGGGTTCCGCCGGATTTCTCGCGGAACAGTTTGATGTA
AGCTCGATTCCCACTGTGATGTGCTGGCTGCGAGGAGAGGTGGTTGACCGTTTTGAGGGCGACGTGGAAG
ACACGAAGATTGACCAAATCATATCCAAATTGGTGGAATATCAAACTGGAAATGAATGATAAAGGGCTTG
AACAGCCATTTAGTAGACGAAAAAAAAAAAAAAAA
>gi|159498511|gb|EY195365.1|EY195365 RSAA-aab21e05.g1 R.similis_EST_RSAA Radopholus similis cDNA 5', mRNA sequence
GGCCGGGTGCTGCAGGCATGTCGCACCGTCCCGCCACTCATCATCGTCCCTTCCCTGTCCCGCCCCGGCA
CACCCTTCCTGTCCAGACATTCAACTGCGGTGTGGTCGAAGTCTCTCCCAAGTCAATACCCGATGCTCCG
CCGCCATACGAGGAGTTTGTGCGTGTTCCACCACCACCGCCACAAAGGGCACCGCCCATTCTGACGCGGG
AAGAGGATGAGGAGTTGCAGAACAGACTGAACTCGGAGAGAGAGAGGGAACTGAGCGACTGGTGACATTT
GGTTTTGTCGAGTGCTGCAGCTTCGCACCATTTCCCTTTATATACGGGACTTTCTTCATTTCTTTTGTTC
CTGACTTAACAACAATTAATAGACCA
>gi|159498510|gb|EY195364.1|EY195364 RSAA-aab21e04.g1 R.similis_EST_RSAA Radopholus similis cDNA 5' similar to ref|NP_500582.1| GTP-binding protein like (21.7 kD) [Caenorhabditis elegans] sp|Q23445|SAR1_CAEEL GTP-binding protein SAR1 pir|T29706 GTP-binding protein ZK180.4 [similarity] - Caenorhabditis elegans gb|AAB52968.1| Hypothetical protein ZK180.4 [C, mRNA sequence
GGCCGGGGGACGCCATTGTATTTTTGGTCGATGTAGCCGACCTGGAACGTATTCAGGAAGCAAGGGAGGA
ATTGTGGAGTCTGATGCAGGATGAACAGGTGGCAAGTGCACCTGTGCTTGTTTTGGGCAATAAGATCGAC
AAGCCGAATGCTCTCAGCGAAGACCAGCTCAAGTACTACCTCGGCATCCAACAATACTGCACAGGAAAAG
GCCAAGTTGCGCGCTCAGATCTGGCCACTCGTCCTTTGGATGTGTTCATGTGCTCAGTCCTTAGGCGACA
GGGTTACGGCGAAGGATTCTGTTGGCTCTCACAATATCTGGACTGATTGAACGCGCCTCGGAAGTTGAAA
ATTGACACAAAAAGTAAGGACGACTCCAATCGCAACAAATCATTTCATATTATTTCTGTACTACACCTAT
TTTCGATTCATCTTATCTCTTAAACAATGTCAATGTTAAAAATCATCGGTTGCA
>gi|159498509|gb|EY195363.1|EY195363 RSAA-aab21e02.g1 R.similis_EST_RSAA Radopholus similis cDNA 5' similar to gb|AAP59456.1| cathepsin B precursor [Araneus ventricosus], mRNA sequence
GGCCGGGGCTCGGGGGCCATGCCGTTCGCATTATTGGATGGGGCGAGGCTAGCGGTCAGAAATACTGGCT
GGTGGCTAATTCGTGGAACACCGATTGGGGCGAGAAGGGCCTATTCCGCATACGTCGCGGCTCCGATGAA
GAGCGCATCGAAACATTGCAAATTGCATTTGGGACACCAAAGATTTAAAATCGGCGAATTGACTTGTAAA
AGATGGATAGTAAAATATTTCTTTTGCCA
>gi|159498508|gb|EY195362.1|EY195362 RSAA-aab21d12.g1 R.similis_EST_RSAA Radopholus similis cDNA 5', mRNA sequence
GGCCGGGGACATTAAGTGCAATCAATTCGCCACATAATGTGATGTCAAAATTTAAATCTGAAACTTGGAT
CTATTGTAACAACATCGGATGGACCATCAGCTGGTGGGAGCAAGCAACAAGAGATGGAAGACGATACACC
GGCCAAAGAGCAAGAGGAAGAACGCGAACTGGGCGAAGAGGATGGATTTGCGCAAAACCATCACAATAGT
CAATTTTCGTAGCCTCCCAACGCCAACAGCCGCCTTTCTGGCACATTATGTGAAGAGTGATCGTCCATTC
CATGCGCTGTTCCGTCGTCGACCTGTTCCTGCGCACTTGGGGGAGACCAAGGCCAAATCGATGTAATTTA
ATTGAACAAAAAATTAATGCAGTTCACGGCTTGTCTTTCATGCCTTGGATGAACTCTTCATTCATTCGGA
ATCAACCATGGCCACGTTACGTCAACTGGAAGATTTGCCGGAGAATGTGCTGGCTAGACGGAGGCTCCAG
ACAGTTCGAGCCAACGAACTCGTCAATTAGAGAAACCACAAAATATAAAGTTGACATTTTATGAATAAAT
ATATGAAAAAAAAAAAAAAAA
>gi|159498507|gb|EY195361.1|EY195361 RSAA-aab21d11.g1 R.similis_EST_RSAA Radopholus similis cDNA 5', mRNA sequence
GGCCGGGATTTTTTGTAATTATGATTTTAGGTTATAAATAATTAATGAAGTATAAACTATTAATGTTATA
TTTTTTAGATAAATTAATTTTTCTAGTAATTTATTTTTAGATAAATTAATGTTCCAGAAATATCGGCTAG
ACATTATTATTTTTAACTAAAAACTTTTTAAATTTTATTTTAATTTATATAAATTTATATTAATATAGGT
GAAATTTTAATTATAATTATGTTAATAATTTTATAAAATTTAAAATTTTAATTTTTAACTTAGGTTAGAC
ACTAATTAATGAAATTTAATAATTTTCTTTAGTAAATTTTTGA
>gi|159498506|gb|EY195360.1|EY195360 RSAA-aab21d10.g1 R.similis_EST_RSAA Radopholus similis cDNA 5' similar to ref|NP_491217.1| Forkhead associated domain containing protein (35.8 kD) [Caenorhabditis elegans] pir|T25596 hypothetical protein C32E8.5 - Caenorhabditis elegans gb|AAB42323.1| Hypothetical protein C32E8.5 [Caenorhabditis elegans], mRNA sequence
GGCCGGGGAAAGAGCCAGGCGAAGAAGACACGAAAGGAATGGGCCCAAGTGAAGAGGAGAAGGAAAAACC
GTCTTTTGTGCCCAGCGGAAAGTTGGCTAAAGACACCAACACATTCAAAGGAGTCCTCATCAAGTACAAT
GAACCGCCAGAAGCCAAGATTCCCAAGTTGCGTTGGCGCATGTATCCGTTCAAGGGAGAGCAAGACATGC
CTGTGATCTATGTGCACCGTCAGTCAGCCTATCTGGTTGGGCGGGACCGAAAAATTGCCGATTTTCCCGT
GGACCATCCGAGTTGTTCAAAGCAGCACGCAGCACTCCAGTATCGGTCTCTG
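With data like the above, the script no longer works: when there is no space or blank line between sequences, it can't filter them out!
Could you please modify it for me again? Thanks a lot!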

Posted on 2009-8-20 18:49:02
print "please enter the source filename(include suffix):";
chomp($file=<STDIN>);
print "please enter the output filename(include suffix):";
chomp($out=<STDIN>);
open(OUT,">$out")||die "can't open the file $out: $!"; # the output file needs a leading >
open(FILE,"$file")||die "can't open the file $file: $!";
$i=0;
$read=0; # flag: whether the current line should be written
while(<FILE>){
    if (/^>.*Calponin protein/){ # a line that starts with > and contains "Calponin protein"
        $read=1;
        $i++;
    }elsif (/^>/ and not /Calponin protein/){ # any other header line stops the copying
        $read=0;
    }
    print OUT $_ if $read; # if $read==1, write this line
}
print "the total of nematode sequence is $i\n";
close(FILE);
close(OUT);
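The flag version above works whether or not records are separated by blank lines, because it only cares about where each `>` header line starts. For comparison, here is a sketch of another common Perl idiom (not from the thread, and not tested against the real 32 GB file): setting the input record separator `$/` to `"\n>"` so that each read returns one whole record. `chomp` then strips that separator, and the `>` eaten by the split is put back when the record is printed. The subroutine `copy_records` and the three-argument command line are assumptions; the keyword test here is a plain, case-sensitive substring check with `index`.

```perl
use strict;
use warnings;

# Hypothetical helper: copy every record whose header line contains $keyword.
# A record is a ">" header line plus all lines up to the next line starting with ">".
sub copy_records {
    my ($in, $out, $keyword) = @_;
    my $count = 0;
    local $/ = "\n>";                  # each <$in> now returns one whole record
    while (my $record = <$in>) {
        chomp $record;                 # chomp strips the current $/, i.e. "\n>"
        $record =~ s/^>//;             # only the first record still starts with ">"
        my ($header) = split /\n/, $record, 2;
        next unless defined $header && index($header, $keyword) >= 0;
        $record =~ s/\n*\z/\n/;        # end with exactly one newline (drops trailing blank lines)
        print {$out} ">$record";
        $count++;
    }
    return $count;
}

# Assumed command-line usage:
#   perl copy_records.pl input.txt output.txt "Calponin protein"
if (@ARGV == 3) {
    my ($in_file, $out_file, $keyword) = @ARGV;
    open(my $in,  '<', $in_file)  or die "can't open the file $in_file: $!";
    open(my $out, '>', $out_file) or die "can't open the file $out_file: $!";
    my $n = copy_records($in, $out, $keyword);
    close($out) or die "can't close the file $out_file: $!";
    print "the total of matching sequences is $n\n";
}
```

The file is still read one record at a time, so memory use is bounded by the largest single record rather than by the file size. One difference from the flag version: blank lines at the end of a record are dropped, so the output has no empty lines between records.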