Simple Python Script to Batch Search the CAZy Database



This code may be out of date. The most up-to-date version is here: https://github.com/ericoporto/CazyBatchSearch

Yes, this is a way to take a table of enzymes and look each one up in the Carbohydrate-Active enZYmes (CAZy) Database. Below is the script, written for Python 2.7.3.


'''
Copyright 2014 Érico Porto

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
'''
from __future__ import print_function
import httplib


def findEnzymeInCazy(enzyname):
    # Ask the CAZy search page about one enzyme and return the name of the
    # page the first result links to (e.g. "GH57"), or None if nothing matches.
    conn = httplib.HTTPConnection("www.cazy.org", 80)
    conn.connect()
    conn.request('GET', "/search?tag=4&recherche=" + enzyname)
    lines = conn.getresponse().read().split('\n')
    conn.close()
    for line in lines:
        if '<tr><td><a href="http://www.cazy.org/' in line:
            j = line.find('"http://www.cazy.org/') + 21  # skip the quote and base URL
            k = line.find('.html"', j)
            return line[j:k]  # text between the base URL and ".html"
    return None


# Every row goes both to saida.csv and to the console.
with open('saida.csv', 'w') as f, open('enzylist.csv', 'r') as enzylist:
    print("enzima     ; rankdomal", file=f)
    print("enzima     ; rankdomal")
    for line in enzylist:
        pieces = line.split(";")
        enzyname = pieces[0].strip()  # only the first column is used
        enzyInCazy = findEnzymeInCazy(enzyname)
        if enzyInCazy is not None:
            row = enzyname.ljust(15) + '; ' + enzyInCazy.ljust(20)
        else:
            row = enzyname.ljust(15) + '; ' + "not found!"
        print(row, file=f)
        print(row)
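
If you are on Python 3, where httplib was renamed to http.client, the lookup could be sketched roughly like this. Treat it as an untested port under assumptions, not a drop-in replacement: the names extract_family and find_enzyme_in_cazy are made up here, and it assumes www.cazy.org still answers the same search URL with the same table markup the script above looks for.

```python
import http.client
from urllib.parse import quote

# Same marker the original script searches for in each line of the page.
MARKER = '<tr><td><a href="http://www.cazy.org/'


def extract_family(html):
    """Return the text between the CAZy base URL and '.html' in the first
    result row of the page, or None if no such row is present."""
    for line in html.split('\n'):
        start = line.find(MARKER)
        if start == -1:
            continue
        start += len(MARKER)
        end = line.find('.html"', start)
        if end != -1:
            return line[start:end]
    return None


def find_enzyme_in_cazy(enzyname):
    """Fetch the search page for one enzyme (needs network access)."""
    conn = http.client.HTTPConnection("www.cazy.org", 80, timeout=30)
    try:
        conn.request('GET', "/search?tag=4&recherche=" + quote(enzyname))
        body = conn.getresponse().read().decode('utf-8', errors='replace')
    finally:
        conn.close()
    return extract_family(body)


# Offline check with a made-up row shaped like the one the script looks for.
sample = ('<table>\n'
          '<tr><td><a href="http://www.cazy.org/GH57.html">GH57</a></td></tr>\n'
          '</table>')
print(extract_family(sample))  # GH57
```

Keeping the parsing in its own function means it can be checked without touching the network, which helps if the site's markup ever changes.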

So, you will need a file named enzylist.csv in the folder you run the script from for this to work. Here, take this one:

 BAI68730
 BAH05588
 ACV62532
 EAL90874
 ABG47447
 ACT04224
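
The script only reads the first ;-separated field of each line, so any extra columns in enzylist.csv are ignored. Here is a small sketch of just that parsing step. The helper name read_accessions is hypothetical, and unlike the script it also skips blank lines, which is an addition of this sketch.

```python
def read_accessions(lines):
    """Yield the first ';'-separated field of each non-blank line."""
    for line in lines:
        name = line.split(";")[0].strip()
        if name:  # skipping blank lines is an addition, not in the original script
            yield name


# The second column below is invented to show that extra fields are ignored.
rows = [" BAI68730\n", "BAH05588 ; some note\n", "\n", "ACV62532\n"]
print(list(read_accessions(rows)))  # ['BAI68730', 'BAH05588', 'ACV62532']
```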

So if you run it, the results are written to saida.csv and also printed, and you should get something like:

 python .\enzymtable_simple.py  
 enzima     ; rankdomal  
 BAI68730    ; GH57  
 BAH05588    ; GH94  
 ACV62532    ; GH94  
 EAL90874    ; GH13  
 ABG47447    ; GH18  
 ACT04224    ; GH51  
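
Since every row in saida.csv has the form "name ; family", the file is easy to read back for further processing. The helper below is only a sketch (parse_results is not part of the script). It assumes the first line is the header and maps rows marked "not found!" to None.

```python
def parse_results(lines):
    """Map each enzyme name to its CAZy family, or None if it was not found."""
    results = {}
    for line in lines[1:]:  # assumes the first line is the header row
        if ";" not in line:
            continue
        name, family = (part.strip() for part in line.split(";", 1))
        results[name] = None if family == "not found!" else family
    return results


# XYZ00000 is an invented name used to show the "not found!" case.
sample = [
    "enzima     ; rankdomal",
    "BAI68730       ; GH57",
    "XYZ00000       ; not found!",
]
print(parse_results(sample))  # {'BAI68730': 'GH57', 'XYZ00000': None}
```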

So all should be good. Yes, I run Python on Windows 8 using PowerShell. Just add the Python folder to the PATH variable and everything should run well. Also, PowerShell is much faster and easier to use than CMD.
