Simple Python Script to Batch Search the CAZy Database
Monday, April 1, 2013
This code may be old. Please follow this link for the most up-to-date version: https://github.com/ericoporto/CazyBatchSearch
Yes, this is a way to use a table of enzyme identifiers to search the Carbohydrate-Active enZYmes (CAZy) Database. Below is the script, written for Python 2.7.3.
'''
Copyright 2014 Érico Porto
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
'''
from __future__ import print_function
import httplib

def findEnzymeInCazy(enzyname):
    # Query the CAZy search page and return the name of the first linked page
    conn = httplib.HTTPConnection("www.cazy.org", 80)
    conn.connect()
    conn.request('GET', "/search?tag=4&recherche=" + enzyname)
    lines = conn.getresponse().read().split('\n')
    conn.close()
    for line in lines:
        if '<tr><td><a href="http://www.cazy.org/' in line:
            j = line.find('"http://www.cazy.org/') + 21  # skip past the base URL
            k = line.find('.html"', j)
            return line[j:k]  # text between the base URL and ".html"
    return None

# Results go to saida.csv ("saida" is Portuguese for "output") and to the console
f = open('saida.csv', 'w')
print("enzima ; rankdomal", file=f)
print("enzima ; rankdomal")
for line in open("enzylist.csv", "r"):
    pieces = line.split(";")
    enzyname = pieces[0].strip()  # only the first column is used
    enzyInCazy = findEnzymeInCazy(enzyname)
    if enzyInCazy is not None:
        print(enzyname.ljust(15) + '; ' + enzyInCazy.ljust(20), file=f)
        print(enzyname.ljust(15) + '; ' + enzyInCazy.ljust(20))
    else:
        print(enzyname.ljust(15) + '; ' + "not found!", file=f)
        print(enzyname.ljust(15) + '; ' + "not found!")
f.close()
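
The link-extraction step inside findEnzymeInCazy can be tried without a network connection. Here is a minimal sketch, written for Python 3, that applies the same string search to a single line of HTML. The name extract_family and the sample row are hypothetical, made up for illustration; the markup of the real CAZy result page may differ.

```python
CAZY_PREFIX = '"http://www.cazy.org/'

def extract_family(html_line):
    """Return the text between the CAZy base URL and '.html', or None."""
    start = html_line.find(CAZY_PREFIX)
    if start == -1:
        return None
    start += len(CAZY_PREFIX)  # same offset as the hard-coded 21 in the script
    end = html_line.find('.html"', start)
    if end == -1:
        return None
    return html_line[start:end]

# Hypothetical sample row, not copied from the real site
sample = '<tr><td><a href="http://www.cazy.org/GH57.html">GH57</a></td></tr>'
print(extract_family(sample))
```

Unlike the script, this version also returns None when the closing '.html"' is missing, instead of slicing with a -1 index.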
So, you will need a file named enzylist.csv in the working directory for this to work. Here, take this one:
BAI68730
BAH05588
ACV62532
EAL90874
ABG47447
ACT04224
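
The script only uses the first semicolon-separated field of each line, so extra columns in enzylist.csv are ignored. A small sketch of that reading step, for Python 3, could look like this. The helper name read_accessions and the in-memory sample are assumptions for illustration, and skipping blank lines is an addition that the original script does not do.

```python
import io

def read_accessions(handle):
    """Yield the first semicolon-separated field of each non-empty line."""
    for line in handle:
        name = line.split(";")[0].strip()
        if name:  # skip blank lines, which the original script would still query
            yield name

# In-memory stand-in for enzylist.csv
sample = io.StringIO("BAI68730\nBAH05588 ; extra column\n\nACV62532\n")
print(list(read_accessions(sample)))
```

With a real file, the same function can be fed the handle returned by open("enzylist.csv").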
Running the script should then give something like:
python .\enzymtable_simple.py
enzima ; rankdomal
BAI68730       ; GH57
BAH05588       ; GH94
ACV62532       ; GH94
EAL90874       ; GH13
ABG47447       ; GH18
ACT04224       ; GH51
So all should be good. Yes, I run Python on Windows 8 using PowerShell. Just add the Python folder to the PATH variable and everything should run well. Also, I find PowerShell much faster and easier to use than CMD.
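
The script targets Python 2.7, where the HTTP client module is called httplib. In Python 3 the same module is named http.client and responses come back as bytes. A minimal sketch of the request step for Python 3 follows. The names build_search_path and fetch_search_page are hypothetical, the UTF-8 decoding is an assumption, and the sketch assumes the CAZy search URL still behaves as it did when this post was written, which has not been checked.

```python
import http.client
from urllib.parse import quote

def build_search_path(enzyname):
    """Build the same query path the script uses, with the name URL-encoded."""
    return "/search?tag=4&recherche=" + quote(enzyname)

def fetch_search_page(enzyname, host="www.cazy.org"):
    """Fetch the search result page and return it as a list of text lines."""
    conn = http.client.HTTPConnection(host, 80, timeout=30)
    try:
        conn.request("GET", build_search_path(enzyname))
        body = conn.getresponse().read()  # bytes in Python 3
    finally:
        conn.close()
    # Assumption: the page decodes as UTF-8; undecodable bytes are replaced
    return body.decode("utf-8", errors="replace").split("\n")

# Only the path is built here; no request is sent until fetch_search_page is called
print(build_search_path("BAI68730"))
```

URL-encoding the name is a small safety addition: the original script concatenates it into the path unchanged, which is fine for plain identifiers like the ones in the sample list.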