APPENDIX D – DATABASE FORMATS
The IUPAC database (.iup or .txt) is a text file in which each line contains the IUPAC name of one molecule.
File format example:
Bis(2-naphthyl)methane
dinaphthalen-2-ylmethane
2-(naphthalen-2-ylmethyl)naphthalene
4H-benzo[d][1,3]dithiine
3,6,8-trioxabicyclo[3.2.2]nonane
2(10),3-Pinadiene
Perfluoro(1-methylperhydronaphthalene)
4-(Isopropylidenehydrazono)-2,5-cyclohexadiene-1-carboxylic acid
1-Ethylidene-5-(2-naphthyl)carbonohydrazide
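A minimal sketch of reading this format in Python follows. The function name parse_iupac_database is hypothetical, and skipping blank lines is an assumption rather than documented behavior. The whole line is taken as the name, so a name containing a space (such as one ending in "carboxylic acid") stays intact.

```python
def parse_iupac_database(text):
    """Return the list of IUPAC names found in the text, one per line.

    Assumption: blank lines are ignored and surrounding whitespace
    is not part of the name.
    """
    names = []
    for line in text.splitlines():
        name = line.strip()
        if name:
            # The entire line is one name; internal spaces are kept.
            names.append(name)
    return names


if __name__ == "__main__":
    sample = (
        "Bis(2-naphthyl)methane\n"
        "4-(Isopropylidenehydrazono)-2,5-cyclohexadiene-1-carboxylic acid\n"
    )
    for name in parse_iupac_database(sample):
        print(name)
```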
The SMILES database (.smi or .txt) is a text file in which each line contains two fields, the SMILES string followed by the molecule name, separated by one or more spaces or tabs. If the molecule name contains spaces, it must be enclosed in double quotes.
File format example:
CC(=O)Oc1ccccc1C(O)=O "Aspirin (ASA)"
CC1=CN(C2CC(N=NN)C(CO)O2)C(=O)NC1=O AZT
CN1C(=O)c2c([n]c[n]2C)N(C)C1=O Caffeine
[NH3+][Pt]([NH3+])(Cl)Cl Cisplatin
Nc1ccc(cc1)S(=O)(=O)c1ccc(N)cc1 Dapsone
CN1C(=O)CN=C(c2cc(Cl)ccc12)c1ccccc1 Diazepam
CNCC(O)c1cc(O)c(O)cc1 Epinephrine
CC12CCC3C(CCc4cc(O)ccc43)C1CCC2O Estradiol
CC(C)Cc1ccc(cc1)C(C)C(O)=O Ibuprofen
CN(C)CCCN1c2ccccc2CCc2ccccc12 Imipramine
CN1CCCC1c1c[n]ccc1 Nicotine
CN(C)CC1CCC(CSCCNC(=C[n](:o):o)NC)O1 Ranitidine
CC1OC(OC2C([nH]:c(:[nH2]):[nH2])C(O)C([nH]:c(:[nH2]):[nH2])C(O)C2O)C(OC2OC(CO)C(O)C(O)C2NC)C1(O)C=O Streptomycin
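The two-field rule can be sketched in Python as shown here. The function names are hypothetical, and the error handling is an assumption: this sketch rejects a line with no name, or with an unquoted name containing whitespace, though the actual program may be more lenient. The line is split at the first run of spaces or tabs, because a SMILES string never contains whitespace.

```python
def parse_smiles_line(line):
    """Split one database line into (smiles, name).

    Returns None for a blank line. Raises ValueError when the name
    field is missing or contains unquoted whitespace (assumed policy).
    """
    stripped = line.strip()
    if not stripped:
        return None
    # Split once on the first run of spaces or tabs.
    parts = stripped.split(None, 1)
    if len(parts) < 2:
        raise ValueError("missing molecule name: %r" % line)
    smiles, name = parts[0], parts[1].strip()
    if len(name) >= 2 and name[0] == '"' and name[-1] == '"':
        # Quoted name: drop the surrounding double quotes.
        name = name[1:-1]
    elif any(ch.isspace() for ch in name):
        raise ValueError("name with spaces must be quoted: %r" % line)
    return smiles, name


def parse_smiles_database(text):
    """Parse every non-blank line of the text into (smiles, name) pairs."""
    records = []
    for line in text.splitlines():
        record = parse_smiles_line(line)
        if record is not None:
            records.append(record)
    return records


if __name__ == "__main__":
    sample = 'CC(=O)Oc1ccccc1C(O)=O "Aspirin (ASA)"\nCN1CCCC1c1c[n]ccc1\tNicotine\n'
    for smiles, name in parse_smiles_database(sample):
        print(name, smiles)
```

The split is done by hand rather than with a shell-style tokenizer such as shlex, because SMILES strings may contain backslashes (used for double-bond stereochemistry) that such a tokenizer would treat as escape characters.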