Cheminfo Retrieval FAQ

2009 FAQ

(thanks to Bill Hooker, Meg and Elizabeth Brown for suggestions)

1. List and briefly describe 5 social networking sites where chemistry is discussed.

1. Chemistry Blog- More of a blog than a social networking site, this website allows scientists to post discussion questions to colleagues around the world. Only members may reply but anyone can view.
2. Facebook- While mainly for the college-age crowd, a Facebook member who joins a chemistry-based fan page that updates regularly can receive a great deal of information on topics they care about. A member of ACS could easily join that Facebook page and be updated on news and events relevant to his or her interests.
3. LinkedIn- The online resume website allows members in need of chemical information to find other members in that specific field. While information is not normally transferred on this website, it is the catalyst for many important transfers of information.
4. FriendFeed- Much like Facebook, FriendFeed allows a member to "friend" other members to exchange information. It is a great environment to meet and talk with other people who share your interest in chemistry.
5. LabRoots- A smaller version of Facebook, but designed purely for the scientific community. It is built to exchange scientific information, and for the most part that is all it is used for.

-James Brooks
[Full Marks JCB]

2. List and briefly describe 5 reference management tools which can be used in chemistry.

Below is a chart (partial) from Wikipedia of current reference management tools, comparing cost, whether or not the reference tool is open source, and the type of license. I will focus my discussion on only 5 reference tools which can be used in chemistry.

Reference Management Tool #1: EndNote
The program allows you to search internet databases remotely, allowing direct import of bibliographic references into EndNote from hundreds of internet databases, such as Web of Science, PubMed, Ovid, and the Library of Congress. The user can create libraries of references and reference groups. Additionally, the program uses “Cite While You Write” technology, allowing the user to build a bibliography in real time as citations are inserted within a manuscript. The program includes more than 4,500 bibliographic styles so that users can make sure the bibliographic details of their manuscript comply with journal requirements when submitting work to be published. Cost is ~$300. Excellent tool for the management of chemistry-related references.

Reference Management Tool #2: RefWorks
This program is very similar to EndNote in terms of functionality. [From personal experience, the program is very easy to use and to get started with: after watching only ~30min of tutorial videos on the website, I was able to successfully start my research paper, building a bibliography with inserted citations using Microsoft Office Word 2007 and the included “Write-N-Cite” tool that is downloaded from the site.] Similar to EndNote, internet databases can be searched directly from RefWorks; an unlimited number of references can be assembled into a library and subgroups; all of the major bibliographic styles are available for journal-specific compatibility; and “Write-N-Cite” similarly allows building of a bibliography and insertion of references within word processing documents. One of the key advantages over EndNote is that RefWorks can be used from any computer without limit to access, while EndNote only permits a license for 3 computers. Cost is ~$100/yr. Excellent tool for the management of chemistry-related references.

Reference Management Tool #3: Mendeley
This program is a multifunctional program that shares many of the standard bibliographic solutions found in EndNote and RefWorks, such as creating and inserting bibliographies into word processing documents. However, the program is also an academic social network and seems to be focused more on importing, searching, and sharing full length academic papers within preselected individual research group categories from the site. One cool thing that the program offers is the ability to build a database of research papers that is backed up on the web which can be synced and accessed through any computer, including an iPhone. Cost is Free! The disadvantages are that there are not a lot of bells and whistles for the actual bibliographic portion of the program: it is unclear if the program has automatic selection of journal-specific bibliographic formats that RefWorks and EndNote both possess. Poor tool for the management of chemistry-related references. [This is actually the main tool I use to manage chemistry references :) I explain why here . JCB]

Reference Management Tool #4: Scholar's Aid
This program tries to imitate RefWorks and EndNote, but ultimately falls way short. While RefWorks and EndNote can be used with almost any computer, Scholar’s Aid cannot currently be used on Macs or Linux. Another major drawback is that it doesn’t seem to include a list of bibliographic styles that can be chosen in order to comply with the bibliographic standards of journals when attempting to publish. One plus for Scholar’s Aid is that different types of references (parenthetical, footnotes, and bibliographic references) can be imported directly into a document via “AutoCitation.” Cost is ~$149. Poor tool for the management of chemistry-related references.

Reference Management Tool #5: Biblioscape
This program contains many of the functions of RefWorks and EndNote. For instance, the program can be used to assemble literature references into a bibliography that can be exported into documents. The program also allows citations and bibliographies to be formatted according to at least 1,000 different journal styles so that submitted articles can comply with journal-specific requirements for publishing. While the program was originally released for managing references, the program has evolved into a multifunctional tool that can now allow management of notes, tasks, ideas, charts, libraries, categories, and compositions all related to the user’s research. As a result, the program can be used for article, book, or thesis writing. Cost is ~$79-$399. Excellent tool for the management of chemistry-related references.

-Will Stedman
[Full Marks JCB]

3. Describe and compare the data management and reporting policies of the NIH and NSF pertaining to chemistry.

Overall data management is broken down into three areas: security, archival, and data certification. Data security issues relate to data access and retrieval through a variety of means. Data archival relates to the storage medium and backup systems that are required to be in place. The third area of data management is the most important, as it relates to how data is input and verified and to the level of accuracy that must be captured before the data can be considered useful.(1)

NIH Data management
NIH Security Protocol is divided between on-campus and off-campus badging processes. These are separated into a variety of security levels, and badges are issued by two entities: the NIH PD and the Division of Personnel Security and Access Control. These badges also provide access to all computer terminals and buildings via SmartCard or other 'legacy' device. (These locations include the Bethesda Campus, Ft. Detrick, Biomedical Research Center, Rocky Mountain Labs and Triangle Park.) All NIH servers and computers (laptops and desktops) must have SmartCard access in order to secure any data whose disclosure would have an adverse effect on operations(2); this requirement will apply to laptops and web-accessible programs by May of 2011. Members of outside organizations can utilize the NIH Federated Authentication tools to use their own onsite authentication methods. In the case of off-campus research funded by the NIH, grantees are required to comply with policies designed to prevent the disclosure of sensitive information, as governed by FISMA.

NIH sets requirements for data archival and for destruction of data devices. These services are provided by the Center for Information Technology (CIT) and regulated by Federal Information Processing Standards Publication (FIPS) 200. These policies require that all drives be sanitized at the end of term in order to keep data secure. NIH also allows individual laboratories to set data management and reporting policies for deliverables by non-US laboratories(3). These policies are governed by federal regulations 21 CFR 58 and 42 CFR 493. These define that the data must allow for optimal laboratory operations through annual audits and visits which provide consistent and reproducible results.(3)

There are also policies that require grantees to secure their data when funded by NIH grants. NIH-funded programs require that the results be made available to the research community and to the public.(4) Some information can be made available through an 'on request' only forum.(5) Most information generated from an NIH-funded grant is considered to be public in order to foster an environment of collaboration and accountability. This information is published on the NIH website. Information submitted to the NIH that is considered proprietary can be kept confidential, but this is discouraged by NIH.

NIH Reporting Policy
In regard to specific research data, the NIH handles all requests for release of the specified data through the FOIA request process. The NIH Grants Policy Statement allows only final data to be published and allows the following information to be held in confidence: preliminary analyses; drafts of scientific papers; plans for future research; peer reviews; communications with colleagues; physical objects (e.g., laboratory samples, audio or video tapes); trade secrets; commercial information; materials necessary to be held confidential by a researcher until publication in a peer-reviewed journal; information that is protected under the law (e.g., intellectual property); personnel and medical files and similar files, the disclosure of which would constitute an unwarranted invasion of personal privacy; or information that could be used to identify a particular person in a research study.(5)

The release of the data is governed by NIH GPS 8.2.1, which provides that the grantee owns the rights to the data and that these rights can be extended to groups of people whose primary purpose is education without NIH approval.(6) When the research is published or announced through a media release, the NIH grant must be acknowledged by its specific program number, with a caveat disclaiming any assumed NIH endorsement of the findings.(6) If there are any publications, the URL or the identifying document number must be provided in the final report. The published results will also be housed at PubMed as an electronic version of the final, peer-reviewed manuscript (NIH GPS 8.2.2) within a year of publication.


-Curtis Kleier
[Full marks JCB]

4. By which criteria can data be considered "Open Data"?

Generally speaking, open data is data freely available (in the public domain) that is not hindered by copyright or intellectual property laws. "Open Data" should fit the following criteria: it is easy to access (including knowledge of its existence), a description of how it was obtained is available (instrumental logs etc.), the data publisher can be contacted, and finally the data can be shared.


Data can be identified as Open Data if:

It can be accessed for a reasonable fee (or free), it can be distributed freely, it can be accessed easily (in a common format, e.g. a downloadable pdf or mp3), and it can be freely modified.

Additionally, according to Open Data Commons, open data can be "licensed." One would recognize by the terms of this license that the data is "open".

Open data has certain visual/pictorial labels that allow the viewer to know that it is open data. Often open data will use the CC0 or No Rights Reserved license mark, usually found at the bottom of a website or document. However, not all Creative Commons licenses are "open"; only two are. The following figure explains.
external image Open_v_Public_licenses_Venn.002-001.png


[Full marks JCB]

5. List at least 4 reasons why it benefits a chemist to publish their work in peer reviewed journals.

1. It helps increase the scientific quality of their article. The more people who read it, the fewer errors slip through into the published article. Peer reviewers can help remove inaccurate data and point out missing references. It also encourages scientists to hold their experimental techniques to a higher standard, because they know the article is going to be read by people who understand the experiment before it reaches publication.

2. It gives the article validation. It puts on a seal of approval which shows that the person who did the experiment followed legitimate protocols that met certain standards. It allows people outside that area of expertise to put more trust in the findings within the article.

3. It helps filter out articles based on an improper experiment. It can help scientists realize that their data might not be valid, but actually the result of a side reaction or a contamination in a solution. It also saves someone from publishing a duplicate paper, one that has already been written. This saves the scientist and the publisher from damage to their reputation and embarrassment. Publishing papers like this can cost a scientist their employment, funding, or awards.

4. Peer review also makes a paper easier to understand. Having someone else read the article brings a new perspective that can point out places where the author made a jump that seemed logical to those who study that area, but that might not be apparent to people without that expertise.

[Full marks]

6. List 5 sources for finding spectra used to characterize chemical compounds.

NIST Chemistry WebBook - This source provides IR, THz IR, Mass, and UV-Vis spectra, tabulations of data from GC spectra, and several physical, chemical, and thermodynamic properties of chemicals. The service allows a user to search for information by inputting chemical formulas, chemical names, IUPAC identifiers, CAS registry numbers, reactions, chemical structures, and authors of publications in the NIST database.

Spectral Database for Organic Compounds - This source is a database for organic compounds that provides MS, 13C NMR, 1H NMR, IR, Raman, and ESR spectra. The source allows a user to search for a chemical by its compound name, molecular formula, molecular weight, CAS registry number, and SDBS number.

Sigma-Aldrich - This source provides FT-NMR, FT-IR Raman, and FT-IR condensed phase spectra for certain chemical products. The source also provides MSDS sheets for chemicals and allows a user to search for a chemical by inputting a chemical name or by drawing a chemical structure in a Java application.

NMRShiftDB - This source is a database for organic compounds that provides 13C NMR, 1H NMR, and other types of NMR spectra. The database also allows for NMR spectrum prediction for a file containing a chemical structure on a user's computer. The database is open source and features peer-reviewed submission of sets of data. The database allows a user to search for a chemical by inputting the chemical name, the chemical structure, and various other properties and also allows a user to search for a specific chemical spectrum.

ChemSpider - This source is a database for all types of chemicals (organic, inorganic, biomolecules, etc.) that provides NMR, IR, UV-Vis, and other experimental spectra for certain chemicals. The service also provides experimental and predicted physical and chemical properties. The database allows users to search for a chemical by inputting the chemical name, trade name, registry number, SMILES, InChI, CSID, and by drawing the chemical structure. Users can also upload spectra for specific molecules.

-Paul DeGregory
[Full marks JCB]

7. What initiatives exist to provide authors with unambiguous identifiers and (briefly) why is this both important and difficult?

There are two popular initiatives to identify researchers: the International Standard Name Identifier (ISNI) and the Open Researcher & Contributor ID (ORCID). In 2010, ISNIs, consisting of 16 digits, were issued for researchers or legal organizations by the International Organization for Standardization (ISO). ORCID was announced in November 2009 by Nature Publishing Group and Thomson Reuters and has been adopted as an identifier by many researchers and prestigious organizations such as CrossRef. Another initiative is ResearcherID, launched by Thomson Reuters in January 2008; ORCID and ResearcherID are developed and run on different computer systems and servers.
Usually, each researcher who receives an identifier can access his/her online profile to edit and control private information. The online profile may include a list of publications, patents, grants, and other information, such as past institutional history or a blog.
Researchers can also get an ID number by setting up profile pages on databases such as the COS Expertise system provided by ProQuest. However, these profiles can only be accessed by institutions and subscribers.

Important factors:
First of all, the initiatives are important because they are international systems that will be recognized by all researchers, public viewers, and organizations around the world.
Secondly, a name is an insufficient identifier. An identifier can distinguish different researchers who share the same name. In some cultures, a woman who changes her last name after marriage can still be identified accurately by her identifier. In some cases, it also helps to identify a researcher whose name is abbreviated or misspelled, such as J. Doolittle instead of James Doolittle. From a linguistic standpoint, a researcher with two different names in English and Chinese is easily recognized by an alphanumeric ORCID or ISNI.
Identifiers can be used as alternatives to existing IDs that should be considered private information, such as Social Security, driver’s license, and passport numbers.
A “unique author identifier” also helps connect an individual’s publications to other data on the Internet, which is helpful in funding reviews, searches for potential collaborators or competitors, and citation analyses.
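The point that a name alone is an insufficient identifier can be illustrated with a short Python sketch. The records, paper titles, and the "ID-0001" style identifiers below are all made up for illustration; they are not real ORCID or ISNI values, and real systems store far more than this.

```python
from collections import defaultdict

# Hypothetical publication records; names, titles, and identifiers are invented.
records = [
    {"name": "James Doolittle", "author_id": "ID-0001", "title": "Paper A"},
    {"name": "J. Doolittle", "author_id": "ID-0001", "title": "Paper B"},
    {"name": "J. Doolittle", "author_id": "ID-0002", "title": "Paper C"},
]


def group_titles(records, key):
    """Group paper titles by the chosen field ("name" or "author_id")."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record["title"])
    return dict(groups)


by_name = group_titles(records, "name")
by_id = group_titles(records, "author_id")

print(by_name)  # name variants split one author and merge two different ones
print(by_id)    # identifiers keep each author's papers together
```

Grouping by name puts Paper B and Paper C under the same "J. Doolittle" even though they belong to different people, while grouping by identifier assigns Paper A and Paper B to one author and Paper C to another.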

Difficult factors:
The first difficulty in establishing those ID systems is that there is currently “no authoritative list of all the researchers in the U.S. with all of their publications, grants, and other achievements”. The only way that an identifier can be issued is for a researcher or organization to apply to those systems.
The second challenge is that those systems have to last for hundreds of years in order to build a complete collection of identifiers for all researchers and groups. The next challenge is that they have to be international systems, because researchers now routinely move around the world. That is why “what is really needed is a global, cross-sector, cross-institutional system that research institutions and all types of publishers can share.”

-Hai Truong
[Full Marks JCB]

8. Give 4 examples of Open Data initiatives undertaken by commercial entities and briefly describe the rationale behind these initiatives.

[None of these are Open Data except for the Ordnance Survey JCB]

1. Space-Time Research – This company’s sole goal is to provide the results of research projects they have conducted to their customers. This will allow an ease of access to data currently hard to find elsewhere. The information researched varies widely in subject, from health topics to education matters. Space-Time Research realizes the growing need for easy access to information and records that the public already has a right to see.

2. Microsoft – With the use of Windows Azure, Microsoft has the largest growing open data initiative and also the most helpful. They have now partnered with the government so that the Open Government Data Initiative (OGDI) can run efficiently and effectively.

3. Ordnance Survey – This company has just recently launched their OS OpenData, an initiative to allow people access to maps and area information for thousands of locations. Allowing people the easy and convenient digital access to these sources promotes many things like education, safety, and science.

4. SAP BusinessObjects – A global organization that has been longstanding in the open data initiative. Providing access to information across the world, ultimately connecting people and businesses is their only intention. They have been successfully running and continuously updating for years.

5. Jigsaw – This website not only allows you to find open data about millions of companies, but also the contact information for each company as well. The purpose for creating this database is to allow companies to do the most important promotion: networking.

Sigrid Williamson

9. What is the h-index? How is it used?

The h-index is a method of quantifying the impact and relevance of a scientist's research. The physicist Jorge E. Hirsch developed the h-index in 2005 as an efficient way to evaluate and compare researchers' productivity, for the purposes of faculty recruiting and the awarding of grants, for example. Hirsch defined the index h as the number of papers with citation number greater than or equal to h. This means that a scientist with an index h has published h papers, which have each been cited at least h times. This takes into account both the number of publications and the number of citations per publication. It should be noted, however, that comparisons of h-indices may only be applicable to scientists working in the same field, since citation conventions may differ from field to field. There are several criticisms of the h-index, the most prevalent of which states that the h-index is misleading because it does not take into account the number of authors of a paper and is also bounded by the total number of publications. This undervalues the work of scientists who have made fewer but very important discoveries.
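Hirsch's definition can be worked through in a few lines of Python. This is a minimal sketch; the function name h_index and the citation counts are made up for illustration and do not describe any real researcher.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)  # most-cited paper first
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


# Hypothetical citation counts, one entry per paper.
many_modest_papers = [10, 8, 5, 4, 3, 0]
few_landmark_papers = [900, 450]

print(h_index(many_modest_papers))   # 4
print(h_index(few_landmark_papers))  # 2
```

The second case shows the criticism mentioned above: two very highly cited papers still give an h-index of only 2, because h can never exceed the total number of publications.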

- Diane Liu
[Full marks]

10. What is a style guide? Is there a style guide for chemistry research articles?

A style guide is a set of standards for style and language used to improve a written work. Style guides contain specific formats and preferred ways of wording statements so that they sound clear and precise. There are several style guides for various kinds of work. One style guide provided by the Massachusetts Institute of Technology has a format for writing research papers. Organizations and institutions such as Oxford University, the Oregon State Chemistry department, the American Chemical Society and Deakin University have style guides for writing reports in chemistry. Apparently there is no single universal way of writing a research paper. The standards appear to differ depending on the organization to which the research paper is being submitted; however, several concepts overlap between the methods. The structure of the paper is universal, with the title, abstract, introduction, methods, results and discussion being essential. Each of those sections must contain certain content in order to be accepted by the scientific community.
-Keith DeNivo
[Full Marks]

11. Compare in detail the merits of Scifinder, Web of Science and Google Scholar.

Google Scholar:
  • In Google Scholar you can find academic materials such as peer-reviewed papers, theses, books, abstracts, and technical reports from various areas of research.
  • It is a reliable search tool to browse and access some of the academic literature. It is a free and openly accessible tool for academic literature.
  • Many articles are available full-text.
  • Google Scholar is able to connect to library resources.
  • The sources are more academic than those you would find through a standard search engine query.
  • It has a wide range of academic content areas and is updated constantly.
  • It ranks and lists results according to how relevant they are to the search query. The most relevant references tend to appear at the top of the page.
  • You can access articles the library has on campus or off through the use of a VPN.
Web of Science:
  • It includes only the most influential, relevant and credible journal information available.
  • It has a diverse multidisciplinary coverage (approximately 8,830 titles indexed from 230 disciplines), which facilitates discovery by allowing the user to find answers in places that could have been overlooked.
  • It covers many journals back to 1900, including issues of significant publications.
  • Every item within a journal is indexed, including full papers, editorials, reviews, letters, and more.
  • Its backfile collection makes over 100 years of valuable research available. This unique collection is also cross searchable with other ISI Web of Knowledge resources.
  • It delivers cited reference data that is accurate, consistent and without redundancies.
  • It has an analyze tool that allows the user to discover emerging areas of current investigation in the sciences and social sciences. This tool lets the user refine searches by grouping results by author, publication year, institution, subject category, document type, source title, language, or country.
Scifinder:
  • The major resource to find chemistry articles and patents from around the world as well as reputable sources.
  • Reference to more than 10,000 currently published journals and patents from more than 61 patent authorities.
  • Important scientific discoveries from the present to the mid-1800s.
  • The latest scientific breakthroughs almost as soon as they are published with references added daily and some patent information as recent as 2 days ago
  • The world’s largest collection of organic and inorganic substance information.
  • Access to current, high-quality scientific information
  • Links to more relevant journal articles and patent documents than any other source.
  • Content indexed by scientists.
  • An intuitive interaction
  • Time savings, with speedy access to more than a century of scientific information
  • A novel approach to problem solving by linking related concepts.
  • An online version of Chemical Abstracts
  • A keyword, references, and abstracts database (CAPLUS)
  • A substance database (REGISTRY)
  • A reaction database (CASREACT)


-Marcela Garcia
[Full marks JCB]

12. How is InChI used in chemistry and what are some advantages and disadvantages?

(Hint make use of this presentation)

  • The IUPAC International Chemical Identifier (InChI) is a coded representation used to identify chemical substances. It offers a standard way to encode molecular information and enables searching of electronic chemical data sources.
  • InChI describes chemical substances in layers of information. Every InChI starts with "InChI=" followed by the version number, currently 1. This is followed by six important layers:
  • Main layer provides chemical formula (no prefix), atom connections (prefix: "c"), hydrogen atoms (prefix: "h")
  • Charge layer, with net charge (prefix: "q") and protons added or removed (prefix: "p")
  • Stereochemical layer
  • Isotopic layer
  • Fixed-H layer
  • Reconnected layer.
  • The layers and sub layers are separated by "/"
  • InChIKey is a new format directly derived from InChI
external image 620px-L-Ascorbic_acid.svg.png

L-ascorbic acid
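The layered format described above can be explored with a small Python sketch that splits an InChI string on "/". The function name split_inchi is made up for illustration, and the string used is the standard InChI commonly listed for L-ascorbic acid; treat it as illustrative input rather than an authoritative record. The "S" after the version number marks a standard InChI. This is only string splitting, not a real InChI parser: it assumes each sublayer prefix appears once, which does not hold for every InChI.

```python
def split_inchi(inchi):
    """Split an InChI string into its version, formula, and prefixed sublayers."""
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string")
    # The first two "/"-separated parts are the version and the formula (no prefix).
    version, formula, *rest = inchi[len(prefix):].split("/")
    layers = {"version": version, "formula": formula}
    for segment in rest:
        if segment:
            # The first character is the sublayer prefix, e.g. "c", "h", "t".
            layers[segment[0]] = segment[1:]
    return layers


# Assumed input: the standard InChI commonly given for L-ascorbic acid.
ascorbic_acid = (
    "InChI=1S/C6H8O6/c7-1-2(8)5-3(9)4(10)6(11)12-5"
    "/h2,5,7-10H,1H2/t2-,5+/m0/s1"
)

for name, value in split_inchi(ascorbic_acid).items():
    print(name, value)
```

Here "c" and "h" belong to the main layer (connections and hydrogens), while "t", "m" and "s" are sublayers of the stereochemical layer.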



Advantages of InChI:
  • People can use it freely and it’s non-proprietary
  • It provides better chemical representation than other codes such as SMILES
  • It is unique because one InChI represents one chemical structure.
  • It can be computed from structural information and does not have to be allocated by some organization
  • It is accurately indexed by the web search engines, thanks to InChIKey
Disadvantages of InChI:
It can be hard for humans to read or work out the chemical structure from an InChI. It mostly represents specific chemical structures and compounds, so it cannot be applied to generic structures in patent literature such as Markush structures. As a result, it can't be used to retrieve patent literature that relies on such generic structures.

World Patent Information, Volume 31, Issue 4, December 2009, Pages 278-284.
16 May 2005: International chemical identifier goes online
The IUPAC InChI project (Heller)
-Chi Nguyen
[Full Marks JCB]

13. What is a Markush structure and how is it used? Provide a specific example.

A Markush structure is a general description of a related set of chemical compounds. Markush structures are used specifically in patents.
Dr. Eugene Markush was a dye manufacturer who founded the Pharma Chemical Corporation in 1917. In 1924, Markush received a patent for pyrazolone dyes, USP # 1,506,316. This patent was groundbreaking because Markush not only claimed several chemical structures which had been synthesized in the lab, he also claimed in the patent a general chemical structure for pyrazolone dyes. Markush was the first person to claim general chemical structures in a patent, and structures of this type became known as "Markush structures" after the US Patent Office ruled to permit such structures in 1925.
A specific example of a Markush structure, represented by a chemical drawing, is a bond drawn to the center of a ring, indicating that the ring can be substituted at any position. Several other specific examples can be seen in the image below, from the article Comparison of Markush Structure Databases in the Journal of Chemical Information and Computer Sciences:
Image reference: J. Chem. Inf. Comput. Sci. 1993, 33, 799-804.
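To make the idea of one generic structure standing for a set of related compounds concrete, here is a minimal Python sketch that enumerates the specific compounds covered by a small Markush-style template. The scaffold, the R-group lists, and the function name enumerate_markush are all made up for illustration and are not taken from any patent; the structures are written as SMILES strings.

```python
from itertools import product


def enumerate_markush(scaffold, substituents):
    """Yield one SMILES string for every combination of R-group choices."""
    names = sorted(substituents)
    for choice in product(*(substituents[name] for name in names)):
        yield scaffold.format(**dict(zip(names, choice)))


# Hypothetical generic structure: a benzene ring carrying R1 and R2 in para positions.
scaffold = "{R1}c1ccc({R2})cc1"
substituents = {
    "R1": ["C", "Cl"],  # methyl or chloro
    "R2": ["O", "N"],   # hydroxyl or amino
}

compounds = list(enumerate_markush(scaffold, substituents))
print(compounds)
```

Two choices for each of two R groups already give four specific compounds, which is why a single Markush claim in a real patent can cover a very large number of structures.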

-M Livings
[Full Marks JCB]

14. List and describe three analytical techniques to characterize organic compounds.

1. Mass Spectrometry:
A mass spectrum is displayed as a vertical line graph in which each line represents an ion (or fragment) with a specific mass-to-charge ratio (m/z), and the height of each peak indicates the abundance of that ion. The highest peak is called the base peak, and the peak with the largest m/z ratio usually gives the molecular weight of the compound. The difference between the m/z ratios of two peaks corresponds to the mass of the fragment lost from the compound.
2. Infrared:
IR spectra show absorption bands at characteristic locations, which are frequently used to identify functional groups. For a vibration to absorb IR radiation and give a band, there must be a change in dipole moment. The intensity (weak, medium or strong), shape (broad or sharp) and position (cm-1) of each band are very important for identifying the functional groups in the spectrum.
3. Nuclear magnetic resonance:
NMR spectroscopy generally involves proton NMR and carbon-13 NMR spectra. Proton NMR shows the different types of proton in a compound, the number of protons of each type, and the connectivity of each group through coupling to neighboring groups. In carbon-13 NMR, the number of peaks indicates the number of types of carbon, and the chemical shift of each signal indicates the type of carbon; carbon-carbon coupling is usually not seen in carbon-13 spectra.
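The mass spectrum terms above (base peak, heaviest peak, and differences between peaks) can be sketched in Python. The peak list below is a simplified, hypothetical one for acetic acid: the m/z values are nominal masses and the relative intensities are illustrative only, not measured data. The function names are also made up for this sketch.

```python
# Hypothetical peak list: {m/z: relative intensity}. Intensities are illustrative.
peaks = {15: 20.0, 43: 100.0, 45: 90.0, 60: 75.0}


def base_peak(peaks):
    """Return the m/z of the most intense peak."""
    return max(peaks, key=peaks.get)


def heaviest_peak(peaks):
    """Return the largest m/z, which usually corresponds to the molecular ion."""
    return max(peaks)


def neutral_losses(peaks):
    """Return the mass lost from the heaviest peak to reach each lighter peak."""
    parent = heaviest_peak(peaks)
    return {mz: parent - mz for mz in sorted(peaks) if mz != parent}


print(base_peak(peaks))       # 43
print(heaviest_peak(peaks))   # 60
print(neutral_losses(peaks))  # {15: 45, 43: 17, 45: 15}
```

In this simplified picture, a loss of 17 mass units is consistent with losing OH and a loss of 15 with losing CH3, which is how differences between peaks point to fragments of the molecule.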

Example: Acetic Acid (CH3COOH):

[Full marks JCB]

15. List and describe 3 chemical information resources that make use of mobile devices (smartphones)

This application used to search citations for biomedical and life science journal. features are key search with options, save search query and citations, email citations, view abstracts, link to full article if available, it allows remote access through smart phone if VPN (virtual private Network) available.

This application helps to find content easily without using internet. It does not require any registration on website or download any software. This Application helps to find information on patient symptoms quickly, it has other features like we can email, bookmark, automatically records history of topic last viewed.

WISER (Wireless Information System for Emergency Responders)
This application is designed to assist first responders in hazardous material incidents. It supports decision making with guidelines that help avoid accidents and protect lives and the environment. It provides access to NLM's Hazardous Substances Data Bank (HSDB), offers radiological and biological substance support, and gives first responders critical information.

~Arti Patel
[Full marks JCB]

16. What is web 3.0 and how can it be used in chemical publishing?

Web 3.0 is also known as the Semantic Web: online information represented in such a way that it can be read, interpreted, and acted upon by machines. Web 1.0 and 2.0 need human direction to accomplish this. The idea for the Semantic Web came from Tim Berners-Lee, the inventor of the WWW, URLs, HTTP, and HTML.

The Semantic Web is built on URIs (Uniform Resource Identifiers). Three URIs, acting as subject, predicate, and object, are put together to form a triple, the basic statement of RDF (Resource Description Framework). The majority of data on the web is hidden away in HTML files, which makes it difficult to use on a large scale. Web 3.0 would be the first global system for publishing data in such a way that it can be easily processed by anyone.

If computers are able to read Semantic Web information, a "proof" mechanism could be implemented that quickly scans chemical data and detects obvious errors. The Semantic Web would also broaden the search field, and computers would be able to help human researchers find the information they need. The Semantic Web could also enhance chemical publishing by automating real-time publishing and sharing of experimental data on the Internet.

Egon Willighagen, a post-doc in Sweden, created a blog that discusses RDF for chemistry, chem-bla-ics. He links to another blog that discusses linking chemical information under one unique identifier. SMILES, InChI, and codes unique to a company can be replaced with an RDF description. The RDF could contain InChI information on molecular structure as well as chemical property information. This could be used to collect chemical catalogue information from many different sources into one RDF record that is easier and faster for researchers to access and understand.
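
A minimal sketch of the triple idea, using plain Python tuples rather than a real RDF library. Every URI under `http://example.org/chem/`, the property names, and the catalogue number are hypothetical placeholders; only the acetic acid InChI string is meant as real data. The point is that two sources describing the same subject URI can be merged into one record.

```python
from collections import defaultdict

# Hypothetical namespace and subject URI, used only for illustration.
EX = "http://example.org/chem/"
ACETIC_ACID = EX + "compound/acetic-acid"

# Each triple is (subject, predicate, object).
source_a = [
    (ACETIC_ACID, EX + "prop/inchi", "InChI=1S/C2H4O2/c1-2(3)4/h1H3,(H,3,4)"),
    (ACETIC_ACID, EX + "prop/label", "acetic acid"),
]
source_b = [
    (ACETIC_ACID, EX + "prop/catalogueNumber", "VENDOR-B-0001"),  # made-up value
]


def merge(*sources):
    """Combine triples from several sources, dropping exact duplicates."""
    merged = set()
    for source in sources:
        merged.update(source)
    return merged


def describe(triples, subject):
    """Group every predicate/object pair recorded for one subject."""
    description = defaultdict(list)
    for s, p, o in triples:
        if s == subject:
            description[p].append(o)
    return dict(description)


graph = merge(source_a, source_b)
record = describe(graph, ACETIC_ACID)

for predicate, objects in sorted(record.items()):
    print(predicate, "->", objects)
```

Because both sources use the same subject URI, the catalogue entry and the structural information end up in a single record without any manual matching of names.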

-Danielle Fagnani
[Full marks JCB]

17. List and briefly describe 5 sources that allow structure searches for organic compounds.

1. SciFinder
Start by clicking on the Explore Substances tab in the upper portion of the website after logging in. SciFinder allows exact or substructure searches and lets the user filter by conditions (e.g. solvent), find literature that details reaction conditions, see yields from previous research, check commercial availability, and focus on particular types of studies, such as analytical or biological studies. SciFinder contains numerous refinement tools to help the user narrow a search as needed. Site requires a subscription for use.

2. Beilstein Crossfire
Beilstein is similar to SciFinder in that it provides very detailed descriptions of the organic compound that is drawn. Both exact and substructure searches are allowed. Searches can also be refined by the molecular weight of the compound as well as other properties. Site requires a subscription for use.

3. Organic Syntheses
Site requires the ChemDraw plugin. The Organic Syntheses website also allows searching by journal volume, author name, substructure, reactions involving the drawn compound, etc. Does not require a subscription.

4. ChemSpider
ChemSpider not only lets the user draw the organic compound but also offers a skeleton search as an option. Users can refine a search by properties (including rule of 5, MW, refractive index, boiling point, melting point, density, flash point, etc.) as well as by data source and type. Does not require a subscription.

5. Sigma-Aldrich
Commercial chemical provider whose site lets the user check whether a compound is commercially available from Sigma-Aldrich. Site allows an exact or substructure search, as well as additional criteria (e.g. boiling point, melting point, molecular weight) to refine searches. The search tool is also useful for quickly looking up basic properties of commercially available compounds. Does not require a subscription.

-Arben Kojtari
[Full Marks JCB]

18. List and Describe 5 Databases for Biomolecule Information.

  1. Protein Data Bank : PDB is a free archive that provides users with information on the structures of proteins, nucleic acids, and other large biological molecules. The PDB also annotates all data, allowing for annotation-based searches.
  2. Entrez Protein Clusters (ProtClustDB) : ProtClustDB provides users with access to publications, structures, annotation information, domains, and external links regarding related protein sequences.
  3. Peptidome : Peptidome is a free database that contains peptide and protein mass spectrometry data.
  4. PopSet : PopSet is a database containing related nucleotide sequences. It is used to study the evolutionary similarities of a population.
  5. UniProt : This is a free and comprehensive database of protein sequence and function.
-Byron Forte
[Full marks JCB]

19. Question: What is EndNote and how do I get this at Drexel?

EndNote is a commercial reference management software package by Thomson Reuters. It can be used to search for literature, develop and manage a personal library of references, and create and format citations when writing articles for publication. The software is available for download to all Drexel students.

How to download the software

-Go to
-Click on
-Log in using your Drexel username and password
-Click on the 'students' link
-Download the EndNote software (PC software or Macintosh software)

Here are the main features of EndNote software
  • Ability to search bibliographic databases on the Internet
    • More than 3,900 files can be found.
    • EndNote's 'Search' function allows searching different Internet databases, including Web of Science, PubMed, and the Library of Congress.
    • The references of interest can be directly exported from the Internet databases.
  • Organize references, PDFs, and any other files in a custom library
    • An unlimited number of libraries, of any size, can be created.
    • Subsets or custom group references can be created for better maintenance and easier organization.
    • Full text articles can be located and downloaded automatically.
    • PDFs and other files can also be stored within the EndNote reference library.
    • The settings on the EndNote library display, such as bibliographic preferences and other options, can be easily organized from the 'Preferences' link.
    • EndNote includes more than 4,500 predefined bibliographical styles. Each style can be modified or new styles can be created.
  • Create instant bibliographies in Microsoft Word
    • As citations are inserted in your manuscript, a bibliography is automatically created.
    • References in the bibliography section are automatically updated as citation changes in the word document are made.
    • References can also be transferred directly from colleagues' papers into your EndNote library with the export traveling library feature.

For more information on EndNote software and tutorials, see
[Full Marks JCB]

20. What are the common temperature units and how were they developed?

Temperature is a quantitative measure of how hot or cold matter is, and it is related to the thermal energy that the matter contains. Scales of temperature are relative, in that certain standard reference points are used to set the scale and degree divisions.
There are three temperature scales used widely today, namely the Celsius scale (degrees C), the Fahrenheit scale (degrees F), and the Kelvin scale (K). The most common of these, for everyday use, is the Celsius scale.
Celsius Scale
Anders Celsius developed this scale in 1742 by defining two easily calibrated reference points. The two standard points that define the Celsius scale are the melting point of ice at 1 atm, set to 0 degrees C, and the boiling point of water, set to 100 degrees C. This means that there are 100 equal divisions of temperature between the melting and boiling points of water. The scale is then extended above and below these points in equal divisions.
Fahrenheit Scale
Daniel Fahrenheit developed this scale in 1724. In the Fahrenheit scale, the interval between the melting and boiling points of water is split into 180 equal divisions (degrees F), with the melting point set to 32 degrees F and the boiling point set to 212 degrees F.
Kelvin Scale
William Thomson (Lord Kelvin) developed this scale in 1854. In this scale the zero of temperature is defined as the point at which molecular kinetic energy is at its minimum (i.e. molecular motion and entropy are both at their minimum). This "absolute zero" is a theoretical temperature derived from the behavior of ideal gases with decreasing pressure. It was observed that at constant volume and constant mass, the pressure of an ideal gas decreases at a constant rate with decreasing temperature. Using this relationship, it was found that at -273.15 degrees C the pressure of the ideal gas system would theoretically become zero. In this way, the absolute zero of the temperature scale was defined and set to 0 K = -273.15 degrees C. The Kelvin scale therefore uses the same size of division (degree) as the Celsius scale, but shifts the zero of temperature to absolute zero. This means that delta(T(K)) = delta(T(C)). The second reference point for the Kelvin scale is the triple point of water, which is at 273.16 K.
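
The reference points above are enough to convert between the three scales. A small sketch follows; the function names are illustrative only. The Fahrenheit conversion uses the 180 divisions between 32 and 212 degrees F against the 100 divisions between 0 and 100 degrees C, and the Kelvin conversion is a shift by 273.15.

```python
import math

# Absolute zero expressed in degrees Celsius, as stated in the text.
ABSOLUTE_ZERO_C = -273.15


def celsius_to_kelvin(c):
    """Shift the zero point; the size of one degree is unchanged."""
    return c - ABSOLUTE_ZERO_C


def kelvin_to_celsius(k):
    return k + ABSOLUTE_ZERO_C


def celsius_to_fahrenheit(c):
    """100 Celsius divisions span 180 Fahrenheit divisions, offset by 32."""
    return c * 180.0 / 100.0 + 32.0


def fahrenheit_to_celsius(f):
    return (f - 32.0) * 100.0 / 180.0


# Check the reference points quoted for each scale.
print(celsius_to_fahrenheit(0.0), celsius_to_fahrenheit(100.0))
print(celsius_to_kelvin(ABSOLUTE_ZERO_C))
print(math.isclose(celsius_to_kelvin(0.01), 273.16))
```

The last line checks that the triple point of water, 273.16 K, sits 0.01 degrees above the ice point on the Celsius scale.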
Wikipedia, Units of Measurement
Wikipedia, Temperature
Vision Learning Article
[Full Marks JCB]

21. Discuss the differences between traditional peer review and 2.0 peer review and which is the most preferable option.

  • Peer review in general is the assessment of your work by individuals in the same or a similar field as your own. The traditional style of peer review has been the way in which scientific data and writings have been reviewed for many years. However, traditional peer review can be a lengthy process that may take many months when the work is reviewed by experts. In many cases the work is also reviewed in a hasty manner, with smaller details being skipped. Another downside is that once a paper is published after traditional peer review, it is very difficult to update the work, and most of the time it simply becomes a citation in a new paper that makes a small change. Traditional peer review is still followed because it is the accepted norm for tenure and promotion.
  • In 2.0 peer review, the work would be reviewed through multiple forms of online media sharing and social networking sites. One of the downfalls of 2.0 is that it is almost impossible to determine exactly who is doing the reviewing and what their qualifications are. The rate at which reviewing occurs is also highly variable. While some information may be reviewed almost immediately, other information may never receive any kind of review or criticism. No type of 2.0 peer review is taken into account for promotion or tenure consideration, which is a big deterrent to a mass move to this type of peer review. As a plus, papers that are constantly being peer reviewed in a 2.0 setting can be constantly updated and changed.
  • So while it is clear that 2.0 peer review is the right choice for the fast-paced chemistry field, many changes are still needed to make the process more acceptable in the scientific community, such as open identity of reviewers and set time frames for papers to get reviewed.
Elizabeth Brown Presentation
[Full Marks JCB]