Scientific studies often examine the relationships between place-based information and health outcomes; for example, air pollution and asthma, neighborhood crime and mental health, or community greenspace and IQ. Study subjects with location information, most commonly a residential mailing address, are linked to databases of place-based information, or “geomarkers”, in order to conduct these studies. Geocoding is the process of translating a string of text referring to a location (e.g., mailing address) into coordinates on the earth’s surface (e.g., latitude and longitude). These coordinates are required to link participants to their estimated exposures to geomarkers – a process we call “geomarker assessment”. Some examples of geomarker assessment commonly performed in health studies using people include distance to the nearest major roadway – a commonly used as a measure of estimated exposure to traffic related air pollution that is associated with increased risk of asthma – or neighborhood median household income – a commonly used as a measure of community deprivation associated with increased bed days spent in the hospital.

Within these health studies, both geocoding and geomarker assessment involve the use of personal identifiers (addresses or geocodes) and must be conducted in a HIPAA and IRB compliant manner. These laws were designed to protect the privacy of study participants by preventing the sharing of protected health information (PHI). While beneficial with respect to privacy, this prevents an outstanding challenge for researchers by preventing them from using external third party software, e.g. Google Maps, to analyze and extract information from study participants’ addresses or locations. Furthermore, this restricts scientists’ ability to collaborate by combining datasets containing any PHI.

Our solution is a standalone, container-based application that can produce geocodes and conduct geomarker assessment. A container is a platform that wraps software into a complete filesystem containing everything it needs to run. For geocoding and geomarker assessment, this includes code, software libraries, and geospatial data. Usable on PC, Mac, or Linux machines, researchers can use DeGAUSS containers to geocode and conduct geomarker assessment without PHI leaving their local machine. After geomarkers are attached to subjects’ health information, personal identifiers like address or location coordinates are removed, effectively making the information no longer PHI. This approach can facilitate sharing and collaboration among scientists studying the health effects of geomarkers. In addition, the use of containers guarantees the software will always run the same, regardless of its environment, which is a vital requirement for reproducible research.