<span>New scalable computing technique will make analyzing Big Data easier </span> <span><span lang="" about="/user/1441" typeof="schema:Person" property="schema:name" datatype="" xml:lang="">Teresa Donnellan</span></span> <span>Tue, 09/17/2024 - 16:23</span> <div class="layout layout--gmu layout--twocol-section layout--twocol-section--30-70"> <div class="layout__region region-first"> <div data-block-plugin-id="field_block:node:news_release:field_associated_people" class="block block-layout-builder block-field-blocknodenews-releasefield-associated-people"> <h2>In This Story</h2> <div class="field field--name-field-associated-people field--type-entity-reference field--label-visually_hidden"> <div class="field__label visually-hidden">People Mentioned in This Story</div> <div class="field__items"> <div class="field__item"><a href="/profiles/lwang41" hreflang="en">Lily Wang</a></div> </div> </div> </div> </div> <div class="layout__region region-second"> <div data-block-plugin-id="field_block:node:news_release:body" class="block block-layout-builder block-field-blocknodenews-releasebody"> <div class="field field--name-body field--type-text-with-summary field--label-visually_hidden"> <div class="field__label visually-hidden">Body</div> <div class="field__item"><p><span class="intro-text">With the advancement of data collection techniques, there has been an exponential increase in the availability and complexity of datasets, particularly spatiotemporal data; finding the computing power to analyze such Big Data, however, has remained a challenge for many researchers in various fields. 
Through a collaborative research project funded by the National Science Foundation, George Mason University statistics professor <a href="/profiles/lwang41">Lily Wang</a> hopes to change that.  </span></p> <figure role="group" class="align-right"><div> <div class="field field--name-image field--type-image field--label-hidden field__item"> <img src="/sites/g/files/yyqcgq291/files/styles/small_content_image/public/2024-09/lily_wang_500x500.png?itok=LdCm02CH" width="350" height="350" alt="Lily Wang, Professor, Statistics, College of Engineering and Computing. Photo by Creative Services" loading="lazy" typeof="foaf:Image" /></div> </div> <figcaption>Professor Lily Wang, Department of Statistics, College of Engineering and Computing. Photo by Creative Services</figcaption></figure><p>Wang and the Chair of the Department of Statistics at The George Washington University, <a href="https://statistics.columbian.gwu.edu/huixia-wang">Huixia Judy Wang</a>, are developing a form of scalable, distributed computing that could lessen the power demand on any single computer by distributing the analysis across a network of computers.  </p> <p>“In the past, we knew there were insights hidden in the data, but due to computing limitations, we couldn’t access them,” said Lily Wang. “Now, with scalable quantile learning techniques, we can gain a deeper understanding of the entire data distribution and extract insights into variability, outliers, and tail behavior, which are critical for more informed decision-making.” </p> <p>Spatial and temporal data are increasingly being used in such research areas as climate study and health care, among others, noted Lily Wang. </p> <p>“This data richness presents a lot of opportunities for getting deep insights into dynamic patterns over time and space; but it also brings many, many challenges,” said Wang. Large datasets often exhibit heterogeneous and dynamic patterns, requiring new approaches to capture meaningful relationships. 
</p> <p>This project uses two large datasets: the National Environmental Public Health Tracking Network database from the Centers for Disease Control and Prevention and the outdoor air quality data repository from the Environmental Protection Agency. </p> <p>“Both datasets have been challenging to analyze in the past due to their size and complexity,” explained Wang. “But through scalable and distributed learning techniques, we’re now able to handle large-scale heterogeneous data across the entire United States.” </p> <p>One of the project's major innovations is the use of distributed computing to divide the data into smaller, manageable regions. Each region is analyzed separately, and the results are efficiently aggregated to form a comprehensive understanding of the entire dataset.  </p> <p>“You can think of it like dividing the U.S. into small regions, analyzing each one separately, and then combining the results to create a comprehensive national analysis,” Wang said. “This method allows us to analyze millions of data points simultaneously without the need for supercomputers.” </p> <p>Beyond its goals for technical advancements, the project also emphasizes training the next generation of data scientists. Graduate students at George Mason University and The George Washington University will gain hands-on experience working with real-world data, helping to develop new computational methods.  </p> <p>The project began on September 1, 2024, and is expected to last three years. It has already garnered attention, including recognition from the office of Congressman Gerry Connolly (D-VA). </p> <p>The potential applications of this research are far-reaching, from improving air quality predictions to understanding public health trends and beyond. 
Wang explained, “This work empowers researchers and policymakers to leverage vast amounts of data to address rising societal issues more effectively.” </p> </div> </div> </div> <div data-block-plugin-id="field_block:node:news_release:field_content_topics" class="block block-layout-builder block-field-blocknodenews-releasefield-content-topics"> <h2>Topics</h2> <div class="field field--name-field-content-topics field--type-entity-reference field--label-visually_hidden"> <div class="field__label visually-hidden">Topics</div> <div class="field__items"> <div class="field__item"><a href="/taxonomy/term/7351" hreflang="en">Department of Statistics</a></div> <div class="field__item"><a href="/taxonomy/term/7631" hreflang="en">Statistics Faculty</a></div> <div class="field__item"><a href="/taxonomy/term/8301" hreflang="en">Computational statistics</a></div> <div class="field__item"><a href="/taxonomy/term/5851" hreflang="en">Big Data</a></div> <div class="field__item"><a href="/taxonomy/term/11566" hreflang="en">big data analytics</a></div> <div class="field__item"><a href="/taxonomy/term/20306" hreflang="en">Research Interests: Nonstationary Time Series Analysis; Spectral Analysis; Nonparametric Statistics; Big Data; Bayesian Data Analysis; Applications in Medicine</a></div> <div class="field__item"><a href="/taxonomy/term/271" hreflang="en">Research</a></div> </div> </div> </div> </div> </div>