Computational statistics / en New scalable computing technique will make analyzing Big Data easier  /news/2024-09/new-scalable-computing-technique-will-make-analyzing-big-data-easier <span>New scalable computing technique will make analyzing Big Data easier </span> <span><span lang="" about="/user/1441" typeof="schema:Person" property="schema:name" datatype="" xml:lang="">Teresa Donnellan</span></span> <span>Tue, 09/17/2024 - 16:23</span> <div class="layout layout--gmu layout--twocol-section layout--twocol-section--30-70"> <div class="layout__region region-first"> <div data-block-plugin-id="field_block:node:news_release:field_associated_people" class="block block-layout-builder block-field-blocknodenews-releasefield-associated-people"> <h2>In This Story</h2> <div class="field field--name-field-associated-people field--type-entity-reference field--label-visually_hidden"> <div class="field__label visually-hidden">People Mentioned in This Story</div> <div class="field__items"> <div class="field__item"><a href="/profiles/lwang41" hreflang="en">Lily Wang</a></div> </div> </div> </div> </div> <div class="layout__region region-second"> <div data-block-plugin-id="field_block:node:news_release:body" class="block block-layout-builder block-field-blocknodenews-releasebody"> <div class="field field--name-body field--type-text-with-summary field--label-visually_hidden"> <div class="field__label visually-hidden">Body</div> <div class="field__item"><p><span class="intro-text">With the advancement of data collection techniques, there has been an exponential increase in the availability and complexity of datasets, particularly spatiotemporal data; finding the computing power to analyze such Big Data, however, has remained a challenge for many researchers in various fields. Through a collaborative research project funded by the National Science Foundation, AV statistics professor <a href="/profiles/lwang41">Lily Wang</a> hopes to change that.  
</span></p> <figure role="group" class="align-right"><div> <div class="field field--name-image field--type-image field--label-hidden field__item"> <img src="/sites/g/files/yyqcgq291/files/styles/small_content_image/public/2024-09/lily_wang_500x500.png?itok=LdCm02CH" width="350" height="350" alt="Lily Wang, Professor, Statistics, College of Engineering and Computing. Photo by Creative Services" loading="lazy" typeof="foaf:Image" /></div> </div> <figcaption>Professor Lily Wang, Department of Statistics, College of Engineering and Computing. Photo by Creative Services</figcaption></figure><p>Wang and the Chair of the Department of Statistics at The George Washington University, <a href="https://statistics.columbian.gwu.edu/huixia-wang">Huixia Judy Wang</a>, are developing a form of scalable, distributed computing that could lessen the power demand on any single computer by distributing the analysis across a network of computers.  </p> <p>“In the past, we knew there were insights hidden in the data, but due to computing limitations, we couldn’t access them,” said Lily Wang. “Now, with scalable quantile learning techniques, we can gain a deeper understanding of the entire data distribution and extract insights into variability, outliers, and tail behavior, which are critical for more informed decision-making.” </p> <p>Spatial and temporal data are increasingly being used in such research areas as climate study and health care, among others, noted Lily Wang. </p> <p>“This data richness presents a lot of opportunities for getting deep insights into dynamic patterns over time and space; but it also brings many, many challenges,” said Wang. Large datasets often exhibit heterogeneous and dynamic patterns, requiring new approaches to capture meaningful relationships. 
</p> <p>This project uses two large datasets: the National Environmental Public Health Tracking Network database from the Centers for Disease Control and Prevention and the outdoor air quality data repository from the Environmental Protection Agency. </p> <p>“Both datasets have been challenging to analyze in the past due to their size and complexity,” explained Wang. “But through scalable and distributed learning techniques, we’re now able to handle large-scale heterogeneous data across the entire United States.” </p> <p>One of the project's major innovations is the use of distributed computing to divide the data into smaller, manageable regions. Each region is analyzed separately, and the results are efficiently aggregated to form a comprehensive understanding of the entire dataset. </p> <p>“You can think of it like dividing the U.S. into small regions, analyzing each one separately, and then combining the results to create a comprehensive national analysis,” Wang said. “This method allows us to analyze millions of data points simultaneously without the need for supercomputers.” </p> <p>Beyond its technical goals, the project also emphasizes training the next generation of data scientists. Graduate students at George Mason University and The George Washington University will gain hands-on experience working with real-world data, helping to develop new computational methods. </p> <p>The project began on September 1, 2024, and is expected to last three years. It has already garnered attention, including recognition from the office of Congressman Gerry Connolly (D-VA). </p> <p>The potential applications of this research are far-reaching, from improving air quality predictions to understanding public health trends and beyond.
Wang explained, “This work empowers researchers and policymakers to leverage vast amounts of data to address rising societal issues more effectively.” </p> </div> </div> </div> <div data-block-plugin-id="field_block:node:news_release:field_content_topics" class="block block-layout-builder block-field-blocknodenews-releasefield-content-topics"> <h2>Topics</h2> <div class="field field--name-field-content-topics field--type-entity-reference field--label-visually_hidden"> <div class="field__label visually-hidden">Topics</div> <div class="field__items"> <div class="field__item"><a href="/taxonomy/term/7351" hreflang="en">Department of Statistics</a></div> <div class="field__item"><a href="/taxonomy/term/7631" hreflang="en">Statistics Faculty</a></div> <div class="field__item"><a href="/taxonomy/term/8301" hreflang="en">Computational statistics</a></div> <div class="field__item"><a href="/taxonomy/term/5851" hreflang="en">Big Data</a></div> <div class="field__item"><a href="/taxonomy/term/11566" hreflang="en">big data analytics</a></div> <div class="field__item"><a href="/taxonomy/term/20306" hreflang="en">Research Interests: Nonstationary Time Series Analysis; Spectral Analysis; Nonparametric Statistics; Big Data; Bayesian Data Analysis; Applications in Medicine</a></div> <div class="field__item"><a href="/taxonomy/term/271" hreflang="en">Research</a></div> </div> </div> </div> </div> </div> Tue, 17 Sep 2024 20:23:22 +0000 Teresa Donnellan 113926 at Professor applies statistics and AI to land use modeling and real estate pricing  /news/2024-05/professor-applies-statistics-and-ai-land-use-modeling-and-real-estate-pricing <span>Professor applies statistics and AI to land use modeling and real estate pricing </span> <span><span lang="" about="/user/1441" typeof="schema:Person" property="schema:name" datatype="" xml:lang="">Teresa Donnellan</span></span> <span>Wed, 05/29/2024 - 12:18</span> <div class="layout layout--gmu layout--twocol-section
layout--twocol-section--30-70"> <div class="layout__region region-first"> <div data-block-plugin-id="field_block:node:news_release:field_associated_people" class="block block-layout-builder block-field-blocknodenews-releasefield-associated-people"> <h2>In This Story</h2> <div class="field field--name-field-associated-people field--type-entity-reference field--label-visually_hidden"> <div class="field__label visually-hidden">People Mentioned in This Story</div> <div class="field__items"> <div class="field__item"><a href="/profiles/asafikha" hreflang="en">Abolfazl Safikhani</a></div> </div> </div> </div> </div> <div class="layout__region region-second"> <div data-block-plugin-id="field_block:node:news_release:body" class="block block-layout-builder block-field-blocknodenews-releasebody"> <div class="field field--name-body field--type-text-with-summary field--label-visually_hidden"> <div class="field__label visually-hidden">Body</div> <div class="field__item"><p><span class="intro-text">AV statistics professor Abolfazl Safikhani recently applied his cutting-edge, interdisciplinary research to analyzing land use dynamics and property pricing shifts over time, work that underscores the transformative potential of data-driven insights, especially in urban planning and real estate. </span></p> <p>Safikhani earned bachelor’s and master’s degrees in mathematics before earning a doctorate in statistics. </p> <p>“I decided to do a PhD in statistics because throughout the master’s I had become more and more interested in connecting real world problems to data. And I'm very happy that I made that decision,” he said. 
</p> <figure role="group" class="align-right"><div> <div class="field field--name-image field--type-image field--label-hidden field__item"> <img src="/sites/g/files/yyqcgq291/files/styles/small_content_image/public/2024-05/resize_image_project-1.png?itok=YbD3pYgn" width="350" height="350" alt="Abolfazl Safikhani" loading="lazy" typeof="foaf:Image" /></div> </div> <figcaption>Abolfazl Safikhani</figcaption></figure><p>Along with a former colleague in the University of Florida’s urban planning department, Safikhani applied machine learning techniques to a dataset comprising millions of land parcels in Florida. The two set out to model how land use changes over time and to predict future development. Their predictions surpassed 98% accuracy. </p> <p>But the team didn't stop with successful predictions. They recognized the importance of understanding the underlying mechanisms driving these predictions. With the addition of a new collaborator, Tianshu Feng in George Mason’s Systems Engineering and Operations Research Department, the researchers aim to present their land use analysis software as explainable artificial intelligence (XAI). By opening up the black box of machine learning algorithms, Safikhani hopes to give local government decision-makers and urban planners the confidence to use the software to optimize resource allocation. </p> <p>Another of Safikhani’s projects considers land use and value as they bear on the price of residential real estate. His own experience buying real estate in Fairfax County, Virginia, in 2022 inspired this project. When he asked his real estate agent to estimate a fair price for a certain house, the agent came back with an estimate based on the price of three comparable local properties that had recently sold. Ever a “quant guy,” Safikhani said, he thought there could be a better way: applying the idea of transfer learning.
</p> <p>“The big idea of transfer learning is, within your big data set, try to find areas that have similar dynamics to your area of interest. And then use that similarity to improve your prediction,” Safikhani explained. “So, imagine that there is a little neighborhood somewhere in DC or somewhere in Maryland or somewhere in California that has dynamics very similar to the specific neighborhood where you want to buy a house in Northern Virginia. Once you account for some changes, let's say, regulations and things that are different, then the remaining dynamics are their similarities.” </p> <p>He continued, “If you only use your neighborhood, you can have three data points. If you use another, similar neighborhood, it's going to be 20. If you use neighborhoods from other places over the 50 states of the U.S., you may end up getting a thousand data points.” </p> <p>Safikhani is working with a colleague from the University of California – Los Angeles to bring in funding to develop this pricing software. Their preliminary results show the benefit of their proposed model versus current pricing systems.  </p> <p>Safikhani's research is poised to revolutionize sectors like urban planning and real estate. In fact, his research has attracted the attention of startups keen to translate his findings into real estate–disrupting tools. </p> <p>“It seems there's actually a growing interest in having such AI tools that would understand land use development and then really match it with pricing,” he said. “And sooner or later, this [technology] is going to come out. 
Platforms like Zillow are doing a good job, but there's much more that can be done.” </p> </div> </div> </div> <div data-block-plugin-id="field_block:node:news_release:field_content_topics" class="block block-layout-builder block-field-blocknodenews-releasefield-content-topics"> <h2>Topics</h2> <div class="field field--name-field-content-topics field--type-entity-reference field--label-visually_hidden"> <div class="field__label visually-hidden">Topics</div> <div class="field__items"> <div class="field__item"><a href="/taxonomy/term/9211" hreflang="en">Applied Statistics</a></div> <div class="field__item"><a href="/taxonomy/term/7351" hreflang="en">Department of Statistics</a></div> <div class="field__item"><a href="/taxonomy/term/7631" hreflang="en">Statistics Faculty</a></div> <div class="field__item"><a href="/taxonomy/term/8301" hreflang="en">Computational statistics</a></div> <div class="field__item"><a href="/taxonomy/term/5851" hreflang="en">Big Data</a></div> <div class="field__item"><a href="/taxonomy/term/11566" hreflang="en">big data analytics</a></div> <div class="field__item"><a href="/taxonomy/term/6906" hreflang="en">real estate entrepreneurship</a></div> <div class="field__item"><a href="/taxonomy/term/4656" hreflang="en">Artificial Intelligence</a></div> <div class="field__item"><a href="/taxonomy/term/4666" hreflang="en">AI</a></div> <div class="field__item"><a href="/taxonomy/term/271" hreflang="en">Research</a></div> </div> </div> </div> </div> </div> Wed, 29 May 2024 16:18:12 +0000 Teresa Donnellan 112346 at David Kepplinger /profiles/dkepplin <span>David Kepplinger</span> <span><span lang="" about="/user/326" typeof="schema:Person" property="schema:name" datatype="" xml:lang="">Martha Bushong</span></span> <span>Fri, 09/04/2020 - 10:04</span> <div class="layout layout--gmu layout--twocol-section layout--twocol-section--30-70"> <div class="layout__region region-first"> <div data-block-plugin-id="field_block:node:profile:field_headshot" 
class="block block-layout-builder block-field-blocknodeprofilefield-headshot"> <div class="field field--name-field-headshot field--type-image field--label-hidden field__item"> <img src="/sites/g/files/yyqcgq291/files/2023-08/David%20Kepplinger.jpg" width="500" height="500" alt="Mason assistant professor David Kepplinger wears a blue shirt and smiles" loading="lazy" typeof="foaf:Image" /></div> </div> <div data-block-plugin-id="field_block:node:profile:field_org_positions" class="block block-layout-builder block-field-blocknodeprofilefield-org-positions"> <div class="field field--name-field-org-positions field--type-text-long field--label-visually_hidden"> <div class="field__label visually-hidden">Titles and Organizations</div> <div class="field__item"><p>Assistant Professor, Department of Statistics</p> </div> </div> </div> <div data-block-plugin-id="field_block:node:profile:field_contact_information" class="block block-layout-builder block-field-blocknodeprofilefield-contact-information"> <h2>Contact Information</h2> <div class="field field--name-field-contact-information field--type-text-long field--label-hidden field__item"><p><strong>Building:</strong> Nguyen Engineering Building Room 1711 <strong>Mail Stop</strong>: 4A7 <strong>Phone: </strong>703 - 993 - 1671 <strong>Email: </strong><a href="mailto:dkepplin@gmu.edu" title="David Kepplinger email">David Kepplinger</a></p></div> </div> <div data-block-plugin-id="field_block:node:profile:field_personal_websites" class="block block-layout-builder block-field-blocknodeprofilefield-personal-websites"> <h2>Personal Websites</h2> <div class="field field--name-field-personal-websites field--type-link field--label-hidden field__items"> <div class="field field--name-field-personal-websites field--type-link field--label-hidden field__item"><a href="https://www.dkepplinger.org">Personal Website</a></div> </div> </div> <div data-block-plugin-id="inline_block:news_list" 
data-inline-block-uuid="45aeb4d1-3768-4d95-aaf1-9e4ff27d19cc" class="block block-layout-builder block-inline-blocknews-list"> <h2>In the News</h2> <div class="views-element-container"><div class="view view-news view-id-news view-display-id-block_1 js-view-dom-id-a416af32a2ac18418d6207ffe71d163ee7eca9303f741b808413ddc530e82a7d"> <div class="view-content"> <div class="news-list-wrapper"> <ul class="news-list"><li class="news-item"><div class="views-field views-field-title"><span class="field-content"><a href="/news/2023-03/mason-ponds-first-weather-station-canary-coal-mine" hreflang="en">Mason Pond’s first weather station is the canary in the coal mine</a></span></div><div class="views-field views-field-field-publish-date"><div class="field-content">March 31, 2023</div></div></li> <li class="news-item"><div class="views-field views-field-title"><span class="field-content"><a href="/news/2023-03/budding-scientist-monitors-masons-iconic-cherry-blossoms" hreflang="en">Budding scientist monitors Mason’s iconic cherry blossoms</a></span></div><div class="views-field views-field-field-publish-date"><div class="field-content">March 28, 2023</div></div></li> <li class="news-item"><div class="views-field views-field-title"><span class="field-content"><a href="/news/2023-03/early-spring-toys-second-annual-cherry-blossom-prediction-competition" hreflang="en">Early spring toys with second annual Cherry Blossom Prediction  Competition  </a></span></div><div class="views-field views-field-field-publish-date"><div class="field-content">March 9, 2023</div></div></li> <li class="news-item"><div class="views-field views-field-title"><span class="field-content"><a href="/news/2022-03/mason-cherry-blossom-predictions-play-statistics" hreflang="en">Mason cherry blossom predictions play up statistics </a></span></div><div class="views-field views-field-field-publish-date"><div class="field-content">March 30, 2022</div></div></li> <li class="news-item"><div class="views-field 
views-field-title"><span class="field-content"><a href="/news/2021-12/mason-department-statistics-welcomes-new-faculty" hreflang="en">Mason Department of Statistics welcomes new faculty </a></span></div><div class="views-field views-field-field-publish-date"><div class="field-content">December 14, 2021</div></div></li> </ul></div> </div> </div> </div> </div> </div> <div class="layout__region region-second"> <div data-block-plugin-id="field_block:node:profile:field_bio" class="block block-layout-builder block-field-blocknodeprofilefield-bio"> <h2>Biography</h2> <div class="field field--name-field-bio field--type-text-long field--label-hidden field__item"><p><span><span><span>David is an Assistant Professor of Statistics. He received his PhD in Statistics from the University of British Columbia and a Master of Science in Statistics from the Vienna University of Technology (Austria). His research primarily revolves around robust estimation in high-dimensional settings and applications in the life sciences. David is particularly interested in the robustness of feature selection in the presence of arbitrary contamination as well as countering the effects of contamination on predictive models.</span></span></span></p> <p><span><span><span>He teaches categorical data analysis (STAT665) for the department. 
He serves as a co-organizer of the statistics seminar series and as the PR &amp; communications contact for the department.</span></span></span></p> <h3>Degrees</h3> <ul><li><strong>PhD, Statistics, </strong>University of British Columbia, 2020</li> <li><strong>MS, Statistics, </strong> Vienna University of Technology (Austria)</li> </ul><h3>Research Interests</h3> <ul><li>Robust statistics for high-dimensional data</li> <li>Regularized estimation and feature selection</li> <li>Computational statistics</li> <li>Non-convex optimization</li> </ul><h3>Publications</h3> <p><a href="https://scholar.google.ca/citations?user=Tw5_yA8AAAAJ&hl=en" title="Google Scholar">Google Scholar</a></p> </div> </div> </div> </div> Fri, 04 Sep 2020 14:04:00 +0000 Martha Bushong 48416 at