New scalable computing technique will make analyzing Big Data easier  /news/2024-09/new-scalable-computing-technique-will-make-analyzing-big-data-easier <span>New scalable computing technique will make analyzing Big Data easier </span> <span><span lang="" about="/user/1441" typeof="schema:Person" property="schema:name" datatype="" xml:lang="">Teresa Donnellan</span></span> <span>Tue, 09/17/2024 - 16:23</span> <div class="layout layout--gmu layout--twocol-section layout--twocol-section--30-70"> <div class="layout__region region-first"> <div data-block-plugin-id="field_block:node:news_release:field_associated_people" class="block block-layout-builder block-field-blocknodenews-releasefield-associated-people"> <h2>In This Story</h2> <div class="field field--name-field-associated-people field--type-entity-reference field--label-visually_hidden"> <div class="field__label visually-hidden">People Mentioned in This Story</div> <div class="field__items"> <div class="field__item"><a href="/profiles/lwang41" hreflang="en">Lily Wang</a></div> </div> </div> </div> </div> <div class="layout__region region-second"> <div data-block-plugin-id="field_block:node:news_release:body" class="block block-layout-builder block-field-blocknodenews-releasebody"> <div class="field field--name-body field--type-text-with-summary field--label-visually_hidden"> <div class="field__label visually-hidden">Body</div> <div class="field__item"><p><span class="intro-text">As data collection techniques have advanced, the availability and complexity of datasets, particularly spatiotemporal data, have increased exponentially; finding the computing power to analyze such Big Data, however, remains a challenge for many researchers across fields. Through a collaborative research project funded by the National Science Foundation, AV statistics professor <a href="/profiles/lwang41">Lily Wang</a> hopes to change that.  
</span></p> <figure role="group" class="align-right"><div> <div class="field field--name-image field--type-image field--label-hidden field__item"> <img src="/sites/g/files/yyqcgq291/files/styles/small_content_image/public/2024-09/lily_wang_500x500.png?itok=LdCm02CH" width="350" height="350" alt="Lily Wang, Professor, Statistics, College of Engineering and Computing. Photo by Creative Services" loading="lazy" typeof="foaf:Image" /></div> </div> <figcaption>Professor Lily Wang, Department of Statistics, College of Engineering and Computing. Photo by Creative Services</figcaption></figure><p>Lily Wang and <a href="https://statistics.columbian.gwu.edu/huixia-wang">Huixia Judy Wang</a>, chair of the Department of Statistics at The George Washington University, are developing a form of scalable, distributed computing that could reduce the computing demand on any single machine by spreading the analysis across a network of computers.  </p> <p>“In the past, we knew there were insights hidden in the data, but due to computing limitations, we couldn’t access them,” said Lily Wang. “Now, with scalable quantile learning techniques, we can gain a deeper understanding of the entire data distribution and extract insights into variability, outliers, and tail behavior, which are critical for more informed decision-making.” </p> <p>Spatial and temporal data are increasingly used in research areas such as climate science and health care, noted Lily Wang. </p> <p>“This data richness presents a lot of opportunities for getting deep insights into dynamic patterns over time and space; but it also brings many, many challenges,” said Wang. Large datasets often exhibit heterogeneous and dynamic patterns, which require new approaches to capture meaningful relationships. 
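The article does not spell out the project's algorithms. As a rough illustration of how an analysis split across regions can still recover a national answer, here is a minimal Python sketch that computes one upper quantile (for example, the 90th percentile, which speaks to tail behavior) over data that stays inside separate regions. It uses a simple, swapped-in technique, distributed bisection on counts, for an ordinary quantile; it is not Wang's quantile learning method, which models quantiles as they vary over space and time. The function names and the three regions are hypothetical.

```python
import math
import random


def count_at_or_below(region, threshold):
    """Local work inside one region: how many of its values are <= threshold."""
    return sum(1 for v in region if v <= threshold)


def distributed_quantile(regions, tau, tol=1e-6):
    """Global tau-quantile of data held in separate, non-empty regions.

    Raw values never leave a region: each region reports its min and max once,
    then a single count per bisection round. The result is at least, and no more
    than tol above, the smallest value x such that (#values <= x) >= tau * N.
    """
    lo = min(min(r) for r in regions)
    hi = max(max(r) for r in regions)
    target = tau * sum(len(r) for r in regions)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        below = sum(count_at_or_below(r, mid) for r in regions)  # one number per region
        if below >= target:
            hi = mid  # the quantile is at or below mid
        else:
            lo = mid  # the quantile is above mid
    return hi


# Usage: three hypothetical regions with very different distributions and sizes.
rng = random.Random(0)
regions = [
    [rng.gauss(10, 2) for _ in range(5_000)],
    [rng.gauss(15, 5) for _ in range(20_000)],
    [rng.gauss(30, 1) for _ in range(5_000)],
]
q90 = distributed_quantile(regions, tau=0.9)

# Check against the same definition computed centrally on the pooled data.
pooled = sorted(v for r in regions for v in r)
exact = pooled[math.ceil(0.9 * len(pooled)) - 1]
print(f"distributed: {q90:.4f}  pooled: {exact:.4f}")
```

Because only counts cross the network, regions with very different distributions are handled correctly, which would not be true of simply averaging each region's own 90th percentile. The cost is one pass over every region per round, roughly log2(range / tol) rounds.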
</p> <p>This project uses two large datasets: the National Environmental Public Health Tracking Network database from the Centers for Disease Control and Prevention and the outdoor air quality data repository from the Environmental Protection Agency. </p> <p>“Both datasets have been challenging to analyze in the past due to their size and complexity,” explained Wang. “But through scalable and distributed learning techniques, we’re now able to handle large-scale heterogeneous data across the entire United States.” </p> <p>One of the project's major innovations is the use of distributed computing to divide the data into smaller, more manageable regions. Each region is analyzed separately, and the results are efficiently aggregated to form a comprehensive understanding of the entire dataset.  </p> <p>“You can think of it like dividing the U.S. into small regions, analyzing each one separately, and then combining the results to create a comprehensive national analysis,” Wang said. “This method allows us to analyze millions of data points simultaneously without the need for supercomputers.” </p> <p>Beyond its technical goals, the project also emphasizes training the next generation of data scientists. Graduate students at George Mason University and The George Washington University will gain hands-on experience working with real-world data while helping to develop new computational methods.  </p> <p>The project began on September 1, 2024, and is expected to last three years. It has already garnered attention, including recognition from the office of Congressman Gerry Connolly (D-VA). </p> <p>The potential applications of this research are far-reaching, from improving air quality predictions to understanding public health trends and beyond. 
Wang explained, “This work empowers researchers and policymakers to leverage vast amounts of data to address rising societal issues more effectively.” </p> </div> </div> </div> <div data-block-plugin-id="field_block:node:news_release:field_content_topics" class="block block-layout-builder block-field-blocknodenews-releasefield-content-topics"> <h2>Topics</h2> <div class="field field--name-field-content-topics field--type-entity-reference field--label-visually_hidden"> <div class="field__label visually-hidden">Topics</div> <div class="field__items"> <div class="field__item"><a href="/taxonomy/term/7351" hreflang="en">Department of Statistics</a></div> <div class="field__item"><a href="/taxonomy/term/7631" hreflang="en">Statistics Faculty</a></div> <div class="field__item"><a href="/taxonomy/term/8301" hreflang="en">Computational statistics</a></div> <div class="field__item"><a href="/taxonomy/term/5851" hreflang="en">Big Data</a></div> <div class="field__item"><a href="/taxonomy/term/11566" hreflang="en">big data analytics</a></div> <div class="field__item"><a href="/taxonomy/term/20306" hreflang="en">Research Interests: Nonstationary Time Series Analysis; Spectral Analysis; Nonparametric Statistics; Big Data; Bayesian Data Analysis; Applications in Medicine</a></div> <div class="field__item"><a href="/taxonomy/term/271" hreflang="en">Research</a></div> </div> </div> </div> </div> </div> Professor applies statistics and AI to land use modeling and real estate pricing  /news/2024-05/professor-applies-statistics-and-ai-land-use-modeling-and-real-estate-pricing <span>Professor applies statistics and AI to land use modeling and real estate pricing </span> <span><span lang="" about="/user/1441" typeof="schema:Person" property="schema:name" datatype="" xml:lang="">Teresa Donnellan</span></span> <span>Wed, 05/29/2024 - 12:18</span> <div class="layout layout--gmu layout--twocol-section 
layout--twocol-section--30-70"> <div class="layout__region region-first"> <div data-block-plugin-id="field_block:node:news_release:field_associated_people" class="block block-layout-builder block-field-blocknodenews-releasefield-associated-people"> <h2>In This Story</h2> <div class="field field--name-field-associated-people field--type-entity-reference field--label-visually_hidden"> <div class="field__label visually-hidden">People Mentioned in This Story</div> <div class="field__items"> <div class="field__item"><a href="/profiles/asafikha" hreflang="en">Abolfazl Safikhani</a></div> </div> </div> </div> </div> <div class="layout__region region-second"> <div data-block-plugin-id="field_block:node:news_release:body" class="block block-layout-builder block-field-blocknodenews-releasebody"> <div class="field field--name-body field--type-text-with-summary field--label-visually_hidden"> <div class="field__label visually-hidden">Body</div> <div class="field__item"><p><span class="intro-text">AV statistics professor Abolfazl Safikhani has recently applied his interdisciplinary research to analyzing how land use and property prices shift over time, work that shows the potential of data-driven insights in urban planning and real estate. </span></p> <p>Safikhani earned bachelor’s and master’s degrees in mathematics before completing a doctorate in statistics. </p> <p>“I decided to do a PhD in statistics because throughout the master’s I had become more and more interested in connecting real world problems to data. And I'm very happy that I made that decision,” he said. 
</p> <figure role="group" class="align-right"><div> <div class="field field--name-image field--type-image field--label-hidden field__item"> <img src="/sites/g/files/yyqcgq291/files/styles/small_content_image/public/2024-05/resize_image_project-1.png?itok=YbD3pYgn" width="350" height="350" alt="Abolfazl Safikhani" loading="lazy" typeof="foaf:Image" /></div> </div> <figcaption>Abolfazl Safikhani</figcaption></figure><p>With a former colleague in the University of Florida's urban planning department, Safikhani applied machine learning techniques to a dataset of millions of Florida land parcels. The two set out to model how land use changes over time and to predict future development; their predictions exceeded 98% accuracy. </p> <p>But the team didn't stop at successful predictions. They also wanted to understand the mechanisms driving those predictions. With a new collaborator, Tianshu Feng of George Mason’s Systems Engineering and Operations Research Department, the researchers aim to make their land use analysis software a form of explainable artificial intelligence (XAI). By opening up the black box of machine learning algorithms, Safikhani hopes local government decision-makers and urban planners can use the software with confidence to allocate resources effectively. </p> <p>Another of Safikhani’s projects focuses on land use and value as they relate to residential real estate prices. Safikhani’s own experience buying real estate in Fairfax County, Virginia, in 2022 inspired this project. When he asked his real estate agent to estimate a fair price for a particular house, the agent based the estimate on the prices of three comparable local properties that had recently sold. Ever a “quant guy,” Safikhani said, he thought there could be a better way: applying the idea of transfer learning. 
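Transfer learning in this setting means borrowing sales data from other neighborhoods whose market dynamics resemble the neighborhood of interest. The Python sketch below is a hypothetical illustration, not Safikhani's model. It assumes each neighborhood is described by pre-scaled numeric descriptors (the `Neighborhood` class, its fields, and all data are invented for this example), picks the k neighborhoods with the closest descriptors, and pools their sales to estimate how price changes with square footage. Each neighborhood keeps its own overall price level, a crude stand-in for accounting for local differences such as regulations.

```python
import math
from dataclasses import dataclass


@dataclass
class Neighborhood:
    name: str
    features: tuple  # hypothetical pre-scaled descriptors of local market dynamics
    sales: list      # (square feet, sale price) pairs


def means(sales):
    xs, ys = zip(*sales)
    return sum(xs) / len(xs), sum(ys) / len(ys)


def most_similar(target, candidates, k):
    """The k candidate neighborhoods whose descriptors are closest to the target's."""
    return sorted(candidates, key=lambda n: math.dist(n.features, target.features))[:k]


def pooled_slope(groups):
    """Price-per-square-foot slope shared across neighborhoods.

    Centering each neighborhood on its own means removes its overall price
    level, so only the size-to-price relationship is pooled.
    """
    num = den = 0.0
    for g in groups:
        mx, my = means(g.sales)
        num += sum((x - mx) * (y - my) for x, y in g.sales)
        den += sum((x - mx) ** 2 for x, _ in g.sales)
    return num / den


def predict_price(target, candidates, sqft, k):
    donors = most_similar(target, candidates, k)
    slope = pooled_slope([target, *donors])
    mx, my = means(target.sales)
    # Local price level from the target's own few sales; slope borrowed from donors.
    return my + slope * (sqft - mx)


# Usage with hypothetical data: three local comps, plus three other neighborhoods.
target = Neighborhood("target", (0.0, 0.0), [(1800, 650_000), (2000, 690_000), (2200, 700_000)])
candidates = [
    Neighborhood("A", (0.1, 0.0), [(1000, 300_000), (1500, 425_000), (2000, 550_000), (2500, 675_000), (3000, 800_000)]),
    Neighborhood("B", (0.0, 0.2), [(1200, 200_000), (1700, 325_000), (2200, 450_000), (2700, 575_000), (3200, 700_000)]),
    Neighborhood("C", (5.0, 5.0), [(1000, 500_000), (2000, 550_000)]),
]
print([n.name for n in most_similar(target, candidates, k=2)])  # ['A', 'B']
print(round(predict_price(target, candidates, sqft=2100, k=2)))  # 704803
```

With only its three comparable sales, the target neighborhood's own slope would be 125 dollars per square foot; pooling with the two similar neighborhoods pulls the estimate toward their shared relationship, while the dissimilar neighborhood C is left out.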
</p> <p>“The big idea of transfer learning is, within your big data set, try to find areas that have similar dynamics to your area of interest. And then use that similarity to improve your prediction,” Safikhani explained. “So, imagine that there is a little neighborhood somewhere in DC or somewhere in Maryland or somewhere in California that has dynamics very similar to the specific neighborhood where you want to buy a house in Northern Virginia. Once you account for some changes, let's say, regulations and things that are different, then the remaining dynamics are their similarities.” </p> <p>He continued, “If you only use your neighborhood, you can have three data points. If you use another, similar neighborhood, it's going to be 20. If you use neighborhoods from other places over the 50 states of the U.S., you may end up getting a thousand data points.” </p> <p>Safikhani is working with a colleague at the University of California, Los Angeles to secure funding to develop this pricing software. Their preliminary results show an advantage for their proposed model over current pricing systems.  </p> <p>Safikhani's research could reshape sectors such as urban planning and real estate, and it has already attracted the attention of startups keen to turn his findings into tools that could disrupt the real estate market. </p> <p>“It seems there's actually a growing interest in having such AI tools that would understand land use development and then really match it with pricing,” he said. “And sooner or later, this [technology] is going to come out. 
Platforms like Zillow are doing a good job, but there's much more that can be done.” </p> </div> </div> </div> <div data-block-plugin-id="field_block:node:news_release:field_content_topics" class="block block-layout-builder block-field-blocknodenews-releasefield-content-topics"> <h2>Topics</h2> <div class="field field--name-field-content-topics field--type-entity-reference field--label-visually_hidden"> <div class="field__label visually-hidden">Topics</div> <div class="field__items"> <div class="field__item"><a href="/taxonomy/term/9211" hreflang="en">Applied Statistics</a></div> <div class="field__item"><a href="/taxonomy/term/7351" hreflang="en">Department of Statistics</a></div> <div class="field__item"><a href="/taxonomy/term/7631" hreflang="en">Statistics Faculty</a></div> <div class="field__item"><a href="/taxonomy/term/8301" hreflang="en">Computational statistics</a></div> <div class="field__item"><a href="/taxonomy/term/5851" hreflang="en">Big Data</a></div> <div class="field__item"><a href="/taxonomy/term/11566" hreflang="en">big data analytics</a></div> <div class="field__item"><a href="/taxonomy/term/6906" hreflang="en">real estate entrepreneurship</a></div> <div class="field__item"><a href="/taxonomy/term/4656" hreflang="en">Artificial Intelligence</a></div> <div class="field__item"><a href="/taxonomy/term/4666" hreflang="en">AI</a></div> <div class="field__item"><a href="/taxonomy/term/271" hreflang="en">Research</a></div> </div> </div> </div> </div> </div> Big data may lead to safer roadways, lower emissions /news/2023-10/big-data-may-lead-safer-roadways-lower-emissions <span>Big data may lead to safer roadways, lower emissions </span> <span><span lang="" about="/user/1441" typeof="schema:Person" property="schema:name" datatype="" xml:lang="">Teresa Donnellan</span></span> <span>Mon, 10/23/2023 - 12:26</span> <div class="layout layout--gmu layout--twocol-section layout--twocol-section--30-70"> 
<div class="layout__region region-first"> <div data-block-plugin-id="field_block:node:news_release:field_associated_people" class="block block-layout-builder block-field-blocknodenews-releasefield-associated-people"> <h2>In This Story</h2> <div class="field field--name-field-associated-people field--type-entity-reference field--label-visually_hidden"> <div class="field__label visually-hidden">People Mentioned in This Story</div> <div class="field__items"> <div class="field__item"><a href="/profiles/szhu3" hreflang="und">Shanjiang Zhu</a></div> <div class="field__item"><a href="/profiles/avidyash" hreflang="und">Anand Vidyashankar</a></div> </div> </div> </div> </div> <div class="layout__region region-second"> <div data-block-plugin-id="field_block:node:news_release:body" class="block block-layout-builder block-field-blocknodenews-releasebody"> <div class="field field--name-body field--type-text-with-summary field--label-visually_hidden"> <div class="field__label visually-hidden">Body</div> <div class="field__item"><p>Transportation officials once made decisions based on household surveys conducted roughly once a decade, which asked selected households to record their travel on a given day. With the advent of smartphones, similar data became available every few minutes. Now, with a growing number of connected vehicles on the road, such data is available in near real time. CEIE professor Shanjiang Zhu is embracing this shift, exploring what researchers can do with this massive amount of data. </p> <div class="align-left"> <div class="field field--name-image field--type-image field--label-hidden field__item"> <img src="/sites/g/files/yyqcgq291/files/styles/small_content_image/public/2023-10/140725201.jpg?itok=xDqnzr2p" width="350" height="350" alt="Shanjiang Zhu" loading="lazy" typeof="foaf:Image" /></div> </div> 
<p>With funding from the Virginia Department of Transportation (VDOT), Zhu and his collaborators, Anand N. Vidyashankar of the Department of Statistics and Chenfeng Xiong of Villanova University's Department of Civil and Environmental Engineering, will reconcile travel data from three sources (surveys, smartphones, and connected vehicles) into a unified picture of travel behavior. </p> <p>“In the past, we tried to understand travel behavior, which is critical for future investment decisions and also transportation policy, based on survey data,” Zhu explained. “Based on that, you understand, on average, where people have traveled, in what mode, with whom, and spent how much time there, what is the purpose for the trip, etc. Using that information, you can develop a model that basically can predict future scenarios, like how congested the network could be in 2040; and that drives all the investment decisions and policy debates.” This approach has two weaknesses: timeliness, because surveys conducted about once a decade can miss major events such as the COVID-19 pandemic, and human error, because people do not necessarily remember every detail of their travel on a given day. </p> <p>The spread of smartphones about ten years ago made the available data much denser, said Zhu, yielding about one data point every three to five minutes. Each time a smartphone app requests location services, the phone’s location is automatically recorded. However, this approach introduced sampling bias, because not everyone owns a smartphone and not every owner uses one often.  
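A standard way to reduce this kind of sampling bias is to weight each group's observed trips by how many people each sampled device represents in that group; the same kind of expansion is commonly applied when connected vehicles make up only a share of traffic. The Python sketch below is a generic illustration, not the project's method. The groups and counts are hypothetical, and it assumes that within each group, people who appear in the device data travel like those who do not.

```python
def naive_total(observed, population, devices):
    """Scale all observed trips by one overall expansion factor."""
    return sum(observed.values()) * sum(population.values()) / sum(devices.values())


def weighted_total(observed, population, devices):
    """Scale each group's observed trips by that group's own expansion factor."""
    total = 0.0
    for group, trips in observed.items():
        weight = population[group] / devices[group]  # people represented per sampled device
        total += trips * weight
    return total


# Hypothetical numbers: younger residents are heavily sampled and travel more.
population = {"younger": 1_000, "older": 1_000}
devices = {"younger": 800, "older": 200}
observed = {"younger": 2_400, "older": 500}

print(naive_total(observed, population, devices))     # 5800.0
print(weighted_total(observed, population, devices))  # 5500.0
```

Scaling all trips by one overall factor overstates travel here because the heavily sampled younger group also makes more trips per person; group-specific weights remove that distortion, provided the within-group assumption holds.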
</p> <div class="align-right"> <div class="field field--name-image field--type-image field--label-hidden field__item"> <img src="/sites/g/files/yyqcgq291/files/styles/small_content_image/public/2023-10/MicrosoftTeams-image%20%2824%29.png?itok=NTv_5yAW" width="350" height="350" alt="Drone image of highway traffic at night" loading="lazy" typeof="foaf:Image" /></div> </div> <p>About a year ago, Zhu’s team won a VDOT competition to make the best possible use of connected vehicle data, which comes from newer vehicles such as those with a built-in “SOS” button. One current drawback is that connected vehicles still make up a relatively small share of vehicles on the road. </p> <p>“But we have ways to make corrections from a statistical perspective, and then this gives you a much more accurate picture of traffic on the road,” said Zhu, adding, “On average, it's one data point every three seconds. With such data, the accuracy and timeliness of travel demand models could be greatly improved.” Zhu noted that his colleague Vidyashankar will review the data fusion to ensure a rigorous statistical approach. </p> <p>The new data also opens the door to new safety studies, Zhu said, noting that safety studies are currently based mainly on police reports filed after an accident has occurred. By using alternative indicators, such as how often a car’s braking deceleration exceeds a certain threshold or how sharply a driver turns the steering wheel, dangerous locations might be identified and addressed before an accident occurs. Zhu is interested in exploring the topic further using the dataset produced by his current project. </p> <p>Zhu foresees data from connected vehicles becoming increasingly important as more people adopt the technology. 
He said, “Now we are investing in the methodology part and seeing how we can make this connection more productive, to improve the driving environment, to make our roads safer, to make the driving experience better, and also to reduce our energy consumption and emissions.” </p> </div> </div> </div> <div data-block-plugin-id="field_block:node:news_release:field_content_topics" class="block block-layout-builder block-field-blocknodenews-releasefield-content-topics"> <h2>Topics</h2> <div class="field field--name-field-content-topics field--type-entity-reference field--label-visually_hidden"> <div class="field__label visually-hidden">Topics</div> <div class="field__items"> <div class="field__item"><a href="/taxonomy/term/3001" hreflang="en">Department of Civil Environmental and Infrastructure Engineering (CEIE)</a></div> <div class="field__item"><a href="/taxonomy/term/10161" hreflang="en">transportation engineering</a></div> <div class="field__item"><a href="/taxonomy/term/5851" hreflang="en">Big Data</a></div> <div class="field__item"><a href="/taxonomy/term/11566" hreflang="en">big data analytics</a></div> <div class="field__item"><a href="/taxonomy/term/18686" hreflang="en">Transportation Policy Operations and Logistics</a></div> <div class="field__item"><a href="/taxonomy/term/271" hreflang="en">Research</a></div> <div class="field__item"><a href="/taxonomy/term/19146" hreflang="en">CEC faculty research</a></div> </div> </div> </div> </div> </div> The future at Fuse: Data analytics  /news/2023-02/future-fuse-data-analytics <span>The future at Fuse: Data analytics </span> <span><span lang="" about="/user/1441" typeof="schema:Person" property="schema:name" datatype="" xml:lang="">Teresa Donnellan</span></span> <span>Mon, 02/13/2023 - 14:50</span> <div class="layout layout--gmu layout--twocol-section layout--twocol-section--30-70"> <div class="layout__region region-first"> <div
data-block-plugin-id="field_block:node:news_release:field_associated_people" class="block block-layout-builder block-field-blocknodenews-releasefield-associated-people"> <h2>In This Story</h2> <div class="field field--name-field-associated-people field--type-entity-reference field--label-visually_hidden"> <div class="field__label visually-hidden">People Mentioned in This Story</div> <div class="field__items"> <div class="field__item"><a href="/profiles/jbaldo" hreflang="und">James Baldo</a></div> <div class="field__item"><a href="/profiles/bschmid5" hreflang="und">Bernard Schmidt</a></div> <div class="field__item"><a href="/profiles/kcomer" hreflang="und">Kenneth Comer</a></div> </div> </div> </div> </div> <div class="layout__region region-second"> <div data-block-plugin-id="field_block:node:news_release:body" class="block block-layout-builder block-field-blocknodenews-releasebody"> <div class="field field--name-body field--type-text-with-summary field--label-visually_hidden"> <div class="field__label visually-hidden">Body</div> <div class="field__item"><p>Mason Square is home to data analytics classes from several College of Engineering and Computing (CEC) departments, which makes it a promising locale for AV’s multidisciplinary Data Analytics Engineering Program. The Data Analytics Engineering master of science program is the second largest in CEC, with more than 700 students enrolled in classes across three Mason campuses. Currently, three sections of the capstone course DAEN 690 meet at Mason Square. </p> <p>“Mason Square is a perfect place to hold courses, especially for individuals who are returning students or lifelong learning adults, who are government employees or contractors,” says interim program director Bernard Schmidt.  </p> <p>Schmidt said that starting with the Fall 2023 semester, the goal is to schedule as many sections of DAEN courses, including the capstone course, at Mason Square as are offered at the Fairfax Campus.  
</p> <p>“This offers the greatest opportunity to not only maximize the utilization of Mason Square, but it also offers more course availability options as a way to encourage those returning students and lifelong learning adults located in Northern Virginia to apply to the DAEN program and enroll in DAEN program courses,” he says. </p> <p>Students can earn an <a href="https://catalog.gmu.edu/colleges-schools/engineering-computing/data-analytics-engineering-ms/#requirementstext" target="_blank">MS in Data Analytics Engineering</a> or a <a href="https://catalog.gmu.edu/colleges-schools/engineering-computing/data-analytics-graduate-certificate/#requirementstext" target="_blank">certificate in Data Analytics</a>. Both options require core classes in applied information technology, computer science, operations research, and statistics. The master’s also requires a capstone data analytics project and a concentration in a specialized technical area. Currently, students can choose from 13 concentrations, ranging from health data analytics to financial engineering. </p> <p>Ken Comer, who teaches the core course OR 531 Analytics and Decision Analysis, says the aim is for students to learn to build tools that carry out data analytic work for a client and consistently deliver meaningful, useful results.  </p> <p>“The engineer designs. It's someone else who operates, so in order to get there, of course, the students have to learn the techniques that they're going to have [in order] to create as the engineer,” he says, citing the three techniques that can be embedded in a data analytic tool: optimization, simulation, and data analysis.  </p> <p>The study of data analytics is more than simply using data visualization tools, Comer says. He notes that one problem in data science is that a scientist may give a client copious insights into the client’s data without actually helping the client make a decision.  
</p> <p>“[Students] get a lot of tools in how to curate data, how to manipulate it, store it, move it around, secure it, and a variety of other things,” says Comer. “But this is the only class where they get to … learn how to create a useful product out of the data: something that would help somebody, that is important for somebody who has to make a decision.” </p> <p>In his syllabus, Comer explains, “Every problem we will work will be focused on the allocation of resources or some other important decision that might be experienced in the course of business, industry, or government operations. Some examples are: If I’m offered additional resources, how much should I pay for them? What is the mix of production decisions that maximizes profit? What sequence should I use for a multi-step process? How much should I save to ensure an 80% chance of having a set sum at retirement?” </p> <p>He further writes, “In order to receive proper credit, you will need to answer the question at hand. Your boss or your customer cannot be expected to search around your spreadsheet for the numerical answer.” </p> <p>James Baldo, the program director currently on sabbatical, says he is excited that working students at Mason Square might be able to apply their newly acquired skills immediately in the workplace. “It's going to be a hub of industry partnerships down there with the university,” he predicts. 
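The retirement question in Comer's syllabus is a natural fit for simulation, one of the three techniques he cites. The Python sketch below is a hypothetical example, not course material. It assumes a fixed contribution at the start of each year, independent normally distributed annual returns, and invented inputs (30 years, 6% mean return, 12% standard deviation, a $1,000,000 goal). Because the final balance starting from zero is proportional to the contribution, one batch of simulated market paths is enough: the required contribution comes from the path at the 20th percentile of outcomes.

```python
import math
import random


def simulate_growth_factors(years, mean_return, sd_return, n_paths, seed=0):
    """Final balance on each simulated path if 1 unit is contributed at the start of each year."""
    rng = random.Random(seed)
    factors = []
    for _ in range(n_paths):
        balance = 0.0
        for _ in range(years):
            balance = (balance + 1.0) * (1.0 + rng.gauss(mean_return, sd_return))
        factors.append(balance)
    return factors


def required_contribution(target, prob, factors):
    """Annual contribution, rounded up to the cent, that reaches target on >= prob of paths."""
    ordered = sorted(factors)
    k = len(ordered) - math.ceil(prob * len(ordered))  # paths k, k+1, ... must succeed
    return math.ceil(target / ordered[k] * 100) / 100


def success_rate(contribution, target, factors):
    """Share of simulated paths on which this contribution reaches the target."""
    return sum(contribution * g >= target for g in factors) / len(factors)


# Hypothetical inputs: 30 years, 6% mean return, 12% volatility, $1,000,000 goal.
factors = simulate_growth_factors(30, 0.06, 0.12, n_paths=10_000, seed=1)
c80 = required_contribution(1_000_000, 0.80, factors)
print(f"save about ${c80:,.2f} per year")
print(f"simulated chance of success: {success_rate(c80, 1_000_000, factors):.1%}")
```

The sketch reports a single number, the annual amount, in the spirit of the syllabus's point that the answer should not be buried in a spreadsheet; changing `prob` shows how much more saving a higher level of confidence requires under these assumptions.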
</p> </div> </div> </div> <div data-block-plugin-id="field_block:node:news_release:field_content_topics" class="block block-layout-builder block-field-blocknodenews-releasefield-content-topics"> <h2>Topics</h2> <div class="field field--name-field-content-topics field--type-entity-reference field--label-visually_hidden"> <div class="field__label visually-hidden">Topics</div> <div class="field__items"> <div class="field__item"><a href="/taxonomy/term/3071" hreflang="en">College of Engineering and Computing</a></div> <div class="field__item"><a href="/taxonomy/term/15406" hreflang="en">Mason Square</a></div> <div class="field__item"><a href="/taxonomy/term/16766" hreflang="en">Fuse at Mason Square</a></div> <div class="field__item"><a href="/taxonomy/term/11566" hreflang="en">big data analytics</a></div> <div class="field__item"><a href="/taxonomy/term/4766" hreflang="en">data analytics</a></div> <div class="field__item"><a href="/taxonomy/term/7296" hreflang="en">Data Analytics Engineering</a></div> <div class="field__item"><a href="/taxonomy/term/116" hreflang="en">Campus News</a></div> </div> </div> </div> </div> </div>