Haorui Zhang

LinkedIn | Google Scholar | GitHub

I am pursuing a Master of Science degree in Data Science at New York University.

I graduated from Wake Forest University in 2022 with a double major in Finance and Statistics.

My past data science projects have covered topics in biology, history, archeology, finance, and mathematics.

Portfolio


Computational Data Science

Uncovering the dynamic effects of DEX treatment on lung cancer by integrating bioinformatic inference and multiscale modeling of scRNA-seq and proteomics data

Our team developed a novel Bioinformatic Inference and Multiscale Modeling (BIMM) method that uses a multiscale model of tumor regulation to provide insights for computational studies of tumorigenesis and oncotherapy. Our article was recently accepted by Computers in Biology and Medicine.

View on GitHub

Mathematical Modeling: Raw A549 cell data was translated into a system of ordinary differential equations (ODEs) based on biochemical rationales to describe reactions of synthesis, degradation, phosphorylation, and dephosphorylation.
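
A minimal sketch of how such reactions can be written as ODEs, assuming a toy protein with an unphosphorylated form P and a phosphorylated form Pp. The two species, the rate constants, and the fixed-step Runge-Kutta integrator are illustrative assumptions, not the BIMM model or its fitted parameters.

```python
def rhs(state, k_syn, k_deg, k_phos, k_dephos):
    """Toy two-species system: P (unphosphorylated) and Pp (phosphorylated)."""
    p, pp = state
    # Synthesis feeds P; both forms degrade; phosphorylation moves P -> Pp
    # and dephosphorylation moves Pp -> P.
    dp = k_syn - k_deg * p - k_phos * p + k_dephos * pp
    dpp = k_phos * p - k_dephos * pp - k_deg * pp
    return (dp, dpp)


def rk4(f, state, dt, steps, **params):
    """Integrate state' = f(state) with the classical fixed-step RK4 scheme."""
    for _ in range(steps):
        k1 = f(state, **params)
        k2 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)), **params)
        k3 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)), **params)
        k4 = f(tuple(s + dt * k for s, k in zip(state, k3)), **params)
        state = tuple(
            s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)
        )
    return state


# Hypothetical rate constants, chosen only for illustration.
RATES = dict(k_syn=1.0, k_deg=0.1, k_phos=0.5, k_dephos=0.2)
p_end, pp_end = rk4(rhs, (0.0, 0.0), dt=0.01, steps=5000, **RATES)
print(f"P = {p_end:.3f}, Pp = {pp_end:.3f}")
```

Because every term is a mass-action rate, adding another reaction only means adding one more term to the relevant derivative.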

Survival Analysis: Kaplan-Meier (KM) analysis was performed using the “survival” R package, and the log-rank test was used to test for differences between survival curves.
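
To make the estimator concrete, here is a plain-Python sketch of the Kaplan-Meier product-limit formula. It is an illustration only: the project itself used the R “survival” package, the function name `kaplan_meier` is hypothetical, the toy durations are made up, and the log-rank test is not shown.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival)] at each distinct time with an observed event.

    events[i] is 1 if the event was observed and 0 if the subject was censored.
    """
    data = sorted(zip(durations, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects that share the same time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Events and censored subjects both leave the risk set.
        n_at_risk -= removed
    return curve


# Made-up toy data: six subjects, two of them censored.
curve = kaplan_meier([5, 6, 6, 2, 4, 4], [1, 0, 1, 1, 1, 0])
for t, s in curve:
    print(t, round(s, 4))
```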

Parameter optimization: We optimized the model parameters of the ODE system by minimizing the residual error between empirical data and simulated results. Different optimization algorithms were used and compared to find the best parameter set.
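
The idea of fitting by residual minimization can be sketched on a one-parameter toy problem. Everything here is an assumption for illustration: the single-species model, the synthetic "observations" generated from a known degradation rate, and the golden-section search, which stands in for whichever optimizers the project actually compared.

```python
import math


def simulate(k_deg, times, k_syn=1.0):
    """Closed-form solution of dx/dt = k_syn - k_deg * x with x(0) = 0."""
    return [k_syn / k_deg * (1.0 - math.exp(-k_deg * t)) for t in times]


def sse(k_deg, times, observed):
    """Sum of squared residuals between simulated and observed values."""
    return sum((s - o) ** 2 for s, o in zip(simulate(k_deg, times), observed))


def golden_section(f, lo, hi, tol=1e-8):
    """Minimize a unimodal function f on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    return (a + b) / 2.0


times = [0.5 * i for i in range(1, 21)]
observed = simulate(0.3, times)  # synthetic data, true k_deg = 0.3
k_fit = golden_section(lambda k: sse(k, times, observed), 0.01, 2.0)
print(f"fitted k_deg = {k_fit:.4f}")
```

With several parameters the same objective would be handed to a multivariate optimizer; only the search routine changes.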



System vulnerability and criticality of human brain under evolving neuropathological events in Alzheimer’s Disease

In this work, we characterized the interaction of AT[N] biomarkers and their propagation across brain networks using a novel bistable reaction-diffusion model. We applied our model to large-scale longitudinal neuroimages from the ADNI database and studied the systematic vulnerability and criticality of brains. Our paper is currently under review at NeuroImage.

View on GitHub

Major Findings: Our major findings include (i) tau is a stronger indicator of regional risk than amyloid, (ii) the temporal lobe exhibits higher vulnerability to AD-related pathologies, (iii) the proposed critical brain regions outperform hub nodes in transmitting disease factors across the brain, and (iv) disruption of metabolic balance is the most decisive factor contributing to the initiation and progression of Alzheimer’s disease.


Reaction-diffusion model: Our proposed network-guided biochemical model consists of a classic bistable model and network diffusion. This relatively simple model enabled us to investigate the spatiotemporal dynamics of ATN biomarkers in AD by capturing the essence of the underlying mechanism of complex biological phenomena. The pathological network is an integration of ATN reactions and network diffusion. At each brain region, the production (①, ③) and clearance (②, ④) of amyloid and tau proteins are included in the model following zero-order and first-order mass-action kinetics, respectively.
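
A minimal sketch of a bistable reaction combined with network diffusion, assuming a single species with a cubic reaction term on a four-node path graph. The cubic form, the `threshold` and `diffusion` values, and the graph are illustrative assumptions; the actual model couples amyloid and tau with its own production and clearance terms.

```python
def laplacian(adj):
    """Graph Laplacian L = D - A for a symmetric adjacency matrix."""
    n = len(adj)
    return [
        [(sum(adj[i]) if i == j else 0.0) - adj[i][j] for j in range(n)]
        for i in range(n)
    ]


def simulate(x0, adj, threshold=0.2, diffusion=0.05, dt=0.01, steps=20000):
    """Forward-Euler integration of dx/dt = x(1 - x)(x - threshold) - diffusion * L x."""
    lap = laplacian(adj)
    x = list(x0)
    n = len(x)
    for _ in range(steps):
        x = [
            x[i]
            + dt * (
                x[i] * (1.0 - x[i]) * (x[i] - threshold)  # bistable reaction
                - diffusion * sum(lap[i][j] * x[j] for j in range(n))  # spread
            )
            for i in range(n)
        ]
    return x


# Hypothetical four-region path graph: 0 - 1 - 2 - 3.
PATH = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]

low = simulate([0.1] * 4, PATH)    # starts below the threshold everywhere
high = simulate([0.5] * 4, PATH)   # starts above the threshold everywhere
seeded = simulate([1.0, 0.0, 0.0, 0.0], PATH)  # burden seeded in one region
print([round(v, 3) for v in seeded])
```

The two uniform runs show the bistability: a burden below the threshold clears, while one above it saturates. The seeded run shows how diffusion over the network carries the burden to neighboring regions.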



Statistical Approaches to Subtype and Substage Detection in Alzheimer’s Disease

We employed statistical approaches, including k-means clustering and z-score classification, to detect individual subtypes and substages. We transcribed the spread patterns of tau and β-amyloid proteins into numerical values via z-scores, found optimal cut-offs within five structural cohorts, and converted the result into an adjacency matrix that yields individual disease subtype trajectories among the existing stages. Finally, we compared our results to clinical scores; the comparison supports the statistical validity of our model in revealing disease substages.
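
The z-score step can be sketched as follows, assuming each regional measurement is standardized against a reference group and then thresholded. The numbers, the fixed cut-off of 2.0, and the function names are hypothetical; the project searched for optimal cut-offs rather than fixing one.

```python
from statistics import mean, stdev


def z_scores(values, reference):
    """Standardize values against the mean and sample SD of a reference group."""
    mu = mean(reference)
    sigma = stdev(reference)
    return [(v - mu) / sigma for v in values]


def classify(scores, cutoff):
    """Flag a measurement as abnormal (1) when its z-score reaches the cut-off."""
    return [1 if s >= cutoff else 0 for s in scores]


# Made-up regional values for one subject and a made-up reference group.
reference = [1.0, 1.2, 0.8, 1.1, 0.9]
subject = [1.0, 1.5, 0.9, 1.4]

scores = z_scores(subject, reference)
flags = classify(scores, cutoff=2.0)
print([round(s, 2) for s in scores], flags)
```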




Financial Analytics

During my undergraduate studies at Wake Forest, my major in finance gave me extensive experience in developing business sense and making data-driven decisions.

Optimization of GMV model using Log-Logistic model

In this project, I cleaned and decoupled the seasonal GMV data for ingestion by the forecasting model, then fitted a log-logistic model to predict future values. I was acutely aware that the model’s parameters could be improved because the GMV varies from day to day. In backtesting, the forecast accuracy increased by 21%. I subsequently presented my findings to the team, and the new features were accepted into the official model.
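
For reference, the log-logistic curve itself is easy to write down. The sketch below assumes the standard two-parameter CDF with scale `alpha` and shape `beta`, and assumes (hypothetically) that it is used to describe the share of a period's total GMV accumulated by a given day; the project's actual parameterization and fitting procedure are not described here.

```python
def log_logistic_cdf(t, alpha, beta):
    """CDF of the log-logistic distribution with scale alpha and shape beta."""
    if t <= 0:
        return 0.0
    return 1.0 / (1.0 + (t / alpha) ** (-beta))


def forecast_cumulative(total, day, alpha, beta):
    """Hypothetical use: cumulative value reached by `day` as a share of `total`."""
    return total * log_logistic_cdf(day, alpha, beta)


# Made-up parameters: half of the total is reached on day alpha = 10.
for day in (5, 10, 20):
    print(day, round(forecast_cumulative(1000.0, day, alpha=10.0, beta=2.0), 1))
```

The scale parameter sets the day on which half of the total is reached, and the shape parameter controls how sharply the curve rises around that day.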

View on GitHub




Data competition in business simulation

In this project, our team made the decisions involved in building Global DNA, a company that manufactures genetic testing devices. We aim to keep a sustainable competitive advantage and introduce new products in the market in five to eight years to continue increasing our market share. Initially, our first product will be sold in the Americas, positioned toward the performance segment. By positioning directly in the performance segment, our company will maintain strong customer satisfaction and market share. This product will match customer demand and the importance customers place on speed, accuracy, price, and age. We will steadily increase capacity and automation to meet changing demand. As our company grows, we plan to expand into new markets and demographics. Specific product specs will be determined by data from other companies and products in those regions, as well as customer demand. We will differentiate ourselves from competitors through a well-rounded approach that covers both performance and budget segments in global markets, expanding at a steady rate after building a track record in an initial performance market in the Americas.




Kaggle competition Optiver Realized Volatility Prediction

Volatility is one of the most prominent terms you’ll hear on any trading floor, and for good reason. In financial markets, volatility captures the amount of fluctuation in prices. High volatility is associated with periods of market turbulence and large price swings, while low volatility describes calmer, quieter markets. In this project, we built models that predict short-term volatility for hundreds of stocks across different sectors. Our models forecast volatility over 10-minute periods; we employed EDA, feature engineering, and machine learning models, which were evaluated against real market data collected in the three-month evaluation period after training.
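
A short sketch of the quantity being predicted, assuming the usual definition of realized volatility as the square root of the sum of squared log returns, computed here from a weighted average price (WAP) of the best bid and ask. The function names and the toy order-book snapshots are made up for illustration and are not taken from the project's code.

```python
import math


def wap(bid_price, ask_price, bid_size, ask_size):
    """Weighted average price from the best bid and ask levels."""
    return (bid_price * ask_size + ask_price * bid_size) / (bid_size + ask_size)


def log_returns(prices):
    """Log return between each pair of consecutive prices."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def realized_volatility(prices):
    """Square root of the sum of squared log returns over the window."""
    return math.sqrt(sum(r * r for r in log_returns(prices)))


# Made-up order-book snapshots: (bid_price, ask_price, bid_size, ask_size).
book = [(99.0, 101.0, 10, 10), (99.5, 101.0, 30, 10), (99.0, 100.5, 10, 30)]
prices = [wap(*row) for row in book]
print(prices, realized_volatility(prices))
```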

View on GitHub





History and Archeology

Besides data science, I also have a great passion for history and archeology. I enjoy applying data science methodologies to the field of history to address unsettled problems.

Interaction and Distances in the Amarna Letters

In this project, I employed Python NLP packages to identify positive relationships between different political entities and characters. I used NLTK to clean, tokenize, and merge over 300 letters, 100 names, and 40 interactions, and then tested the hypothesis that greetings can be classified into different levels of obsequiousness based on geographical location.
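
The cleaning and counting step can be sketched with the standard library alone. This is a stand-in rather than the project's NLTK pipeline: the regular-expression tokenizer replaces NLTK's tokenizers, the two sentences are made-up placeholder text rather than quotations from the letters, and the term list is hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def count_terms(letters, terms):
    """Count how often each term of address appears across all letters."""
    counts = Counter()
    for letter in letters:
        tokens = tokenize(letter)
        for term in terms:
            counts[term] += tokens.count(term)
    return counts


# Placeholder text standing in for translated greetings.
letters = [
    "To the king, my lord: I fall at the feet of my lord.",
    "Say to my brother: may all be well with you.",
]
counts = count_terms(letters, ["lord", "brother"])
print(dict(counts))
```

Counts like these, grouped by the sender's location, are one simple way to turn greeting formulas into features for testing a hypothesis about deference.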

View on GitHub





Reconstruction of the historical migration of the Yuezhi people

My paper concerns a nomadic tribe in Central Asia, the Yuezhi, in the mid-second century BC. At this time the Yuezhi were unable to defeat two other nomadic confederations. As a result, the Yuezhi were forced to migrate westward from their original homeland in the Chinese province of Gansu to the Ili valley in modern Kazakhstan. Eventually, they migrated all the way to ancient Bactria, the northern part of modern Afghanistan. My purpose is to resolve the date of the Yuezhi’s migration from the Ili valley to Bactria. According to Craig Benjamin, the Yuezhi were expelled from their settlement in the Ili valley by another nomadic confederation, the Wusun, by 131 BC. However, a closer reading of the historical records reveals that there is no evidence to support Benjamin’s dating scheme. My paper was accepted by the Silk Road conference student panel.




Lion in Hittite culture: Mursili’s conquest of Babylonia and the return of Marduk

The lion is a widespread symbol in the ancient Near East. The royal association of the lion is well attested in numerous metaphors applied to kings in both Sumerian and Akkadian texts, as well as in artistic evidence. Outside Mesopotamia, the Hittites also show traces of Babylonian influence. In their tradition, the lion was not only one of the most important animal metaphors but also an approach employed by the kings. This essay aims to identify a pattern of conduct among the early Hittite kings: the Line of the Lions (Hattusili I and Mursili I) and the Usurpers (Hantili I, Zidanta I, and Ammuna I). The conquest of Babylonia appears to carry considerable historical significance in Hittite history. Not only did it mark the end of early Hittite supremacy, but it also diminished the idea of lions speaking on behalf of Telipinu.




© 2022 Haorui Zhang. Powered by Jekyll and the Minimal Theme.