A few months ago, a paper was published reporting evidence of structures in the universe so large that they break popular theories about the existence of dark matter and dark energy. Structures that vast should not be able to form through known natural processes.
I do not personally believe that so-called dark matter and dark energy are the only explanation for the unexplained phenomena that scientists have been observing. So when the existence of the Big Ring and Giant Arc was reported, I began to think deeply about what this might imply about the sort of forces that are shaping our universe. What if such structures are actually common in the universe, but we just haven’t been looking for them? Or maybe more advanced tools could reveal hidden truths?
As an aspiring AI engineer, I am especially interested in scientific computation. Applying ML to cosmological datasets seems like an ideal laboratory in which to practice my tradecraft. If there is one thing that ML is good for, it is the identification of patterns. And if there is one field that needs pattern analysis on a large scale, it’s cosmology.
So I decided to kick off a research project with the aim of finding evidence of ultra-large-scale structures by applying machine learning algorithms to freely available cosmological datasets. My plan for the time being is to collect, clean, and process data into a form suitable for machine learning while I research astrophysics and ML algorithms that might be effective for this project. Since we don’t really know what sort of impossible structures might be out there, unsupervised learning algorithms seem like a good place to start, but I may revisit that as time goes on.
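To make the unsupervised idea a little more concrete, here is a minimal sketch of one possible first pass. It is not the project's actual pipeline. It uses friends-of-friends grouping, a simple unlabeled clustering approach used in cosmology for group finding, swapped in here purely as an illustration. The data is synthetic, the units are arbitrary, and the names `friends_of_friends`, `make_blob`, the linking length of 4.0, and the minimum group size of 5 are all assumptions made up for this example.

```python
import math
import random


def friends_of_friends(points, linking_length):
    """Return groups of point indices linked by chains of close neighbours.

    Two points are "friends" if they are closer than linking_length; a group
    is a connected component of the resulting friendship graph.
    """
    parent = list(range(len(points)))  # union-find forest, one tree per group

    def find(i):
        # Walk to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Brute-force O(n^2) pair check: fine for a toy, far too slow for a real
    # catalogue, where a spatial index such as a k-d tree would be needed.
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) < linking_length:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_j] = root_i

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def make_blob(rng, centre, half_width, count):
    """Synthetic clump: points drawn uniformly from a cube around centre."""
    return [
        tuple(c + rng.uniform(-half_width, half_width) for c in centre)
        for _ in range(count)
    ]


# Made-up data in arbitrary units: two clumps plus three isolated points.
rng = random.Random(0)
points = (
    make_blob(rng, (0.0, 0.0, 0.0), 1.0, 20)
    + make_blob(rng, (50.0, 50.0, 50.0), 1.0, 20)
    + [(200.0, 0.0, 0.0), (0.0, 200.0, 0.0), (0.0, 0.0, 200.0)]
)

# A linking length of 4.0 exceeds the largest possible separation inside a
# clump (2 * sqrt(3), about 3.46), so each clump is guaranteed to link up.
groups = friends_of_friends(points, linking_length=4.0)

# Keep only groups big enough to count as a candidate structure.
structures = [g for g in groups if len(g) >= 5]
print(len(structures), "candidate structures found")
```

The linking length and the minimum group size are the two knobs here, and on real survey data both would have to be chosen with care, since the choice largely determines what counts as a structure.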
This is undoubtedly the wrong time to start such a project. In a week I will reach the halfway point of my summer semester of college, which involves compressing an entire semester into six weeks or so. Picking up astrophysics self-study in parallel with two college-level math courses and computer science is probably a terrible idea. But I’ll be the first one to point out that many of my ideas are.
The first step is to acquire data, and I just finished doing so in an all-night research marathon. I will write a post about that in the near future, but first I need to make sure everything is well documented. It would probably be good practice to do things the right way, but I could be wrong about that. I’ve been wrong before.

