AI Research & Enhancement Competition
The Strange Data Project
16 days to unlock huge potential through critical thought and creative data acquisition.
The Vision
This isn't just about "strange data"—it's about unlocking the vast reserves of information that current language models simply cannot process. LLMs are powerful, but they're fundamentally limited to text. They can't see. They can't touch. They can't reason over spatial relationships, real-time sensor feeds, or proprietary datasets locked behind access barriers.
Your mission is to bridge this gap. Find data that models don't have access to, transform it into something they can understand, and demonstrate measurable improvement on a real task. No expensive retraining required—we're interested in creative approaches: vector databases for retrieval, fine-tuning adapters, hybrid pipelines, or entirely novel representations.
The focus is on proving how much better a model can perform when you give it access to the right data in the right way.
What Could This Look Like?
3D Spatial Understanding
Models can't natively reason about 3D space. Could you capture depth data, point clouds, or CAD representations and transform them into structured text or embeddings that enable spatial reasoning? Imagine an LLM that can actually understand "behind" or "inside."
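One way to picture this is a small sketch that turns 3D geometry into sentences a text-only model can read. It assumes objects have already been segmented into axis-aligned bounding boxes and that the viewer looks along the +y axis. The `Box` class, the helper names, and the scene values are all hypothetical, made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box; a hypothetical stand-in for a segmented object."""
    name: str
    min_xyz: tuple[float, float, float]
    max_xyz: tuple[float, float, float]


def inside(a: Box, b: Box) -> bool:
    """True if box a lies entirely within box b on all three axes."""
    return all(
        b.min_xyz[i] <= a.min_xyz[i] and a.max_xyz[i] <= b.max_xyz[i]
        for i in range(3)
    )


def behind(a: Box, b: Box) -> bool:
    """True if a starts at or beyond the far face of b (assumes the viewer looks along +y)."""
    return a.min_xyz[1] >= b.max_xyz[1]


def describe(boxes: list[Box]) -> list[str]:
    """Emit one plain-text fact per ordered pair that satisfies a relation."""
    facts = []
    for a in boxes:
        for b in boxes:
            if a is b:
                continue
            if inside(a, b):
                facts.append(f"The {a.name} is inside the {b.name}.")
            elif behind(a, b):
                facts.append(f"The {a.name} is behind the {b.name}.")
    return facts


# Made-up scene coordinates, in metres.
scene = [
    Box("crate", (0.0, 0.0, 0.0), (2.0, 2.0, 2.0)),
    Box("ball", (0.5, 0.5, 0.5), (1.0, 1.0, 1.0)),
    Box("chair", (0.0, 3.0, 0.0), (1.0, 4.0, 1.0)),
]
facts = describe(scene)
print("\n".join(facts))
```

The resulting sentences could be placed in a prompt as context. Real depth data would first need segmentation and a choice of viewpoint, which this sketch skips.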
Domain-Specific Knowledge Retrieval
Agriculture, medicine, manufacturing—industries with proprietary data that never made it into training sets. Build a vector database pipeline that pulls hyper-accurate, real-time information and watch a general model become a domain expert through retrieval-augmented generation.
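As a rough illustration of the retrieval step, here is a toy in-memory sketch. It swaps in word-count vectors and cosine similarity for a real embedding model and vector database, and the farm records are invented. The function names (`embed`, `retrieve`, `build_prompt`) are assumptions for this example, not part of any particular library.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts. A real pipeline would use a learned model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


def build_prompt(query: str, docs: list[str], k: int = 2) -> str:
    """Prepend the retrieved records to the question before it goes to a model."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return f"Use only this context:\n{context}\n\nQuestion: {query}"


# Invented example records standing in for proprietary domain data.
DOCS = [
    "Field 7 soil moisture fell to 12 percent on Tuesday.",
    "Barley in field 3 was sown in the second week of March.",
    "The tractor in shed B is due for service in June.",
]
QUESTION = "What is the soil moisture in field 7?"
top = retrieve(QUESTION, DOCS, k=1)
prompt = build_prompt(QUESTION, DOCS, k=1)
print(prompt)
```

Word overlap is a weak similarity signal, so a real entry would likely replace `embed` with a proper embedding model and the sorted list with a vector index.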
Novel Data Modalities
Audio patterns, environmental sensors, satellite imagery, network traffic, biological signals—data that exists but isn't being used. Find a creative way to capture, process, and represent this information so models can reason over it for the first time.
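For sensor-style data, one simple representation is a short text summary of a reading window. The sketch below assumes readings arrive as (minute, value) pairs. The `summarise` helper, the threshold, and the greenhouse values are hypothetical.

```python
from statistics import mean


def summarise(name: str, unit: str, readings: list[tuple[int, float]], threshold: float) -> str:
    """Turn (minute, value) pairs into one sentence a text-only model can read."""
    values = [v for _, v in readings]
    peak_minute, peak = max(readings, key=lambda r: r[1])
    over = [t for t, v in readings if v > threshold]
    if values[-1] > values[0]:
        trend = "rising"
    elif values[-1] < values[0]:
        trend = "falling"
    else:
        trend = "flat"
    return (
        f"{name}: mean {mean(values):.1f} {unit}, "
        f"peak {peak:.1f} {unit} at minute {peak_minute}, "
        f"{trend} overall, above {threshold} {unit} "
        f"for {len(over)} of {len(values)} readings."
    )


# Invented readings for illustration.
readings = [(0, 20.0), (1, 21.0), (2, 25.0), (3, 24.0)]
summary = summarise("greenhouse temperature", "C", readings, threshold=22.0)
print(summary)
```

A summary like this discards most of the raw signal, so part of the challenge is deciding which features are worth preserving for the task at hand.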
These are starting points, not limitations. The best submissions will surprise us.
Total Prize Pool
€5,000
The Core Challenge
Build a system that makes a model dramatically better at a specific task—without expensive retraining. The key is finding and processing data that wasn't previously accessible to language models.
→ Acquire: Find data that LLMs can't see—whether it's locked behind access, exists in non-text formats, or simply hasn't been collected before
→ Transform: Convert this data into representations that models can consume—via embeddings, structured text, vector databases, or hybrid pipelines
→ Demonstrate: Show clear, measurable improvement on a real task compared to a baseline model without access to your data
→ Scale: Articulate how this approach could generate or unlock training-grade data at volume for long-term value
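The Demonstrate step can be as simple as scoring the same questions with and without your data. Here is a minimal sketch, assuming exact-match scoring. The `baseline` and `enhanced` functions are hypothetical stand-ins for real model calls, and the questions and answers are invented.

```python
from typing import Callable

Example = tuple[str, str]  # (question, expected answer)


def accuracy(answer_fn: Callable[[str], str], examples: list[Example]) -> float:
    """Fraction of examples answered with a case-insensitive exact match."""
    correct = sum(
        answer_fn(question).strip().lower() == expected.strip().lower()
        for question, expected in examples
    )
    return correct / len(examples)


# Invented records the baseline has never seen.
KNOWLEDGE = {
    "Which pump tripped last night?": "pump 2",
    "What was the peak tank pressure?": "4.1 bar",
    "Which valve is scheduled for replacement?": "valve C",
}


def baseline(question: str) -> str:
    """Stand-in for a model with no access to the new data."""
    return "unknown"


def enhanced(question: str) -> str:
    """Stand-in for the same model with the new data made available."""
    return KNOWLEDGE.get(question, "unknown")


EVAL_SET: list[Example] = [
    ("Which pump tripped last night?", "pump 2"),
    ("What was the peak tank pressure?", "4.1 bar"),
    ("Which valve is scheduled for replacement?", "valve C"),
    ("Who signed off the last inspection?", "J. Murphy"),
]

base_acc = accuracy(baseline, EVAL_SET)
enh_acc = accuracy(enhanced, EVAL_SET)
print(f"baseline={base_acc:.2f} enhanced={enh_acc:.2f} gain={enh_acc - base_acc:+.2f}")
```

Keeping the evaluation set fixed and separate from the data you feed the system makes the comparison easier to defend. Exact match is only one possible metric and may be too strict for free-form answers.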
Scope of Work
Data Acquisition & Access
New methods for obtaining data models don't have access to. Physical sensors, devices, novel collection systems, environmental signals.
Reinterpreting Existing Data
Identifying poorly utilised datasets. Reformatting, restructuring, or re-contextualising to extract higher-quality signal.
Representation & Transformation
Converting non-text or edge-case data into model-usable representations. Structured text, embeddings, graphs, spatial abstractions.
Model Interaction (No Full Training)
Vector database retrieval, RAG pipelines, prompt engineering, tool use. Optional: parameter-efficient fine-tuning (LoRA, adapters). Focus is on enhanced performance through data access, not compute.
The Timeline
Kickoff: Saturday, Jan 31st
4-Hour Intensive Workshop
The Build: 16 Days
Independent project work (3 weekends included)
Demo Day: Sunday, Feb 15th
Final demos and judging
Final Deliverables
Data Narrative
Description of the source, novelty, and scaling potential
System Pipeline
How data is collected, processed, and consumed
Model Comparison
Baseline vs. enhanced system with side-by-side demo
Contextual Application
Real-world environment and value proposition
Forward Path
Evolution into a larger system, research/commercial merit
Prize Breakdown
Evaluation Criteria
Meaningful uniqueness or difficulty of access
Thoughtfulness and signal preservation
Clear performance gains over baseline
Ability to articulate why the system works
Plausible path to scale, productisation, or funding
Why Enter?
Team Formation
Don't have a team yet? No problem. On the morning of the kickoff, there'll be time to socialise as everyone checks in—meet other participants, share ideas, and form teams organically.
You can also connect with others throughout the event and over the 16-day build period. We're happy to help match individuals looking for teammates.
Going solo is absolutely fine. This competition isn't about quantity—it's about quality. A focused individual with the right insight can outperform a large team.
Organized by
Nomad AI
Dublin's young AI community
Terms & Conditions
- Payment: Prizes will be paid via bank transfer to the team lead within 14 days of the announcement.
- Originality: Any team found using plagiarized non-open-source code will be disqualified immediately with no recourse.
- Attendance: To claim the prize, at least one team member must be present in person at the Final Pitch on Demo Day (Sunday, Feb 15th).
