€5,000 Prize Pool

AI Research & Enhancement Competition

The Strange Data Project

16 days to unlock huge potential through critical thought and creative data acquisition.

The Vision

This isn't just about "strange data"—it's about unlocking the vast reserves of information that current language models simply cannot process. LLMs are powerful, but they're fundamentally limited to text. They can't see. They can't touch. They can't reason over spatial relationships, real-time sensor feeds, or proprietary datasets locked behind access barriers.

Your mission is to bridge this gap. Find data that models don't have access to, transform it into something they can understand, and demonstrate measurable improvement on a real task. No expensive retraining required—we're interested in creative approaches: vector databases for retrieval, fine-tuning adapters, hybrid pipelines, or entirely novel representations.

The focus is on proving how much better a model can perform when you give it access to the right data in the right way.

What Could This Look Like?

3D Spatial Understanding

Models can't natively reason about 3D space. Could you capture depth data, point clouds, or CAD representations and transform them into structured text or embeddings that enable spatial reasoning? Imagine an LLM that can actually understand "behind" or "inside."
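One way to picture this, as a rough sketch rather than a recommended method: reduce each object to an axis-aligned bounding box and emit plain-text facts a model can read. The `Box` class, the helper names, and the convention that the viewer looks along the +z axis are all assumptions made for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned 3D bounding box (hypothetical helper type)."""
    name: str
    min_xyz: tuple[float, float, float]
    max_xyz: tuple[float, float, float]


def is_inside(inner: Box, outer: Box) -> bool:
    """True if `inner` lies entirely within `outer` on all three axes."""
    return all(
        outer.min_xyz[i] <= inner.min_xyz[i] and inner.max_xyz[i] <= outer.max_xyz[i]
        for i in range(3)
    )


def is_behind(a: Box, b: Box) -> bool:
    """True if `a` starts at or beyond the far z-face of `b`.

    Assumes the viewer looks along +z, so larger z means farther away.
    """
    return a.min_xyz[2] >= b.max_xyz[2]


def describe_scene(boxes: list[Box]) -> list[str]:
    """Emit one short sentence per ordered pair with a detected relation."""
    facts = []
    for a in boxes:
        for b in boxes:
            if a is b:
                continue
            if is_inside(a, b):
                facts.append(f"{a.name} is inside {b.name}")
            elif is_behind(a, b):
                facts.append(f"{a.name} is behind {b.name}")
    return facts


# Made-up scene, coordinates in metres.
scene = [
    Box("mug", (0.1, 0.1, 0.1), (0.2, 0.2, 0.2)),
    Box("cupboard", (0.0, 0.0, 0.0), (1.0, 1.0, 0.5)),
    Box("chair", (0.0, 0.0, 2.0), (0.5, 1.0, 2.5)),
]
print("\n".join(describe_scene(scene)))
```

The resulting sentences could be placed in a prompt or embedded for retrieval. Real depth data or point clouds would need segmentation first, which this sketch skips.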

Domain-Specific Knowledge Retrieval

Agriculture, medicine, manufacturing—industries with proprietary data that never made it into training sets. Build a vector database pipeline that pulls hyper-accurate, real-time information and watch a general model become a domain expert through retrieval-augmented generation.
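To make the retrieval idea concrete, here is a minimal sketch that uses only the Python standard library. Word counts stand in for a learned embedding model, and a plain list stands in for a vector database. Both are simplifying assumptions, and the farm notes are invented for the example.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts. A stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, documents: list[str], k: int = 2) -> str:
    """Assemble a retrieval-augmented prompt to send to a general model."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return f"Use only the context below to answer.\nContext:\n{context}\nQuestion: {query}"


# Invented domain notes that a general model would not have seen.
notes = [
    "Field 7 soil moisture dropped to 12 percent on Tuesday.",
    "The milking parlour pump was serviced in March.",
    "Field 7 was sown with winter barley in October.",
]
print(build_prompt("What is the soil moisture in field 7?", notes, k=1))
```

Swapping `embed` for a real embedding model and the list for a vector store keeps the same shape: embed, rank, then place the top results in the prompt.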

Novel Data Modalities

Audio patterns, environmental sensors, satellite imagery, network traffic, biological signals—data that exists but isn't being used. Find a creative way to capture, process, and represent this information so models can reason over it for the first time.

These are starting points, not limitations. The best submissions will surprise us.

The Core Challenge

Build a system that makes a model dramatically better at a specific task—without expensive retraining. The key is finding and processing data that wasn't previously accessible to language models.

Acquire: Find data that LLMs can't see, whether it's locked behind access barriers, exists in non-text formats, or simply hasn't been collected before

Transform: Convert this data into representations that models can consume—via embeddings, structured text, vector databases, or hybrid pipelines

Demonstrate: Show clear, measurable improvement on a real task compared to a baseline model without access to your data

Scale: Articulate how this approach could generate or unlock training-grade data at volume for long-term value
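The "Demonstrate" step comes down to running the same questions through two systems and comparing scores. The sketch below shows one possible shape for that comparison. The two model functions are hypothetical stand-ins (a real project would call the same model with and without the new data), and exact-match accuracy is only one of many possible metrics.

```python
from typing import Callable


def accuracy(answer_fn: Callable[[str], str], examples: list[tuple[str, str]]) -> float:
    """Fraction of questions answered exactly right (case-insensitive)."""
    correct = sum(
        answer_fn(question).strip().lower() == expected.strip().lower()
        for question, expected in examples
    )
    return correct / len(examples)


def baseline_model(question: str) -> str:
    """Hypothetical baseline: a model with no access to the new data."""
    return "unknown"


# Invented private fact standing in for newly acquired data.
PRIVATE_FACTS = {"what crop is in field 7?": "winter barley"}


def enhanced_model(question: str) -> str:
    """Hypothetical enhanced system: consults the new data, else falls back."""
    return PRIVATE_FACTS.get(question.lower(), baseline_model(question))


# Made-up evaluation set of (question, expected answer) pairs.
examples = [
    ("What crop is in field 7?", "winter barley"),
    ("What crop is in field 9?", "oats"),
]

base = accuracy(baseline_model, examples)
enhanced = accuracy(enhanced_model, examples)
print(f"baseline={base:.2f} enhanced={enhanced:.2f} gain={enhanced - base:+.2f}")
```

Keeping the evaluation set fixed and changing only the data access makes the gain easier to attribute to the data rather than to prompt or model changes.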

Scope of Work

Data Acquisition & Access

New methods for obtaining data that models don't have access to. Physical sensors, devices, novel collection systems, environmental signals.

Reinterpreting Existing Data

Identifying poorly utilised datasets. Reformatting, restructuring, or re-contextualising to extract higher-quality signal.

Representation & Transformation

Converting non-text or edge-case data into model-usable representations. Structured text, embeddings, graphs, spatial abstractions.

Model Interaction (No Full Training)

Vector database retrieval, RAG pipelines, prompt engineering, tool use. Optional: parameter-efficient fine-tuning (LoRA, adapters). Focus is on enhanced performance through data access, not compute.
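As a small illustration of converting non-text data into structured text, the sketch below summarises a window of sensor readings into a few labelled lines that could be dropped into a prompt or indexed for retrieval. The function name, the field labels, and the readings are all assumptions made for the example.

```python
from statistics import mean


def summarise_readings(sensor: str, unit: str, readings: list[tuple[str, float]]) -> str:
    """Turn (timestamp, value) pairs into a short structured-text summary."""
    values = [value for _, value in readings]
    low_time, low_value = min(readings, key=lambda r: r[1])
    high_time, high_value = max(readings, key=lambda r: r[1])
    return "\n".join([
        f"sensor: {sensor}",
        f"window: {readings[0][0]} to {readings[-1][0]}",
        f"mean: {mean(values):.1f} {unit}",
        f"min: {low_value:.1f} {unit} at {low_time}",
        f"max: {high_value:.1f} {unit} at {high_time}",
    ])


# Made-up greenhouse temperature readings, ordered by time.
readings = [("09:00", 18.0), ("10:00", 19.5), ("11:00", 23.5), ("12:00", 21.0)]
print(summarise_readings("greenhouse_temp", "C", readings))
```

How much detail to keep is a design choice: a summary like this is compact but discards the shape of the signal, so a richer representation may preserve more of what matters for the task.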

The Timeline

Kickoff: Saturday, Jan 31st

4-Hour Intensive Workshop

The Build: 16 Days

Independent project work (3 weekends included)

Demo Day: Sunday, Feb 15th

Final demos and judging

Final Deliverables

Data Narrative

Description of the source, novelty, and scaling potential

System Pipeline

How data is collected, processed, and consumed

Model Comparison

Baseline vs. enhanced system with side-by-side demo

Contextual Application

Real-world environment and value proposition

Forward Path

Evolution into a larger system, research/commercial merit

Prize Breakdown

1st Place: €3,000
2nd Place: €1,500
3rd Place: €500

Evaluation Criteria

Data Novelty: 20%

Meaningful uniqueness or difficulty of access

Representation Quality: 20%

Thoughtfulness and signal preservation

Demonstrated Improvement: 25%

Clear performance gains over baseline

Clarity of Explanation: 15%

Ability to articulate why the system works

Future Potential: 20%

Plausible path to scale, productisation, or funding
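For teams who want to self-assess before Demo Day, the weights above can be combined into a single number. This is only a sketch: the 0-10 scale per criterion and the simple weighted average are assumptions, and the page does not say how the judges will actually combine their scores.

```python
# Weights in percent, as listed in the evaluation criteria.
WEIGHTS = {
    "data_novelty": 20,
    "representation_quality": 20,
    "demonstrated_improvement": 25,
    "clarity_of_explanation": 15,
    "future_potential": 20,
}


def weighted_score(scores: dict[str, float]) -> float:
    """Combine per-criterion scores (assumed 0-10 scale) into a 0-10 total."""
    if scores.keys() != WEIGHTS.keys():
        raise ValueError("scores must cover exactly the five criteria")
    total_weight = sum(WEIGHTS.values())
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / total_weight


# Made-up self-assessment for illustration.
example = {
    "data_novelty": 8,
    "representation_quality": 6,
    "demonstrated_improvement": 9,
    "clarity_of_explanation": 7,
    "future_potential": 5,
}
print(f"{weighted_score(example):.2f}")
```

Since Demonstrated Improvement carries the largest weight, a point gained there moves the total more than a point gained on any other criterion.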

Why Enter?

€5,000 in cash prizes
Direct pipeline to Nomad projects
Research role opportunities
Network with Dublin AI leaders

Team Formation

Don't have a team yet? No problem. On the morning of the kickoff, there'll be time to socialise as everyone checks in—meet other participants, share ideas, and form teams organically.

You can also connect with others throughout the event and over the 16-day build period. We're happy to help match individuals looking for teammates.

Going solo is absolutely fine. This competition isn't about quantity—it's about quality. A focused individual with the right insight can outperform a large team.

Competition Dates

Jan 31 – Feb 15, 2026

16-day sprint

Kickoff Event

Saturday, Jan 31

10:00 AM – 2:00 PM

Location

Vibeworks

D14 W6X6, Dublin

Prize Pool

€5,000 Total

Teams of 1-5 people welcome

Don't wait for the kickoff

Join us Jan 18th at Sasha's Coffee Bar to get the first look at competition themes and start recruiting your team.

Organized by

Nomad AI

Dublin's young AI community

Terms & Conditions

  • Payment: Prizes will be paid via bank transfer to the team lead within 14 days of the announcement.
  • Originality: Any team found to have plagiarized code that is not open source will be disqualified immediately, with no recourse.
  • Attendance: To claim a prize, at least one team member must be present in person at Demo Day on Feb 15th.

Data Is Everywhere. Models Are Blind To Most Of It.

Whether you're a student, researcher, or developer—if you can find data others have overlooked and make models perform better with it, this competition is for you. Creative acquisition. Smart transformation. Measurable results.