In 2025 I completed the AI and Big Data Specialization at Campus Net Manyanet Les Corts (Barcelona). I came from the web development world — Laravel, PHP, MySQL — and wanted to understand what was really behind the AI boom. What I found was a deep, demanding, and fascinating ecosystem.
🐍 Python as a data language
Even though I had experience with other languages, Python for data science is a world apart. The first module focused on mastering the core tools of the ecosystem, Pandas among them.
What struck me most about Pandas was the concept of vectorization: operating on entire DataFrame columns instead of iterating row by row. The performance difference on large datasets is enormous.
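To make the difference concrete, here is a minimal sketch comparing a row-by-row loop with a vectorized column operation. The `orders` DataFrame, its columns and both function names are invented for illustration; they are not from the course material.

```python
import numpy as np
import pandas as pd

# Hypothetical order data, generated only for this illustration.
rng = np.random.default_rng(seed=0)
orders = pd.DataFrame({
    "price": rng.uniform(1.0, 100.0, size=10_000),
    "quantity": rng.integers(1, 10, size=10_000),
})

def total_with_loop(df: pd.DataFrame) -> list:
    """Row-by-row version: one Python-level iteration per row."""
    totals = []
    for _, row in df.iterrows():
        totals.append(row["price"] * row["quantity"])
    return totals

def total_vectorized(df: pd.DataFrame) -> pd.Series:
    """Vectorized version: a single operation over whole columns."""
    return df["price"] * df["quantity"]

loop_totals = total_with_loop(orders)
vector_totals = total_vectorized(orders)

# Both approaches compute the same values; only the execution path differs.
print(np.allclose(loop_totals, vector_totals.to_numpy()))
```

Timing both functions (for example with `timeit`) on a larger DataFrame is an easy way to see the gap on your own machine.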
🤖 Machine Learning with scikit-learn
The core of the specialization was Machine Learning. I learned to distinguish when to apply each approach and how to properly evaluate models.
Supervised learning
- Classification: Logistic Regression, decision trees, Random Forest and SVM to predict categories.
- Regression: Linear and polynomial regression to predict continuous values.
- Evaluation: accuracy, precision, recall, F1-score and confusion matrices.
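As a rough sketch of how these pieces fit together in scikit-learn, the example below trains a Random Forest on a synthetic dataset and prints the metrics listed above. The data comes from `make_classification` and every parameter value is an arbitrary choice for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic binary dataset, generated only for illustration.
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# Hold out 20% of the samples for evaluation, keeping class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)
print(cm)

# Per-class precision, recall and F1-score, plus overall accuracy.
print(classification_report(y_test, y_pred))
```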
Unsupervised learning
- Clustering: K-Means and DBSCAN to group unlabeled data.
- Dimensionality reduction: PCA to visualize high-dimensional datasets in 2D/3D.
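A minimal sketch of both techniques on synthetic data: K-Means groups the points without using labels, and PCA projects the five original features down to two so the clusters could be plotted. The dataset and the choice of three clusters are assumptions made for the example.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

# Synthetic 5-dimensional points; the true labels are discarded on purpose.
X, _ = make_blobs(n_samples=300, centers=3, n_features=5, random_state=0)

# Group the unlabeled points into 3 clusters.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# Project the 5 features onto the 2 directions of highest variance.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

print(X_2d.shape)
print(pca.explained_variance_ratio_.sum())
```

The two columns of `X_2d` can then go straight into a scatter plot, colored by `labels`.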
💡 The concept that changed my thinking the most
Overfitting. Building a model that memorizes the training data instead of learning general patterns is one of the most common mistakes: the model looks excellent on the data it has already seen and fails on new data. Cross-validation and the train/test split became mandatory before declaring any result valid.
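One way to see overfitting in practice is to compare training accuracy with held-out accuracy. The sketch below uses an unconstrained decision tree on deliberately noisy synthetic data; the dataset and all parameter values are assumptions chosen to make the effect visible.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with label noise: flip_y assigns random labels to ~20% of samples.
X, y = make_classification(
    n_samples=400, n_features=20, n_informative=5, flip_y=0.2, random_state=1
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1
)

# With no depth limit, the tree can keep splitting until it memorizes the training set.
tree = DecisionTreeClassifier(random_state=1)
tree.fit(X_train, y_train)
train_acc = tree.score(X_train, y_train)
test_acc = tree.score(X_test, y_test)

# 5-fold cross-validation on the training data gives a more honest estimate.
cv_scores = cross_val_score(
    DecisionTreeClassifier(random_state=1), X_train, y_train, cv=5
)

print(f"train accuracy: {train_acc:.2f}")
print(f"test accuracy:  {test_acc:.2f}")
print(f"cv accuracy:    {cv_scores.mean():.2f}")
```

A large gap between the training score and the other two is the warning sign; limiting `max_depth` is one common way to narrow it.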
📊 Big Data: data at industrial scale
The Big Data module expanded the perspective toward processing volumes of information that don't fit in memory:
- The 3 Vs: Volume (terabytes of data), Velocity (real-time streams) and Variety (structured and unstructured).
- Distributed processing: MapReduce principles and how frameworks like Spark distribute operations across nodes.
- ETL Pipelines: designing ingestion, transformation and load flows for large-scale analytics systems.
- Storage: when to use relational databases, NoSQL or data lakes depending on the use case.
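The MapReduce idea can be sketched in plain Python with the classic word count. This runs on a single machine and only imitates the three phases; in a framework like Spark, the map and reduce steps would run in parallel across nodes. The function names and sample lines are invented for the example.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)

lines = ["big data needs big tools", "data at scale"]

mapped = chain.from_iterable(map_phase(line) for line in lines)
grouped = shuffle_phase(mapped)
word_counts = dict(reduce_phase(key, values) for key, values in grouped.items())

print(word_counts)
# {'big': 2, 'data': 2, 'needs': 1, 'tools': 1, 'at': 1, 'scale': 1}
```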
🧠 Final project: mental health detection on social media
The highlight was an end-to-end applied project. The goal: detect early signs of mental health issues by analyzing social media posts.
Project phases
- Data: dataset of labeled posts (depression, anxiety, neutral) from public academic sources.
- NLP preprocessing: text cleaning, tokenization, stopword removal and stemming with NLTK. Vectorization with TF-IDF.
- Modeling: comparison of Naive Bayes, SVM and Random Forest. The SVM with an RBF kernel achieved the best F1-score on the minority class.
- Evaluation: focus on recall for the positive class to minimize false negatives, prioritizing safety over overall accuracy.
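The vectorization and modeling phases can be sketched as a scikit-learn pipeline. This is not the project's code: the six sentences are invented toy examples, the NLTK stemming step is left out to keep the block self-contained, and stopword removal uses scikit-learn's built-in English list instead. The `class_weight="balanced"` setting is one common way to handle class imbalance, not necessarily what the project used.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Tiny invented corpus for illustration only; NOT the project's dataset.
texts = [
    "I feel hopeless and empty every day",
    "nothing makes me happy anymore, I feel empty",
    "I am so worried and nervous about everything",
    "my heart races and I cannot stop worrying",
    "had a great lunch with friends today",
    "watching the football match tonight",
]
labels = ["depression", "depression", "anxiety", "anxiety", "neutral", "neutral"]

# TF-IDF turns each post into a weighted term vector; the SVM classifies it.
pipeline = make_pipeline(
    TfidfVectorizer(lowercase=True, stop_words="english"),
    SVC(kernel="rbf", class_weight="balanced"),
)
pipeline.fit(texts, labels)

prediction = pipeline.predict(["I feel empty and hopeless"])
print(prediction[0])
```

Wrapping both steps in one pipeline means the TF-IDF vocabulary is learned only from the training data, which avoids leaking test information during cross-validation.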
📌 Key takeaway: In class-imbalanced problems, accuracy is misleading. A model that always predicts "healthy" can reach 90% accuracy if only 10% of cases are positive. The key lies in the recall and F1-score of the minority class.
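The arithmetic behind that 90% figure is easy to check by hand. The sketch below builds 100 hypothetical cases with 10 positives and scores a baseline that always predicts the negative class; the metrics are computed from their definitions, with no library involved.

```python
# 1 = positive case, 0 = healthy. Hypothetical data: 10 positives out of 100.
y_true = [1] * 10 + [0] * 90
y_pred = [0] * 100  # baseline that always predicts "healthy"

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

accuracy = (tp + tn) / len(y_true)
recall = tp / (tp + fn) if (tp + fn) else 0.0
precision = tp / (tp + fp) if (tp + fp) else 0.0
f1 = (
    2 * precision * recall / (precision + recall)
    if (precision + recall)
    else 0.0
)

print(accuracy, recall, f1)  # 0.9 0.0 0.0
```

The baseline misses every positive case (recall 0.0) while still reporting 90% accuracy, which is why the minority-class metrics are the ones to watch.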
💡 What I take away from this specialization
Beyond the specific tools, what I value most is the mindset shift:
- Data first: before building any model, invest time in understanding, cleaning and exploring the data.
- Define the problem well: which metric we optimize and why, based on the real context.
- Reproducibility: document every step of the analysis so it can be audited or repeated.
- Connection to my backend profile: I can integrate trained models into REST APIs with Laravel/PHP, closing the complete loop.
Frequently asked questions about AI and Big Data
What do you learn in an AI and Big Data specialization?
You learn Python for data analysis, Machine Learning with scikit-learn and TensorFlow, large-scale data processing and applied statistics. The goal is to solve real business problems with data, not just handle algorithms in a theoretical way.
What is studying AI and Big Data useful for if you are already a developer?
It adds real value to a technical profile: you can integrate predictive models into applications, automate data-driven decisions and access more complex, better-paid projects. For a full stack developer, it opens doors at top-tier technology companies.
Is Python hard to learn for Big Data if you already know how to code?
With a background in object-oriented programming, Python is relatively straightforward. The real learning curve lies in data libraries (Pandas, NumPy, scikit-learn) and statistical concepts. With prior experience in Java or PHP, the basics typically take two to four weeks to master.
What is the difference between Artificial Intelligence and Big Data?
Big Data refers to processing large volumes of data using tools like Hadoop or Spark. AI uses that data to train models that learn patterns and make decisions. They are complementary disciplines: Big Data feeds AI models with the data they need to function.
Is it worth studying AI and Big Data in 2025?
Yes, especially with a prior technical foundation. Demand for profiles combining software development with AI knowledge is growing steadily. Knowing how to integrate models into real applications already sets a senior profile apart from a junior one at many technology companies.