Friday, July 26, 2013

OpenData - Colombia

Open Data?

A few weeks ago I wrote a fairly critical comment about what the Colombian state calls "Open Data".
In recent months the Colombian state has frequently mentioned its "Open Data" policies.
Since I was interested in using the data packages with machine learning algorithms, I got a little excited.
The data packages on the website are laughable: the largest ones contain 40 records at best, and the rest hold absolutely useless data.

Open Data should be a serious proposal, so that citizens can exercise oversight over the state.

Disappointed with the government's mediocre proposal, I have decided to start two small personal projects.

Resources for Natural Language Processing

The idea is to collect resources that help with natural language processing: corpora and dictionaries.

The LatinamericanTextResources repository contains:
  • the speeches of former president Alvaro Uribe from 2007 to 2010
  • the speeches of President Juan Manuel Santos from 2010 to 2013

Resources on Human Rights Violations in Colombia

Using graph databases, I am correlating different databases on human rights violations. I hope to have the result on GitHub soon.
At the moment a good dataset on human rights in Colombia is available at

Tuesday, February 12, 2013

Winter morning walks

As a result of my daily walk from home to the university, I ended up taking a lot of shots of the forest.

Thursday, February 7, 2013

Dictionary of Colombian Entities - Information Extraction

Over the last few weeks I have been working on a bootstrapping-based system that automatically captures entities (people, places, and organizations from the Colombian context) from plain text.
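A minimal sketch of what pattern-based bootstrapping can look like, assuming a very simple scheme: start from a few seed entities, learn which words tend to precede them, and then harvest any capitalized sequence that follows those words. This is not the actual system described here; the function names, the single-word left context, and the toy sentences are all illustrative assumptions.

```python
import re
from collections import Counter


def capitalized_run(tokens, start):
    """Return the maximal run of capitalized tokens beginning at `start`."""
    end = start
    while end < len(tokens) and tokens[end][:1].isupper():
        end += 1
    return " ".join(tokens[start:end])


def bootstrap_entities(sentences, seeds, rounds=2, min_count=2):
    """Grow a set of entity names from seeds (hypothetical, simplified scheme)."""
    entities = set(seeds)
    # Crude tokenization: punctuation is dropped, so runs may cross commas.
    tokenized = [re.findall(r"\w+", s) for s in sentences]
    for _ in range(rounds):
        # Step 1: count the lowercase word right before each known entity.
        contexts = Counter()
        for tokens in tokenized:
            for i in range(1, len(tokens)):
                starts_run = not tokens[i - 1][:1].isupper()
                if starts_run and capitalized_run(tokens, i) in entities:
                    contexts[tokens[i - 1].lower()] += 1
        # Keep only context words seen often enough to be trusted.
        patterns = {w for w, c in contexts.items() if c >= min_count}
        # Step 2: any capitalized run after a trusted context is a new entity.
        for tokens in tokenized:
            for i in range(1, len(tokens)):
                if tokens[i - 1].lower() in patterns:
                    candidate = capitalized_run(tokens, i)
                    if candidate:
                        entities.add(candidate)
    return entities


# Made-up example sentences, for illustration only.
sentences = [
    "El presidente Juan Manuel Santos hablo ayer",
    "El presidente Alvaro Uribe viajo a Medellin",
    "El presidente Andres Pastrana firmo el acuerdo",
]
found = bootstrap_entities(sentences, {"Juan Manuel Santos", "Alvaro Uribe"})
print(sorted(found))
```

A real system would need to score patterns and candidates to limit semantic drift; the `min_count` threshold here is only the simplest possible guard.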

The idea is for these dictionaries to be useful in different tasks in the areas of information extraction and data mining.
I added certain entities manually, and I hope to keep adding new entries so that I can later create a taxonomy of dictionaries.

To complete the list I added an automatically generated dictionary of Latin American slang (including multiword expressions). I also added a manually created list of positive, negative, and sensory adjectives that I plan to extend soon.

The Diccionario de Entidades Colombianas repository on GitHub contains the lists.
Here is a description:

Diccionario de Entidades Colombianas (Dictionary of Colombian Entities)

This is a list of gazetteers (entity dictionaries) generated automatically and manually. It contains dictionaries of:

  • Colombian politicians
  • Colombian senators by party
  • Colombian presidents
  • Colombian journalists
  • Colombian criminals
  • Colombian athletes
  • Colombian actors

  • Colombian cities

  • Colombian companies
  • Colombian football teams
  • Abbreviations of Colombian organizations
  • Colombian music bands

  • List of negative adjectives
  • List of positive adjectives
  • List of sensory adjectives
  • List of Latin American slang with multiword expressions
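As an illustration of how gazetteers like these are typically used in information extraction, here is a minimal sketch of longest-match dictionary lookup over a tokenized sentence. The sample entries, the type labels, and the function names are assumptions made for the example; they do not reflect the repository's actual file format or contents.

```python
def build_gazetteer(entries):
    """Map each entity's lowercase token tuple to its entity type."""
    return {tuple(name.lower().split()): etype for name, etype in entries}


def tag_entities(text, gazetteer):
    """Return (surface form, type) pairs, preferring the longest match."""
    tokens = text.split()
    max_len = max((len(key) for key in gazetteer), default=0)
    found = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first, then shrink it.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = tokens[i:i + n]
            key = tuple(tok.lower().strip(".,;:()") for tok in span)
            if key in gazetteer:
                surface = " ".join(span).strip(".,;:()")
                found.append((surface, gazetteer[key]))
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return found


# Illustrative entries only (hypothetical type labels).
gazetteer = build_gazetteer([
    ("Juan Manuel Santos", "PRESIDENT"),
    ("Bogota", "CITY"),
    ("Medellin", "CITY"),
])
print(tag_entities("Juan Manuel Santos visited Medellin.", gazetteer))
```

Trying longer spans first matters for multiword entries: a three-token name is matched as a whole rather than as separate shorter entries.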

Thursday, January 24, 2013

Semisupervised Learning - Label Propagation

I attach the slides for my seminar presentation on "Semisupervised Learning - Label Propagation".
The slides are based on the paper "Learning from Labeled and Unlabeled Data with Label Propagation" by the very famous Xiaojin Zhu.

Implementations of Label Propagation:

Java Label Propagation by Kohei Ozaki
Junto by Partha Pratim Talukdar
Label Propagation (C++ Implementation) with visualization by Kohei Ozaki
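For readers who want to see the idea before opening one of these implementations, here is a minimal pure-Python sketch of label propagation. It assumes Gaussian similarity weights between points and a row-normalized transition matrix, and it clamps the labeled points after every step; the normalization and stopping rule are simplifications, so details may differ from the paper and from the libraries listed above. All names and the toy data are hypothetical.

```python
import math


def label_propagation(points, labels, sigma=1.0, iterations=200):
    """Propagate labels over a fully connected similarity graph.

    points: list of coordinate tuples.
    labels: dict mapping the index of each labeled point to its class.
    Returns the predicted class for every point.
    """
    n = len(points)
    classes = sorted(set(labels.values()))

    def weight(a, b):
        # Gaussian similarity: nearby points get weights close to 1.
        d2 = sum((x - y) ** 2 for x, y in zip(a, b))
        return math.exp(-d2 / sigma ** 2)

    w = [[weight(points[i], points[j]) for j in range(n)] for i in range(n)]
    # Row-normalize so each row is a probability distribution (assumption).
    t = [[wij / sum(row) for wij in row] for row in w]

    def one_hot(label):
        return [1.0 if c == label else 0.0 for c in classes]

    # Labeled points start one-hot, unlabeled points start uniform.
    uniform = [1.0 / len(classes)] * len(classes)
    y = [one_hot(labels[i]) if i in labels else list(uniform) for i in range(n)]

    for _ in range(iterations):
        # Each point takes the weighted average of its neighbours' labels.
        y = [
            [sum(t[i][j] * y[j][k] for j in range(n)) for k in range(len(classes))]
            for i in range(n)
        ]
        # Clamp: labeled points always keep their original label.
        for i, label in labels.items():
            y[i] = one_hot(label)

    return [classes[max(range(len(classes)), key=lambda k: row[k])] for row in y]


# Toy 1-D data: two well-separated clusters, one labeled point in each.
points = [(0.0,), (0.5,), (1.0,), (5.0,), (5.5,), (6.0,)]
predicted = label_propagation(points, {0: "a", 5: "b"})
print(predicted)
```

A fixed number of iterations keeps the sketch short; a convergence check on the change in the label matrix would be the more usual stopping rule.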


Luxembourg City - 2012

Before 2012 ended I got the chance to go to Luxembourg City.
It is very close to Saarbruecken. You can actually buy a round-trip bus ticket for 18 euros.

Luxembourg from the Grund

Given the housing costs in Luxembourg, many German and French people work there and live across the border,
and many other foreigners who work there also live in France or Germany.
The nearest German city is Trier, and Metz in France is also very close.

Luxembourg from the Grund

I travelled there using a Lorraine, Rhein Pfalz ticket,
so in a single day we (two friends and I) visited Metz, Luxembourg City
and Trier.

Luxembourg from the Grund
Luxembourg is a nice city. I love its architecture and the fact that you can
get around speaking either German or French. It is a small city and, to be honest, there is not much to see, but it is worth a visit; personally, I love it.

Monday, January 14, 2013

Arriving in Saarbruecken - a guide for students

If you are reading this post, you are probably coming to Saarbruecken and, as it was for me in 2011, this is your first time living abroad, or at least in Germany. You are probably travelling soon and want some information.
I arrived in Saarbruecken in 2011 to study for my master's degree, and I am writing this entry because it might be helpful to anybody arriving here.


As I see it, there are two accommodation options: a studio or a room in a flatshare.
I have always lived in flatshares. I think it is nice to share with people, and the prices are lower than for a studio.
If you plan to look for either of them from abroad, it will be rather difficult.
People will want to see you face to face, and you should be wary of anyone who does not want to.
The best option is to contact senior students in the degree programme you will be joining. Students here are really willing to help; they will often let you stay at their place for a few days while you find a good offer.
The other option is to pay for a hostel; the drawback is that there are not many hostels in Saarbruecken.
When renting, read the contract carefully: most of the time it is in German, and a few people will take advantage of this if you are not careful enough.

Some contracts will require you to paint your place before moving out. A notice period of 3 months is usual, but it can be longer, or there may be none at all.
The contract usually also stipulates a deposit, which is normally returned to you within 6 months after you hand the room back to the landlord.

Where to live?

In Saarbruecken you can find all kinds of prices everywhere. There are many areas, each with its own advantages and disadvantages.
I would divide them into:
Saarbruecken city, Dudweiler and Scheidt.

Saarbruecken city is the 'city', where all the pubs are and where the parties take place. If you plan on having an active life of pubs, parties and other social activities, you should pick this area.
Why? Public transportation in Saarbruecken is good, and I love it, but at night many services to many areas stop running, so if you live in another area, forget about staying in the city later than 1am.

Scheidt and Dudweiler. I live in Dudweiler, and I love it. Why? It is small and beautiful, and the university is a 5-minute bus ride away. There is nothing more to say: this used to be a small town that was absorbed by Saarbruecken. There is not much activity here, but many students live in this area.
Scheidt is a little bit further away, but it is more or less the same as Dudweiler.

When comparing prices, you will find that they are similar in all the areas.

Websites for finding accommodation:

I arrived, what to do next?

Once you have arrived, focus on finding a place. Once you have signed a contract with your landlord,
you have to register at the Buergeramt. The Buergeramt is an office that records where you are living. You just have to take your passport and contract with you, and they will give you a piece of paper called an "Anmeldebestaetigung". This is an important document for opening a bank account and for other bureaucratic procedures.

Tuesday, December 11, 2012

Automatic Generation of Domain Models for Call Centers from Noisy Transcriptions

I attach the slides for my seminar presentation on "Ontologies and Knowledge Representation".
The slides are based on the paper
"Automatic Generation of Domain Models for Call Centers from Noisy Transcriptions" by Shourya Roy and Venkata Subramaniam.

The paper presents an automatic way to build a taxonomy from raw text. It also considers potential applications of the automatically built taxonomy.