A lot of stuff has happened in the last couple of weeks. The project is coming along nicely and I am now getting into some of the bulky parts of it.
There is an issue with the way NaN (not a number) checks are handled that spans beyond SciPy. Basically, there is no consensus on how to deal with NaN values when they show up. In statistics they are often assumed to be missing values (e.g. there was a problem when gathering statistic data and the value was lost), but there is also the IEEE NaN which is defined as 'undefined' and can be used to indicate out-of-domain values that may point to a bug in one's code or a similar problem.
Long story short, the outcome of this will largely depend on the way projects like pandas and Numpy decide to deal with it in the future, but right now for SciPy we decided that we should not get in the business of assuming that NaN values signify 'missing' because that is not always the case and it may end up silently hiding bugs, leading to incorrect results without the user's knowledge. Therefore, I am now implementing a backwards compatible API addition that will allow the user to define whether to ignore NaN values (asume they are missing), treat them as undefined, or raise an exception. This is a longterm effort that may span through the entire stats module and beyond so the work I am doing now is set to spearhead future development.
Another big issue is the consistency of the `scipy.stats` module with its masked arrays counterpart `scipy.mstats`. The implementation will probably not be complicated but it encompasses somewhere around 60 to 80 functions so I assume it to be a large and time consuming effort. I expect to work on this for the next month or so.
During the course of the last month or two there have been some major developments in my life that are indirectly related to the project so I feel like they should be addressed but I intend do so in a separate post. For now I bid you farewell and thank you for reading.
Cheers,
Abraham.
No hay comentarios:
Publicar un comentario