SentenceSplitter.py: Python module to split chunks of text into orthographic sentences, version 1.6. The code is at the bottom of this page.
It contains a SentenceSplitter class that splits paragraphs into sentences. The class uses a regular expression to detect and discriminate punctuation marks, together with a list of abbreviations that should not trigger a split.
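To make the idea concrete, here is a minimal sketch of the general technique described above: a regular expression proposes candidate sentence boundaries, and an abbreviation list vetoes some of them. This is not the module's actual code. The function name `split_sentences`, the pattern `_BOUNDARY` and the exact boundary rule are assumptions chosen for illustration, and the sketch targets Python 3.

```python
import re

# Hypothetical illustration, NOT the code of SentenceSplitter.py.
# Assumed boundary rule: a run of '.', '!' or '?' followed by whitespace.
_BOUNDARY = re.compile(r"[.!?]+\s+")


def split_sentences(text, abbreviations=()):
    """Split text into sentences, never splitting after a listed abbreviation."""
    abbrevs = set(abbreviations)
    sentences = []
    start = 0
    for match in _BOUNDARY.finditer(text):
        # End of the candidate sentence: just after the punctuation run.
        end = match.start() + len(match.group().rstrip())
        chunk = text[start:end].strip()
        tokens = chunk.split()
        # The last whitespace-delimited token, e.g. "Mr." or "capacity."
        last_token = tokens[-1] if tokens else ""
        if last_token in abbrevs:
            continue  # known abbreviation: keep scanning, do not split here
        sentences.append(chunk)
        start = match.end()
    # Whatever follows the last accepted boundary is the final sentence.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


print(split_sentences("Mrs. Whiteley had a very strange son."))
# ['Mrs.', 'Whiteley had a very strange son.']
print(split_sentences("Mrs. Whiteley had a very strange son.", ["Mrs."]))
# ['Mrs. Whiteley had a very strange son.']
```

Because the assumed pattern requires whitespace after the punctuation, a decimal such as `99.9` is never treated as a boundary. The real module may handle this, and capitalization, quite differently.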
Basically, the module is a modification of the original SentenceSplitter.py that Mickel Grönroos wrote in 2004. It incorporates some improvements:
...so to speak.
Python 2.4.4 (#2, Oct 20 2006, 00:57:46)
[GCC 4.1.2 20061007 (prerelease) (Debian 4.1.1-16)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import SentenceSplitter
>>> ss = SentenceSplitter.SentenceSplitter()
>>> ss.setAbbreviations(['Mr.', 'Sr.'])
>>> ss.getCapState()
False
>>> ss.switchCapState()
>>> ss.split("'No!', said Mr. Jones B. Smith when the reactor reached 99.9 percent of its capacity. Desperatedly, he struck the controls, trying to turn off the switch. Could they be saved? Most probaby not. But one never knows! Perhaps some miracle could come in their aid.")
["'No!', said Mr. Jones B. Smith when the reactor reached 99.9 percent of its capacity.", 'Desperatedly, he struck the controls, trying to turn off the switch.', 'Could they be saved?', 'Most probaby not.', 'But one never knows!', 'Perhaps some miracle could come in their aid.']
>>> ss.split("Mrs. Whiteley had a very strange son.")
['Mrs.', 'Whiteley had a very strange son.']
>>> abbreviations = ss.getAbbreviations()
>>> abbreviations += ['Mrs.']
>>> abbreviations
['Mr.', 'Sr.', 'A.', 'B.', 'C.', 'D.', 'E.', 'F.', 'G.', 'H.', 'I.', 'J.', 'K.', 'L.', 'M.', 'N.', 'O.', 'P.', 'Q.', 'R.', 'S.', 'T.', 'U.', 'V.', 'W.', 'X.', 'Y.', 'Z.', 'Mrs.']
>>> ss.setAbbreviations(abbreviations)
>>> ss.split("Mrs. Whiteley had a very strange son.")
['Mrs. Whiteley had a very strange son.']
>>> help(SentenceSplitter)
| Attachment | Size |
|---|---|
| SentenceSplitter.py.txt | 9.85 KB |