A light method for data generation: a combination of Markov Chains and Word Embeddings.
Resource identifiers
1135-5948
http://hdl.handle.net/10641/2327
10.26342/2020-64-10
Source
(Institutional Repository of the Universidad Francisco de Vitoria)

Record

Title:
A light method for data generation: a combination of Markov Chains and Word Embeddings.
Un método ligero de generación de datos: combinación entre Cadenas de Markov y Word Embeddings.
Subject:
Generation
Hybrid
Markov Chains
Embeddings
Similarity
Description:
Most of the current state-of-the-art Natural Language Processing (NLP) techniques are highly data-dependent. A significant amount of data is required for their training, and in some scenarios data is scarce. We present a hybrid method to generate new sentences for augmenting the training data. Our approach takes advantage of the combination of Markov Chains and word embeddings to produce high-quality data similar to an initial dataset. In contrast to other neural-based generative methods, it does not need a high amount of training data. Results show how our approach can generate useful data for NLP tools. In particular, we validate our approach by building Transformer-based Language Models using data from three different domains in the context of enriching general purpose chatbots.
post-print
1,74 MB
Language:
English
Relation:
http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6199
Author/Producer:
Martínez García, Eva
Nogales Moyano, Alberto
Morales Escudero, Javier
García Tejedor, Álvaro José
Publisher:
Procesamiento del Lenguaje Natural
Rights:
Atribución-NoComercial-SinDerivadas 3.0 España
http://creativecommons.org/licenses/by-nc-nd/3.0/es/
openAccess
Date:
2021-06-16T08:19:29Z
2020
Resource type:
article
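
The description above says the method combines Markov Chains with word embeddings but does not spell out how. The following is only a minimal sketch of one possible reading, not the authors' implementation: a first-order Markov chain generates a sentence from a few seed sentences, and some words are then swapped for their nearest neighbour in an embedding table. The seed sentences, the two-dimensional toy vectors, and every function name here are assumptions made for illustration.

```python
import random
from collections import defaultdict

# Hypothetical seed data and toy 2-d "embeddings" (assumptions, not from the paper).
SEED_SENTENCES = [
    "book a table for two",
    "book a flight for tomorrow",
    "cancel a table for tonight",
]
TOY_EMBEDDINGS = {
    "table": [1.0, 0.1],
    "flight": [0.9, 0.2],
    "two": [0.1, 1.0],
    "tonight": [0.2, 0.9],
}


def build_chain(sentences):
    """Map each token to the list of tokens observed right after it."""
    chain = defaultdict(list)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for current, following in zip(tokens, tokens[1:]):
            chain[current].append(following)
    return chain


def generate(chain, rng, max_len=20):
    """Random walk from the start marker until the end marker or max_len."""
    word, output = "<s>", []
    while len(output) < max_len:
        word = rng.choice(chain[word])
        if word == "</s>":
            break
        output.append(word)
    return output


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = (sum(a * a for a in u) ** 0.5) * (sum(b * b for b in v) ** 0.5)
    return dot / norm if norm else 0.0


def nearest(word, embeddings):
    """Most similar other word in the table, or the word itself if unknown."""
    if word not in embeddings:
        return word
    others = [w for w in embeddings if w != word]
    if not others:
        return word
    return max(others, key=lambda w: cosine(embeddings[word], embeddings[w]))


def augment(tokens, embeddings, rng, swap_prob=0.3):
    """Swap each token for its nearest neighbour with probability swap_prob."""
    return [nearest(t, embeddings) if rng.random() < swap_prob else t
            for t in tokens]


if __name__ == "__main__":
    rng = random.Random(0)
    chain = build_chain(SEED_SENTENCES)
    sentence = generate(chain, rng)
    print(" ".join(augment(sentence, TOY_EMBEDDINGS, rng)))
```

In practice the toy table would be replaced by pretrained embeddings, and the swap step could be filtered by a similarity threshold; both choices are outside what the record itself states.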

oai_dc

    <?xml version="1.0" encoding="UTF-8" ?>
    <oai_dc:dc schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
      <dc:title>A light method for data generation: a combination of Markov Chains and Word Embeddings.</dc:title>
      <dc:title>Un método ligero de generación de datos: combinación entre Cadenas de Markov y Word Embeddings.</dc:title>
      <dc:creator>Martínez García, Eva</dc:creator>
      <dc:creator>Nogales Moyano, Alberto</dc:creator>
      <dc:creator>Morales Escudero, Javier</dc:creator>
      <dc:creator>García Tejedor, Álvaro José</dc:creator>
      <dc:subject>Generation</dc:subject>
      <dc:subject>Hybrid</dc:subject>
      <dc:subject>Markov Chains</dc:subject>
      <dc:subject>Embeddings</dc:subject>
      <dc:subject>Similarity</dc:subject>
      <dc:description>Most of the current state-of-the-art Natural Language Processing (NLP) techniques are highly data-dependent. A significant amount of data is required for their training, and in some scenarios data is scarce. We present a hybrid method to generate new sentences for augmenting the training data. Our approach takes advantage of the combination of Markov Chains and word embeddings to produce high-quality data similar to an initial dataset. In contrast to other neural-based generative methods, it does not need a high amount of training data. Results show how our approach can generate useful data for NLP tools. In particular, we validate our approach by building Transformer-based Language Models using data from three different domains in the context of enriching general purpose chatbots.</dc:description>
      <dc:description>post-print</dc:description>
      <dc:description>1,74 MB</dc:description>
      <dc:date>2021-06-16T08:19:29Z</dc:date>
      <dc:date>2021-06-16T08:19:29Z</dc:date>
      <dc:date>2020</dc:date>
      <dc:type>article</dc:type>
      <dc:identifier>1135-5948</dc:identifier>
      <dc:identifier>http://hdl.handle.net/10641/2327</dc:identifier>
      <dc:identifier>10.26342/2020-64-10</dc:identifier>
      <dc:language>eng</dc:language>
      <dc:relation>http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6199</dc:relation>
      <dc:rights>Atribución-NoComercial-SinDerivadas 3.0 España</dc:rights>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/3.0/es/</dc:rights>
      <dc:rights>openAccess</dc:rights>
      <dc:publisher>Procesamiento del Lenguaje Natural</dc:publisher>
    </oai_dc:dc>

didl

    <?xml version="1.0" encoding="UTF-8" ?>
    <d:DIDL schemaLocation="urn:mpeg:mpeg21:2002:02-DIDL-NS http://standards.iso.org/ittf/PubliclyAvailableStandards/MPEG-21_schema_files/did/didl.xsd">
      <d:DIDLInfo>
        <dcterms:created schemaLocation="http://purl.org/dc/terms/ http://dublincore.org/schemas/xmls/qdc/dcterms.xsd">2021-06-16T08:19:29Z</dcterms:created>
      </d:DIDLInfo>
      <d:Item id="hdl_10641_2327">
        <d:Descriptor>
          <d:Statement mimeType="application/xml; charset=utf-8">
            <dii:Identifier schemaLocation="urn:mpeg:mpeg21:2002:01-DII-NS http://standards.iso.org/ittf/PubliclyAvailableStandards/MPEG-21_schema_files/dii/dii.xsd">urn:hdl:10641/2327</dii:Identifier>
          </d:Statement>
        </d:Descriptor>
        <d:Descriptor>
          <d:Statement mimeType="application/xml; charset=utf-8">
            <oai_dc:dc schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
              <dc:title>A light method for data generation: a combination of Markov Chains and Word Embeddings.</dc:title>
              <dc:creator>Martínez García, Eva</dc:creator>
              <dc:creator>Nogales Moyano, Alberto</dc:creator>
              <dc:creator>Morales Escudero, Javier</dc:creator>
              <dc:creator>García Tejedor, Álvaro José</dc:creator>
              <dc:subject>Generation</dc:subject>
              <dc:subject>Hybrid</dc:subject>
              <dc:subject>Markov Chains</dc:subject>
              <dc:subject>Embeddings</dc:subject>
              <dc:subject>Similarity</dc:subject>
              <dc:description>Most of the current state-of-the-art Natural Language Processing (NLP) techniques are highly data-dependent. A significant amount of data is required for their training, and in some scenarios data is scarce. We present a hybrid method to generate new sentences for augmenting the training data. Our approach takes advantage of the combination of Markov Chains and word embeddings to produce high-quality data similar to an initial dataset. In contrast to other neural-based generative methods, it does not need a high amount of training data. Results show how our approach can generate useful data for NLP tools. In particular, we validate our approach by building Transformer-based Language Models using data from three different domains in the context of enriching general purpose chatbots.</dc:description>
              <dc:date>2021-06-16T08:19:29Z</dc:date>
              <dc:date>2021-06-16T08:19:29Z</dc:date>
              <dc:date>2020</dc:date>
              <dc:type>article</dc:type>
              <dc:identifier>1135-5948</dc:identifier>
              <dc:identifier>http://hdl.handle.net/10641/2327</dc:identifier>
              <dc:identifier>10.26342/2020-64-10</dc:identifier>
              <dc:language>eng</dc:language>
              <dc:relation>http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6199</dc:relation>
              <dc:rights>http://creativecommons.org/licenses/by-nc-nd/3.0/es/</dc:rights>
              <dc:rights>openAccess</dc:rights>
              <dc:rights>Atribución-NoComercial-SinDerivadas 3.0 España</dc:rights>
              <dc:publisher>Procesamiento del Lenguaje Natural</dc:publisher>
            </oai_dc:dc>
          </d:Statement>
        </d:Descriptor>
        <d:Component id="10641_2327_1">
          <d:Resource mimeType="application/pdf" ref="http://ddfv.ufv.es/bitstream/10641/2327/1/6199-5608-1-PB.pdf" />
        </d:Component>
      </d:Item>
    </d:DIDL>

dim

    <?xml version="1.0" encoding="UTF-8" ?>
    <dim:dim schemaLocation="http://www.dspace.org/xmlns/dspace/dim http://www.dspace.org/schema/dim.xsd">
      <dim:field authority="e4a4adb9-fa58-4a27-b114-f07bbf623ff7" confidence="600" element="contributor" mdschema="dc" qualifier="author">Martínez García, Eva</dim:field>
      <dim:field authority="209" confidence="600" element="contributor" mdschema="dc" qualifier="author">Nogales Moyano, Alberto</dim:field>
      <dim:field authority="3c20c2c8-86d0-4953-8a3a-7290cdb9a0ba" confidence="600" element="contributor" mdschema="dc" qualifier="author">Morales Escudero, Javier</dim:field>
      <dim:field authority="75" confidence="600" element="contributor" mdschema="dc" qualifier="author">García Tejedor, Álvaro José</dim:field>
      <dim:field element="date" mdschema="dc" qualifier="accessioned">2021-06-16T08:19:29Z</dim:field>
      <dim:field element="date" mdschema="dc" qualifier="available">2021-06-16T08:19:29Z</dim:field>
      <dim:field element="date" mdschema="dc" qualifier="issued">2020</dim:field>
      <dim:field element="identifier" lang="spa" mdschema="dc" qualifier="issn">1135-5948</dim:field>
      <dim:field element="identifier" mdschema="dc" qualifier="uri">http://hdl.handle.net/10641/2327</dim:field>
      <dim:field element="identifier" lang="spa" mdschema="dc" qualifier="doi">10.26342/2020-64-10</dim:field>
      <dim:field element="description" lang="spa" mdschema="dc" qualifier="abstract">Most of the current state-of-the-art Natural Language Processing (NLP) techniques are highly data-dependent. A significant amount of data is required for their training, and in some scenarios data is scarce. We present a hybrid method to generate new sentences for augmenting the training data. Our approach takes advantage of the combination of Markov Chains and word embeddings to produce high-quality data similar to an initial dataset. In contrast to other neural-based generative methods, it does not need a high amount of training data. Results show how our approach can generate useful data for NLP tools. In particular, we validate our approach by building Transformer-based Language Models using data from three different domains in the context of enriching general purpose chatbots.</dim:field>
      <dim:field element="description" lang="spa" mdschema="dc" qualifier="version">post-print</dim:field>
      <dim:field element="description" lang="spa" mdschema="dc" qualifier="extent">1,74 MB</dim:field>
      <dim:field element="language" lang="spa" mdschema="dc" qualifier="iso">eng</dim:field>
      <dim:field element="publisher" lang="spa" mdschema="dc">Procesamiento del Lenguaje Natural</dim:field>
      <dim:field element="rights" lang="*" mdschema="dc">Atribución-NoComercial-SinDerivadas 3.0 España</dim:field>
      <dim:field element="rights" lang="*" mdschema="dc" qualifier="uri">http://creativecommons.org/licenses/by-nc-nd/3.0/es/</dim:field>
      <dim:field element="rights" lang="spa" mdschema="dc" qualifier="accessRights">openAccess</dim:field>
      <dim:field element="subject" lang="spa" mdschema="dc">Generation</dim:field>
      <dim:field element="subject" lang="spa" mdschema="dc">Hybrid</dim:field>
      <dim:field element="subject" lang="spa" mdschema="dc">Markov Chains</dim:field>
      <dim:field element="subject" lang="spa" mdschema="dc">Embeddings</dim:field>
      <dim:field element="subject" lang="spa" mdschema="dc">Similarity</dim:field>
      <dim:field element="title" lang="spa" mdschema="dc">A light method for data generation: a combination of Markov Chains and Word Embeddings.</dim:field>
      <dim:field element="title" lang="spa" mdschema="dc" qualifier="alternative">Un método ligero de generación de datos: combinación entre Cadenas de Markov y Word Embeddings.</dim:field>
      <dim:field element="type" lang="spa" mdschema="dc">article</dim:field>
      <dim:field element="relation" lang="spa" mdschema="dc" qualifier="publisherversion">http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6199</dim:field>
    </dim:dim>

etdms

    <?xml version="1.0" encoding="UTF-8" ?>
    <thesis schemaLocation="http://www.ndltd.org/standards/metadata/etdms/1.0/ http://www.ndltd.org/standards/metadata/etdms/1.0/etdms.xsd">
      <title>A light method for data generation: a combination of Markov Chains and Word Embeddings.</title>
      <creator>Martínez García, Eva</creator>
      <creator>Nogales Moyano, Alberto</creator>
      <creator>Morales Escudero, Javier</creator>
      <creator>García Tejedor, Álvaro José</creator>
      <subject>Generation</subject>
      <subject>Hybrid</subject>
      <subject>Markov Chains</subject>
      <subject>Embeddings</subject>
      <subject>Similarity</subject>
      <description>Most of the current state-of-the-art Natural Language Processing (NLP) techniques are highly data-dependent. A significant amount of data is required for their training, and in some scenarios data is scarce. We present a hybrid method to generate new sentences for augmenting the training data. Our approach takes advantage of the combination of Markov Chains and word embeddings to produce high-quality data similar to an initial dataset. In contrast to other neural-based generative methods, it does not need a high amount of training data. Results show how our approach can generate useful data for NLP tools. In particular, we validate our approach by building Transformer-based Language Models using data from three different domains in the context of enriching general purpose chatbots.</description>
      <date>2021-06-16</date>
      <date>2021-06-16</date>
      <date>2020</date>
      <type>article</type>
      <identifier>1135-5948</identifier>
      <identifier>http://hdl.handle.net/10641/2327</identifier>
      <identifier>10.26342/2020-64-10</identifier>
      <language>eng</language>
      <relation>http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6199</relation>
      <rights>http://creativecommons.org/licenses/by-nc-nd/3.0/es/</rights>
      <rights>openAccess</rights>
      <rights>Atribución-NoComercial-SinDerivadas 3.0 España</rights>
      <publisher>Procesamiento del Lenguaje Natural</publisher>
    </thesis>

marc

    <?xml version="1.0" encoding="UTF-8" ?>
    <record schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd">
      <leader>00925njm 22002777a 4500</leader>
      <datafield ind1=" " ind2=" " tag="042">
        <subfield code="a">dc</subfield>
      </datafield>
      <datafield ind1=" " ind2=" " tag="720">
        <subfield code="a">Martínez García, Eva</subfield>
        <subfield code="e">author</subfield>
      </datafield>
      <datafield ind1=" " ind2=" " tag="720">
        <subfield code="a">Nogales Moyano, Alberto</subfield>
        <subfield code="e">author</subfield>
      </datafield>
      <datafield ind1=" " ind2=" " tag="720">
        <subfield code="a">Morales Escudero, Javier</subfield>
        <subfield code="e">author</subfield>
      </datafield>
      <datafield ind1=" " ind2=" " tag="720">
        <subfield code="a">García Tejedor, Álvaro José</subfield>
        <subfield code="e">author</subfield>
      </datafield>
      <datafield ind1=" " ind2=" " tag="260">
        <subfield code="c">2020</subfield>
      </datafield>
      <datafield ind1=" " ind2=" " tag="520">
        <subfield code="a">Most of the current state-of-the-art Natural Language Processing (NLP) techniques are highly data-dependent. A significant amount of data is required for their training, and in some scenarios data is scarce. We present a hybrid method to generate new sentences for augmenting the training data. Our approach takes advantage of the combination of Markov Chains and word embeddings to produce high-quality data similar to an initial dataset. In contrast to other neural-based generative methods, it does not need a high amount of training data. Results show how our approach can generate useful data for NLP tools. In particular, we validate our approach by building Transformer-based Language Models using data from three different domains in the context of enriching general purpose chatbots.</subfield>
      </datafield>
      <datafield ind1="8" ind2=" " tag="024">
        <subfield code="a">1135-5948</subfield>
      </datafield>
      <datafield ind1="8" ind2=" " tag="024">
        <subfield code="a">http://hdl.handle.net/10641/2327</subfield>
      </datafield>
      <datafield ind1="8" ind2=" " tag="024">
        <subfield code="a">10.26342/2020-64-10</subfield>
      </datafield>
      <datafield ind1=" " ind2=" " tag="653">
        <subfield code="a">Generation</subfield>
      </datafield>
      <datafield ind1=" " ind2=" " tag="653">
        <subfield code="a">Hybrid</subfield>
      </datafield>
      <datafield ind1=" " ind2=" " tag="653">
        <subfield code="a">Markov Chains</subfield>
      </datafield>
      <datafield ind1=" " ind2=" " tag="653">
        <subfield code="a">Embeddings</subfield>
      </datafield>
      <datafield ind1=" " ind2=" " tag="653">
        <subfield code="a">Similarity</subfield>
      </datafield>
      <datafield ind1="0" ind2="0" tag="245">
        <subfield code="a">A light method for data generation: a combination of Markov Chains and Word Embeddings.</subfield>
      </datafield>
    </record>

mets

    <?xml version="1.0" encoding="UTF-8" ?>
    <mets ID=" DSpace_ITEM_10641-2327" OBJID=" hdl:10641/2327" PROFILE="DSpace METS SIP Profile 1.0" TYPE="DSpace ITEM" schemaLocation="http://www.loc.gov/METS/ http://www.loc.gov/standards/mets/mets.xsd">
      <metsHdr CREATEDATE="2022-09-20T09:27:37Z">
        <agent ROLE="CUSTODIAN" TYPE="ORGANIZATION">
          <name>DDFV</name>
        </agent>
      </metsHdr>
      <dmdSec ID="DMD_10641_2327">
        <mdWrap MDTYPE="MODS">
          <xmlData schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-1.xsd">
            <mods:mods schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-1.xsd">
              <mods:name>
                <mods:role>
                  <mods:roleTerm type="text">author</mods:roleTerm>
                </mods:role>
                <mods:namePart>Martínez García, Eva</mods:namePart>
              </mods:name>
              <mods:name>
                <mods:role>
                  <mods:roleTerm type="text">author</mods:roleTerm>
                </mods:role>
                <mods:namePart>Nogales Moyano, Alberto</mods:namePart>
              </mods:name>
              <mods:name>
                <mods:role>
                  <mods:roleTerm type="text">author</mods:roleTerm>
                </mods:role>
                <mods:namePart>Morales Escudero, Javier</mods:namePart>
              </mods:name>
              <mods:name>
                <mods:role>
                  <mods:roleTerm type="text">author</mods:roleTerm>
                </mods:role>
                <mods:namePart>García Tejedor, Álvaro José</mods:namePart>
              </mods:name>
              <mods:extension>
                <mods:dateAccessioned encoding="iso8601">2021-06-16T08:19:29Z</mods:dateAccessioned>
              </mods:extension>
              <mods:extension>
                <mods:dateAvailable encoding="iso8601">2021-06-16T08:19:29Z</mods:dateAvailable>
              </mods:extension>
              <mods:originInfo>
                <mods:dateIssued encoding="iso8601">2020</mods:dateIssued>
              </mods:originInfo>
              <mods:identifier type="issn">1135-5948</mods:identifier>
              <mods:identifier type="uri">http://hdl.handle.net/10641/2327</mods:identifier>
              <mods:identifier type="doi">10.26342/2020-64-10</mods:identifier>
              <mods:abstract>Most of the current state-of-the-art Natural Language Processing (NLP) techniques are highly data-dependent. A significant amount of data is required for their training, and in some scenarios data is scarce. We present a hybrid method to generate new sentences for augmenting the training data. Our approach takes advantage of the combination of Markov Chains and word embeddings to produce high-quality data similar to an initial dataset. In contrast to other neural-based generative methods, it does not need a high amount of training data. Results show how our approach can generate useful data for NLP tools. In particular, we validate our approach by building Transformer-based Language Models using data from three different domains in the context of enriching general purpose chatbots.</mods:abstract>
              <mods:language>
                <mods:languageTerm authority="rfc3066">eng</mods:languageTerm>
              </mods:language>
              <mods:accessCondition type="useAndReproduction">Atribución-NoComercial-SinDerivadas 3.0 España</mods:accessCondition>
              <mods:subject>
                <mods:topic>Generation</mods:topic>
              </mods:subject>
              <mods:subject>
                <mods:topic>Hybrid</mods:topic>
              </mods:subject>
              <mods:subject>
                <mods:topic>Markov Chains</mods:topic>
              </mods:subject>
              <mods:subject>
                <mods:topic>Embeddings</mods:topic>
              </mods:subject>
              <mods:subject>
                <mods:topic>Similarity</mods:topic>
              </mods:subject>
              <mods:titleInfo>
                <mods:title>A light method for data generation: a combination of Markov Chains and Word Embeddings.</mods:title>
              </mods:titleInfo>
              <mods:genre>article</mods:genre>
            </mods:mods>
          </xmlData>
        </mdWrap>
      </dmdSec>
      <amdSec ID="TMD_10641_2327">
        <rightsMD ID="RIG_10641_2327">
          <mdWrap MDTYPE="OTHER" MIMETYPE="text/plain" OTHERMDTYPE="DSpaceDepositLicense">

          1. <binData>LSBFbCByZXBvc2l0b3JpbyBpbnN0aXR1Y2lvbmFsIGRlIGxhIFVuaXZlcnNpZGFkIEZyYW5jaXNjbyBkZSBWaXRvcmlhIGRlIE1hZHJpZCAoRERGViksIHBvbmUgYSBkaXNwb3NpY2nDs24gZGUgbG9zIHVzdWFyaW9zIGxhIHBsYXRhZm9ybWEgZGlnaXRhbCBhYmllcnRhIHkgZGUgYWNjZXNvIGxpYnJlIGRlIGxhIHByb2R1Y2Npw7NuIGNpZW50w61maWNhIGRlIGxhIGluc3RpdHVjacOzbi4KCi0gQSB0YWxlcyBmaW5lcywgbG9zIGF1dG9yZXMgZGVjbGFyYW4gcXVlIHNvbiB0aXR1bGFyZXMgZGUgbG9zIGRlcmVjaG9zIGRlIHByb3BpZWRhZCBpbnRlbGVjdHVhbCBkZSBsYSBvYnJhIHkgcXVlIMOpc3RhIGVzIG9yaWdpbmFsLgoKLSBNZWRpYW50ZSBsYSBhY2VwdGFjacOzbiBkZSBlc3RhIGxpY2VuY2lhLCBlbCBhdXRvciwgY29tbyB0aXR1bGFyIGRlIGxvcyBkZXJlY2hvcyBkZSBhdXRvciwgYXV0b3JpemEgeSBjZWRlIGEgbGEgVW5pdmVyc2lkYWQgRnJhbmNpc2NvIGRlIFZpdG9yaWEsIGRlIGZvcm1hIGdyYXR1aXRhIHkgbm8gZXhjbHVzaXZhLCBwb3IgZWwgbcOheGltbyBwbGF6byBsZWdhbCB5IGNvbiDDoW1iaXRvIHVuaXZlcnNhbCwgbG9zIGRlcmVjaG9zIGRlIHJlcHJvZHVjY2nDs24sIGRpc3RyaWJ1Y2nDs24sIGNvbXVuaWNhY2nDs24gcMO6YmxpY2EsIGluY2x1aWRvIGVsIGRlcmVjaG8gZGUgcHVlc3RhIGEgZGlzcG9zaWNpw7NuIGVsZWN0csOzbmljYSwgeSBsYSB0cmFuc2Zvcm1hY2nDs24gZGUgZm9ybWF0byBzb2JyZSBsYSBvYnJhIGluZGljYWRhLCBzaSBmdWVyYSBlbCBjYXNvLgoKLSBFbiBlbCBjYXNvIGRlIGNlc2nDs24gZGUgZGVyZWNob3MgZGUgZXhwbG90YWNpw7NuIGEgdGVyY2Vyb3MsIGRlY2xhcmEgcXVlIGN1ZW50YSBjb24gbGEgYXV0b3JpemFjacOzbiBkZSBkaWNob3MgdGl0dWxhcmVzIHkgcXVlIGhhIG9idGVuaWRvIGVsIHBlcm1pc28gc2luIHJlc3RyaWNjaW9uZXMgZGVsIHByb3BpZXRhcmlvIGRlbCBjb3B5cmlnaHQgcGFyYSBvdG9yZ2FyIGEgbGEgaW5zdGl0dWNpw7NuIGxvcyBkZXJlY2hvcyByZXF1ZXJpZG9zIHBhcmEgZXN0YSBsaWNlbmNpYSB5IHF1ZSBkaWNobyBwcm9waWV0YXJpbyBjb25vY2UgZWwgdGV4dG8gbyBlbCBjb250ZW5pZG8gZGUgbGEgb2JyYS4KCi0gU2kgZnVlcmEgdW5hIG9icmEgcGF0cm9jaW5hZGEgcG9yIGFsZ3VuYSBpbnN0aXR1Y2nDs24gZGlzdGludGEgYSBsYSBVbml2ZXJzaWRhZCBGcmFuY2lzY28gZGUgVml0b3JpYSwgZGVjbGFyYSBxdWUgZW4gY2FzbyBuZWNlc2FyaW8sIGN1ZW50YSBjb24gbG9zIHBlcm1pc29zIHBlcnRpbmVudGVzLCBkZSBsYSBpbnN0aXR1Y2nDs24gbyBlbnRpZGFkLCBxdWUgbGUgcGVybWl0YW4gbGEgZGlmdXNpw7NuIGRlIGRpY2hhIG9icmEuCgotIExhIFVuaXZlcnNpZGFkIEZyYW5jaXNjbyBkZSBWaXRvcmlhIG5vIHRpZW5lIGxhIHRpdHVsYXJpZGFkIGRlIGxvcyBkZXJlY2hvcyBzb2JyZSBsYSBvYnJhLCBxdWUgY2
9ycmVzcG9uZGVuIGFsIGF1dG9yLCBwZXJvIHNpbiBlbWJhcmdvIMOpc3RhIGxpY2VuY2lhIGRhIGRlcmVjaG8gYSByZXByb2R1Y2lybGEgZW4gdW4gc29wb3J0ZSBkaWdpdGFsLCBkaXN0cmlidWlyIGEgbG9zIHVzdWFyaW9zIGNvcGlhcyBlbGVjdHLDs25pY2FzIGRlIGxhIG9icmEgZW4gZm9ybWF0byBkaWdpdGFsLCBjb211bmljYWNpw7NuIHDDumJsaWNhIHkgc3UgcHVlc3RhIGEgZGlzcG9zaWNpw7NuIGEgdHJhdsOpcyBkZSB1biBhcmNoaXZvIGFiaWVydG8gaW5zdGl0dWNpb25hbC4KCi0gTGEgb2JyYSBzZSBwb25kcsOhIGEgZGlzcG9zaWNpw7NuIGRlIGxvcyB1c3VhcmlvcyBwYXJhIHF1ZSBoYWdhbiBkZSBlbGxhIHVuIHVzbyBqdXN0byB5IHJlc3BldHVvc28gY29uIGxvcyBkZXJlY2hvcyBkZSBhdXRvciwgc2VhIGNvbiBmaW5lcyBkZSBlc3R1ZGlvLCBpbnZlc3RpZ2FjacOzbiBvIGN1YWxxdWllciBvdHJvIGZpbiBsw61jaXRvLCB5IGRlIGFjdWVyZG8gYSBsYXMgY29uZGljaW9uZXMgZXN0YWJsZWNpZGFzIGVuIGxhIGxpY2VuY2lhIENyZWF0aXZlIENvbW1vbnMsIGRlIG1vZG8gcXVlIGxhcyBvYnJhcyBwdWVkYW4gc2VyIGRpc3RyaWJ1aWRhcywgY29waWFkYXMgeSBleGhpYmlkYXMgc2llbXByZSBxdWUgc2UgY2l0ZSBsYSBhdXRvcsOtYSB5IG5vIHNlIG9idGVuZ2EgYmVuZWZpY2lvIGNvbWVyY2lhbC4gUG9yIHRhbnRvLCBsYSBVbml2ZXJzaWRhZCBubyBhc3VtaXLDoSByZXNwb25zYWJpbGlkYWQgYWxndW5hIHBvciBsYSBmb3JtYSBlZmVjdGl2YSBlbiBxdWUgbG9zIHVzdWFyaW9zIHV0aWxpY2VuIGVsIG1hdGVyaWFsIHB1ZXN0byBhIHN1IGRpc3Bvc2ljacOzbi4KCi0gRWwgYXV0b3IgcG9kcsOhIHNvbGljaXRhciBsYSByZXRpcmFkYSBkZSBsYSBvYnJhIGRlbCByZXBvc2l0b3JpbyBwb3IgY2F1c2EganVzdGlmaWNhZGEuIAoK</binData>

          </mdWrap>
        </rightsMD>
      </amdSec>
      <amdSec ID="FO_10641_2327_1">
        <techMD ID="TECH_O_10641_2327_1">
          <mdWrap MDTYPE="PREMIS">
            <xmlData schemaLocation="http://www.loc.gov/standards/premis http://www.loc.gov/standards/premis/PREMIS-v1-0.xsd">
              <premis:premis>
                <premis:object>
                  <premis:objectIdentifier>
                    <premis:objectIdentifierType>URL</premis:objectIdentifierType>
                    <premis:objectIdentifierValue>http://ddfv.ufv.es/bitstream/10641/2327/1/6199-5608-1-PB.pdf</premis:objectIdentifierValue>
                  </premis:objectIdentifier>
                  <premis:objectCategory>File</premis:objectCategory>
                  <premis:objectCharacteristics>
                    <premis:fixity>
                      <premis:messageDigestAlgorithm>MD5</premis:messageDigestAlgorithm>
                      <premis:messageDigest>81f55f83adefa95b0a46222d72223778</premis:messageDigest>
                    </premis:fixity>
                    <premis:size>1831204</premis:size>
                    <premis:format>
                      <premis:formatDesignation>
                        <premis:formatName>application/pdf</premis:formatName>
                      </premis:formatDesignation>
                    </premis:format>
                  </premis:objectCharacteristics>
                  <premis:originalName>6199-5608-1-PB.pdf</premis:originalName>
                </premis:object>
              </premis:premis>
            </xmlData>
          </mdWrap>
        </techMD>
      </amdSec>
      <amdSec ID="FT_10641_2327_4">
        <techMD ID="TECH_T_10641_2327_4">
          <mdWrap MDTYPE="PREMIS">
            <xmlData schemaLocation="http://www.loc.gov/standards/premis http://www.loc.gov/standards/premis/PREMIS-v1-0.xsd">
              <premis:premis>
                <premis:object>
                  <premis:objectIdentifier>
                    <premis:objectIdentifierType>URL</premis:objectIdentifierType>
                    <premis:objectIdentifierValue>http://ddfv.ufv.es/bitstream/10641/2327/4/6199-5608-1-PB.pdf.txt</premis:objectIdentifierValue>
                  </premis:objectIdentifier>
                  <premis:objectCategory>File</premis:objectCategory>
                  <premis:objectCharacteristics>
                    <premis:fixity>
                      <premis:messageDigestAlgorithm>MD5</premis:messageDigestAlgorithm>
                      <premis:messageDigest>47b47b4ab230e10b1abda13a3bf7be5e</premis:messageDigest>
                    </premis:fixity>
                    <premis:size>30680</premis:size>
                    <premis:format>
                      <premis:formatDesignation>
                        <premis:formatName>text/plain</premis:formatName>
                      </premis:formatDesignation>
                    </premis:format>
                  </premis:objectCharacteristics>
                  <premis:originalName>6199-5608-1-PB.pdf.txt</premis:originalName>
                </premis:object>
              </premis:premis>
            </xmlData>
          </mdWrap>
        </techMD>
      </amdSec>
      <fileSec>
        <fileGrp USE="ORIGINAL">
          <file ADMID="FO_10641_2327_1" CHECKSUM="81f55f83adefa95b0a46222d72223778" CHECKSUMTYPE="MD5" GROUPID="GROUP_BITSTREAM_10641_2327_1" ID="BITSTREAM_ORIGINAL_10641_2327_1" MIMETYPE="application/pdf" SEQ="1" SIZE="1831204">
            <FLocat LOCTYPE="URL" href="http://ddfv.ufv.es/bitstream/10641/2327/1/6199-5608-1-PB.pdf" type="simple" />
          </file>
        </fileGrp>
        <fileGrp USE="TEXT">
          <file ADMID="FT_10641_2327_4" CHECKSUM="47b47b4ab230e10b1abda13a3bf7be5e" CHECKSUMTYPE="MD5" GROUPID="GROUP_BITSTREAM_10641_2327_4" ID="BITSTREAM_TEXT_10641_2327_4" MIMETYPE="text/plain" SEQ="4" SIZE="30680">
            <FLocat LOCTYPE="URL" href="http://ddfv.ufv.es/bitstream/10641/2327/4/6199-5608-1-PB.pdf.txt" type="simple" />
          </file>
        </fileGrp>
      </fileSec>
      <structMap LABEL="DSpace Object" TYPE="LOGICAL">
        <div ADMID="DMD_10641_2327" TYPE="DSpace Object Contents">
          <div TYPE="DSpace BITSTREAM">
            <fptr FILEID="BITSTREAM_ORIGINAL_10641_2327_1" />
          </div>
        </div>
      </structMap>
    </mets>
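
Each `file` entry in the METS `fileSec` above carries a `CHECKSUM`, a `CHECKSUMTYPE` of MD5, and a `SIZE` in bytes. As a minimal sketch of how a downloaded copy could be checked against those attributes, the Python function below compares the byte length and MD5 digest of some data with the recorded values. The function name is hypothetical and the sketch performs no download itself; it assumes the bytes are already in memory.

```python
import hashlib


def matches_mets_file_entry(data: bytes, checksum: str, size: int) -> bool:
    """Return True if data has the recorded SIZE and MD5 CHECKSUM.

    `checksum` is the hexadecimal CHECKSUM attribute and `size` the SIZE
    attribute of a METS <file> element whose CHECKSUMTYPE is "MD5".
    """
    if len(data) != size:
        return False
    return hashlib.md5(data).hexdigest() == checksum.strip().lower()


if __name__ == "__main__":
    # Values copied from the ORIGINAL fileGrp entry in the record above.
    recorded_checksum = "81f55f83adefa95b0a46222d72223778"
    recorded_size = 1831204
    # Placeholder bytes: a real check would read the downloaded PDF instead.
    print(matches_mets_file_entry(b"", recorded_checksum, recorded_size))
```

MD5 is used here only because the record declares it; it detects accidental corruption but is not suitable as a security guarantee.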

mods

    <?xml version="1.0" encoding="UTF-8" ?>
    <mods:mods schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-1.xsd">
      <mods:name>
        <mods:namePart>Martínez García, Eva</mods:namePart>
      </mods:name>
      <mods:name>
        <mods:namePart>Nogales Moyano, Alberto</mods:namePart>
      </mods:name>
      <mods:name>
        <mods:namePart>Morales Escudero, Javier</mods:namePart>
      </mods:name>
      <mods:name>
        <mods:namePart>García Tejedor, Álvaro José</mods:namePart>
      </mods:name>
      <mods:extension>
        <mods:dateAvailable encoding="iso8601">2021-06-16T08:19:29Z</mods:dateAvailable>
      </mods:extension>
      <mods:extension>
        <mods:dateAccessioned encoding="iso8601">2021-06-16T08:19:29Z</mods:dateAccessioned>
      </mods:extension>
      <mods:originInfo>
        <mods:dateIssued encoding="iso8601">2020</mods:dateIssued>
      </mods:originInfo>
      <mods:identifier type="issn">1135-5948</mods:identifier>
      <mods:identifier type="uri">http://hdl.handle.net/10641/2327</mods:identifier>
      <mods:identifier type="doi">10.26342/2020-64-10</mods:identifier>
      <mods:abstract>Most of the current state-of-the-art Natural Language Processing (NLP) techniques are highly data-dependent. A significant amount of data is required for their training, and in some scenarios data is scarce. We present a hybrid method to generate new sentences for augmenting the training data. Our approach takes advantage of the combination of Markov Chains and word embeddings to produce high-quality data similar to an initial dataset. In contrast to other neural-based generative methods, it does not need a high amount of training data. Results show how our approach can generate useful data for NLP tools. In particular, we validate our approach by building Transformer-based Language Models using data from three different domains in the context of enriching general purpose chatbots.</mods:abstract>
      <mods:language>
        <mods:languageTerm>eng</mods:languageTerm>
      </mods:language>
      <mods:accessCondition type="useAndReproduction">http://creativecommons.org/licenses/by-nc-nd/3.0/es/</mods:accessCondition>
      <mods:accessCondition type="useAndReproduction">openAccess</mods:accessCondition>
      <mods:accessCondition type="useAndReproduction">Atribución-NoComercial-SinDerivadas 3.0 España</mods:accessCondition>
      <mods:subject>
        <mods:topic>Generation</mods:topic>
      </mods:subject>
      <mods:subject>
        <mods:topic>Hybrid</mods:topic>
      </mods:subject>
      <mods:subject>
        <mods:topic>Markov Chains</mods:topic>
      </mods:subject>
      <mods:subject>
        <mods:topic>Embeddings</mods:topic>
      </mods:subject>
      <mods:subject>
        <mods:topic>Similarity</mods:topic>
      </mods:subject>
      <mods:titleInfo>
        <mods:title>A light method for data generation: a combination of Markov Chains and Word Embeddings.</mods:title>
      </mods:titleInfo>
      <mods:genre>article</mods:genre>
    </mods:mods>

ore


    <?xml version="1.0" encoding="UTF-8" ?>

  1. <atom:entry schemaLocation="http://www.w3.org/2005/Atom http://www.kbcafe.com/rss/atom.xsd.xml">

    1. <atom:id>http://hdl.handle.net/10641/2327/ore.xml</atom:id>

    2. <atom:link href="http://hdl.handle.net/10641/2327" rel="alternate" />
    3. <atom:link href="http://hdl.handle.net/10641/2327/ore.xml" rel="http://www.openarchives.org/ore/terms/describes" />
    4. <atom:link href="http://hdl.handle.net/10641/2327/ore.xml#atom" rel="self" type="application/atom+xml" />
    5. <atom:published>2021-06-16T08:19:29Z</atom:published>

    6. <atom:updated>2021-06-16T08:19:29Z</atom:updated>

    7. <atom:source>

      1. <atom:generator>DDFV</atom:generator>

      </atom:source>

    8. <atom:title>A light method for data generation: a combination of Markov Chains and Word Embeddings.</atom:title>

    9. <atom:author>

      1. <atom:name>Martínez García, Eva</atom:name>

      </atom:author>

    10. <atom:author>

      1. <atom:name>Nogales Moyano, Alberto</atom:name>

      </atom:author>

    11. <atom:author>

      1. <atom:name>Morales Escudero, Javier</atom:name>

      </atom:author>

    12. <atom:author>

      1. <atom:name>García Tejedor, Álvaro José</atom:name>

      </atom:author>

    13. <atom:category label="Aggregation" scheme="http://www.openarchives.org/ore/terms/" term="http://www.openarchives.org/ore/terms/Aggregation" />
    14. <atom:category scheme="http://www.openarchives.org/ore/atom/modified" term="2021-06-16T08:19:29Z" />
    15. <atom:category label="DSpace Item" scheme="http://www.dspace.org/objectModel/" term="DSpaceItem" />
    16. <atom:link href="http://ddfv.ufv.es/bitstream/10641/2327/1/6199-5608-1-PB.pdf" length="1831204" rel="http://www.openarchives.org/ore/terms/aggregates" title="6199-5608-1-PB.pdf" type="application/pdf" />
    17. <oreatom:triples>

      1. <rdf:Description about="http://hdl.handle.net/10641/2327/ore.xml#atom">

        1. <rdf:type resource="http://www.dspace.org/objectModel/DSpaceItem" />
        2. <dcterms:modified>2021-06-16T08:19:29Z</dcterms:modified>

        </rdf:Description>

      2. <rdf:Description about="http://ddfv.ufv.es/bitstream/10641/2327/1/6199-5608-1-PB.pdf">

        1. <rdf:type resource="http://www.dspace.org/objectModel/DSpaceBitstream" />
        2. <dcterms:description>ORIGINAL</dcterms:description>

        </rdf:Description>

      3. <rdf:Description about="http://ddfv.ufv.es/bitstream/10641/2327/2/license_rdf">

        1. <rdf:type resource="http://www.dspace.org/objectModel/DSpaceBitstream" />
        2. <dcterms:description>CC-LICENSE</dcterms:description>

        </rdf:Description>

      4. <rdf:Description about="http://ddfv.ufv.es/bitstream/10641/2327/3/license.txt">

        1. <rdf:type resource="http://www.dspace.org/objectModel/DSpaceBitstream" />
        2. <dcterms:description>LICENSE</dcterms:description>

        </rdf:Description>

      5. <rdf:Description about="http://ddfv.ufv.es/bitstream/10641/2327/4/6199-5608-1-PB.pdf.txt">

        1. <rdf:type resource="http://www.dspace.org/objectModel/DSpaceBitstream" />
        2. <dcterms:description>TEXT</dcterms:description>

        </rdf:Description>

      6. <rdf:Description about="http://ddfv.ufv.es/bitstream/10641/2327/5/6199-5608-1-PB.pdf.jpg">

        1. <rdf:type resource="http://www.dspace.org/objectModel/DSpaceBitstream" />
        2. <dcterms:description>THUMBNAIL</dcterms:description>

        </rdf:Description>

      </oreatom:triples>

    </atom:entry>

qdc


    <?xml version="1.0" encoding="UTF-8" ?>

  1. <qdc:qualifieddc schemaLocation="http://purl.org/dc/elements/1.1/ http://dublincore.org/schemas/xmls/qdc/2006/01/06/dc.xsd http://purl.org/dc/terms/ http://dublincore.org/schemas/xmls/qdc/2006/01/06/dcterms.xsd http://dspace.org/qualifieddc/ http://www.ukoln.ac.uk/metadata/dcmi/xmlschema/qualifieddc.xsd">

    1. <dc:title>A light method for data generation: a combination of Markov Chains and Word Embeddings.</dc:title>

    2. <dc:creator>Martínez García, Eva</dc:creator>

    3. <dc:creator>Nogales Moyano, Alberto</dc:creator>

    4. <dc:creator>Morales Escudero, Javier</dc:creator>

    5. <dc:creator>García Tejedor, Álvaro José</dc:creator>

    6. <dc:subject>Generation</dc:subject>

    7. <dc:subject>Hybrid</dc:subject>

    8. <dc:subject>Markov Chains</dc:subject>

    9. <dc:subject>Embeddings</dc:subject>

    10. <dc:subject>Similarity</dc:subject>

    11. <dcterms:abstract>Most of the current state-of-the-art Natural Language Processing (NLP) techniques are highly data-dependent. A significant amount of data is required for their training, and in some scenarios data is scarce. We present a hybrid method to generate new sentences for augmenting the training data. Our approach takes advantage of the combination of Markov Chains and word embeddings to produce high-quality data similar to an initial dataset. In contrast to other neural-based generative methods, it does not need a high amount of training data. Results show how our approach can generate useful data for NLP tools. In particular, we validate our approach by building Transformer-based Language Models using data from three different domains in the context of enriching general purpose chatbots.</dcterms:abstract>

    12. <dcterms:dateAccepted>2021-06-16T08:19:29Z</dcterms:dateAccepted>

    13. <dcterms:available>2021-06-16T08:19:29Z</dcterms:available>

    14. <dcterms:created>2021-06-16T08:19:29Z</dcterms:created>

    15. <dcterms:issued>2020</dcterms:issued>

    16. <dc:type>article</dc:type>

    17. <dc:identifier>1135-5948</dc:identifier>

    18. <dc:identifier>http://hdl.handle.net/10641/2327</dc:identifier>

    19. <dc:identifier>10.26342/2020-64-10</dc:identifier>

    20. <dc:language>eng</dc:language>

    21. <dc:relation>http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6199</dc:relation>

    22. <dc:rights>http://creativecommons.org/licenses/by-nc-nd/3.0/es/</dc:rights>

    23. <dc:rights>openAccess</dc:rights>

    24. <dc:rights>Atribución-NoComercial-SinDerivadas 3.0 España</dc:rights>

    25. <dc:publisher>Procesamiento del Lenguaje Natural</dc:publisher>

    </qdc:qualifieddc>
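
A record in this qualified Dublin Core form can be read with a standard XML parser. The following is a minimal sketch, not part of the original record: the rendered view above does not show namespace declarations, so the `xmlns` attributes in `SAMPLE` are re-added as an assumption (the URIs are taken from the `schemaLocation` value), and `summarize` is a hypothetical helper name. Only a few of the fields are reproduced.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI map used for lookups. The URIs come from the schemaLocation
# attribute of the record; the declarations themselves are an assumption.
NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Abbreviated copy of the record, with namespace declarations restored.
SAMPLE = """
<qdc:qualifieddc xmlns:qdc="http://dspace.org/qualifieddc/"
                 xmlns:dc="http://purl.org/dc/elements/1.1/"
                 xmlns:dcterms="http://purl.org/dc/terms/">
  <dc:title>A light method for data generation: a combination of Markov Chains and Word Embeddings.</dc:title>
  <dc:creator>Martínez García, Eva</dc:creator>
  <dc:creator>Nogales Moyano, Alberto</dc:creator>
  <dc:subject>Markov Chains</dc:subject>
  <dc:subject>Embeddings</dc:subject>
  <dcterms:issued>2020</dcterms:issued>
  <dc:identifier>10.26342/2020-64-10</dc:identifier>
</qdc:qualifieddc>
"""


def summarize(xml_text):
    """Collect a few Dublin Core fields from a qualified DC record (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    return {
        "title": root.findtext("dc:title", namespaces=NS),
        "creators": [e.text for e in root.findall("dc:creator", NS)],
        "subjects": [e.text for e in root.findall("dc:subject", NS)],
        "issued": root.findtext("dcterms:issued", namespaces=NS),
        "identifiers": [e.text for e in root.findall("dc:identifier", NS)],
    }


if __name__ == "__main__":
    for key, value in summarize(SAMPLE).items():
        print(key, "=", value)
```

Repeated elements such as `dc:creator` and `dc:identifier` are returned as lists because the record does not mark which identifier is the DOI, ISSN, or handle; telling them apart would need a format that types them, such as the MODS or XOAI view.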

rdf


    <?xml version="1.0" encoding="UTF-8" ?>

  1. <rdf:RDF schemaLocation="http://www.openarchives.org/OAI/2.0/rdf/ http://www.openarchives.org/OAI/2.0/rdf.xsd">

    1. <ow:Publication about="oai:ddfv.ufv.es:10641/2327">

      1. <dc:title>A light method for data generation: a combination of Markov Chains and Word Embeddings.</dc:title>

      2. <dc:creator>Martínez García, Eva</dc:creator>

      3. <dc:creator>Nogales Moyano, Alberto</dc:creator>

      4. <dc:creator>Morales Escudero, Javier</dc:creator>

      5. <dc:creator>García Tejedor, Álvaro José</dc:creator>

      6. <dc:subject>Generation</dc:subject>

      7. <dc:subject>Hybrid</dc:subject>

      8. <dc:subject>Markov Chains</dc:subject>

      9. <dc:subject>Embeddings</dc:subject>

      10. <dc:subject>Similarity</dc:subject>

      11. <dc:description>Most of the current state-of-the-art Natural Language Processing (NLP) techniques are highly data-dependent. A significant amount of data is required for their training, and in some scenarios data is scarce. We present a hybrid method to generate new sentences for augmenting the training data. Our approach takes advantage of the combination of Markov Chains and word embeddings to produce high-quality data similar to an initial dataset. In contrast to other neural-based generative methods, it does not need a high amount of training data. Results show how our approach can generate useful data for NLP tools. In particular, we validate our approach by building Transformer-based Language Models using data from three different domains in the context of enriching general purpose chatbots.</dc:description>

      12. <dc:date>2021-06-16T08:19:29Z</dc:date>

      13. <dc:date>2021-06-16T08:19:29Z</dc:date>

      14. <dc:date>2020</dc:date>

      15. <dc:type>article</dc:type>

      16. <dc:identifier>1135-5948</dc:identifier>

      17. <dc:identifier>http://hdl.handle.net/10641/2327</dc:identifier>

      18. <dc:identifier>10.26342/2020-64-10</dc:identifier>

      19. <dc:language>eng</dc:language>

      20. <dc:relation>http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6199</dc:relation>

      21. <dc:rights>http://creativecommons.org/licenses/by-nc-nd/3.0/es/</dc:rights>

      22. <dc:rights>openAccess</dc:rights>

      23. <dc:rights>Atribución-NoComercial-SinDerivadas 3.0 España</dc:rights>

      24. <dc:publisher>Procesamiento del Lenguaje Natural</dc:publisher>

      </ow:Publication>

    </rdf:RDF>

xoai


    <?xml version="1.0" encoding="UTF-8" ?>

  1. <metadata schemaLocation="http://www.lyncode.com/xoai http://www.lyncode.com/xsd/xoai.xsd">

    1. <element name="dc">

      1. <element name="contributor">

        1. <element name="author">

          1. <element name="none">

            1. <field name="value">Martínez García, Eva</field>

            2. <field name="authority">e4a4adb9-fa58-4a27-b114-f07bbf623ff7</field>

            3. <field name="confidence">600</field>

            4. <field name="value">Nogales Moyano, Alberto</field>

            5. <field name="authority">209</field>

            6. <field name="confidence">600</field>

            7. <field name="value">Morales Escudero, Javier</field>

            8. <field name="authority">3c20c2c8-86d0-4953-8a3a-7290cdb9a0ba</field>

            9. <field name="confidence">600</field>

            10. <field name="value">García Tejedor, Álvaro José</field>

            11. <field name="authority">75</field>

            12. <field name="confidence">600</field>

            </element>

          </element>

        </element>

      2. <element name="date">

        1. <element name="accessioned">

          1. <element name="none">

            1. <field name="value">2021-06-16T08:19:29Z</field>

            </element>

          </element>

        2. <element name="available">

          1. <element name="none">

            1. <field name="value">2021-06-16T08:19:29Z</field>

            </element>

          </element>

        3. <element name="issued">

          1. <element name="none">

            1. <field name="value">2020</field>

            </element>

          </element>

        </element>

      3. <element name="identifier">

        1. <element name="issn">

          1. <element name="spa">

            1. <field name="value">1135-5948</field>

            </element>

          </element>

        2. <element name="uri">

          1. <element name="none">

            1. <field name="value">http://hdl.handle.net/10641/2327</field>

            </element>

          </element>

        3. <element name="doi">

          1. <element name="spa">

            1. <field name="value">10.26342/2020-64-10</field>

            </element>

          </element>

        </element>

      4. <element name="description">

        1. <element name="abstract">

          1. <element name="spa">

            1. <field name="value">Most of the current state-of-the-art Natural Language Processing (NLP) techniques are highly data-dependent. A significant amount of data is required for their training, and in some scenarios data is scarce. We present a hybrid method to generate new sentences for augmenting the training data. Our approach takes advantage of the combination of Markov Chains and word embeddings to produce high-quality data similar to an initial dataset. In contrast to other neural-based generative methods, it does not need a high amount of training data. Results show how our approach can generate useful data for NLP tools. In particular, we validate our approach by building Transformer-based Language Models using data from three different domains in the context of enriching general purpose chatbots.</field>

            </element>

          </element>

        2. <element name="version">

          1. <element name="spa">

            1. <field name="value">post-print</field>

            </element>

          </element>

        3. <element name="extent">

          1. <element name="spa">

            1. <field name="value">1,74 MB</field>

            </element>

          </element>

        </element>

      5. <element name="language">

        1. <element name="iso">

          1. <element name="spa">

            1. <field name="value">eng</field>

            </element>

          </element>

        </element>

      6. <element name="publisher">

        1. <element name="spa">

          1. <field name="value">Procesamiento del Lenguaje Natural</field>

          </element>

        </element>

      7. <element name="rights">

        1. <element name="*">

          1. <field name="value">Atribución-NoComercial-SinDerivadas 3.0 España</field>

          </element>

        2. <element name="uri">

          1. <element name="*">

            1. <field name="value">http://creativecommons.org/licenses/by-nc-nd/3.0/es/</field>

            </element>

          </element>

        3. <element name="accessRights">

          1. <element name="spa">

            1. <field name="value">openAccess</field>

            </element>

          </element>

        </element>

      8. <element name="subject">

        1. <element name="spa">

          1. <field name="value">Generation</field>

          2. <field name="value">Hybrid</field>

          3. <field name="value">Markov Chains</field>

          4. <field name="value">Embeddings</field>

          5. <field name="value">Similarity</field>

          </element>

        </element>

      9. <element name="title">

        1. <element name="spa">

          1. <field name="value">A light method for data generation: a combination of Markov Chains and Word Embeddings.</field>

          </element>

        2. <element name="alternative">

          1. <element name="spa">

            1. <field name="value">Un método ligero de generación de datos: combinación entre Cadenas de Markov y Word Embeddings.</field>

            </element>

          </element>

        </element>

      10. <element name="type">

        1. <element name="spa">

          1. <field name="value">article</field>

          </element>

        </element>

      11. <element name="relation">

        1. <element name="publisherversion">

          1. <element name="spa">

            1. <field name="value">http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6199</field>

            </element>

          </element>

        </element>

      </element>

    2. <element name="bundles">

      1. <element name="bundle">

        1. <field name="name">ORIGINAL</field>

        2. <element name="bitstreams">

          1. <element name="bitstream">

            1. <field name="name">6199-5608-1-PB.pdf</field>

            2. <field name="originalName">6199-5608-1-PB.pdf</field>

            3. <field name="description" />
            4. <field name="format">application/pdf</field>

            5. <field name="size">1831204</field>

            6. <field name="url">http://ddfv.ufv.es/bitstream/10641/2327/1/6199-5608-1-PB.pdf</field>

            7. <field name="checksum">81f55f83adefa95b0a46222d72223778</field>

            8. <field name="checksumAlgorithm">MD5</field>

            9. <field name="sid">1</field>

            </element>

          </element>

        </element>

      2. <element name="bundle">

        1. <field name="name">CC-LICENSE</field>

        2. <element name="bitstreams">

          1. <element name="bitstream">

            1. <field name="name">license_rdf</field>

            2. <field name="originalName">license_rdf</field>

            3. <field name="format">application/rdf+xml; charset=utf-8</field>

            4. <field name="size">811</field>

            5. <field name="url">http://ddfv.ufv.es/bitstream/10641/2327/2/license_rdf</field>

            6. <field name="checksum">4d01a8abc68801ab758ec8c2c04918c3</field>

            7. <field name="checksumAlgorithm">MD5</field>

            8. <field name="sid">2</field>

            </element>

          </element>

        </element>

      3. <element name="bundle">

        1. <field name="name">LICENSE</field>

        2. <element name="bitstreams">

          1. <element name="bitstream">

            1. <field name="name">license.txt</field>

            2. <field name="originalName">license.txt</field>

            3. <field name="format">text/plain; charset=utf-8</field>

            4. <field name="size">2418</field>

            5. <field name="url">http://ddfv.ufv.es/bitstream/10641/2327/3/license.txt</field>

            6. <field name="checksum">8b6e3a0bc6a1ca51936267b0e6e4740c</field>

            7. <field name="checksumAlgorithm">MD5</field>

            8. <field name="sid">3</field>

            </element>

          </element>

        </element>

      4. <element name="bundle">

        1. <field name="name">TEXT</field>

        2. <element name="bitstreams">

          1. <element name="bitstream">

            1. <field name="name">6199-5608-1-PB.pdf.txt</field>

            2. <field name="originalName">6199-5608-1-PB.pdf.txt</field>

            3. <field name="description">Extracted text</field>

            4. <field name="format">text/plain</field>

            5. <field name="size">30680</field>

            6. <field name="url">http://ddfv.ufv.es/bitstream/10641/2327/4/6199-5608-1-PB.pdf.txt</field>

            7. <field name="checksum">47b47b4ab230e10b1abda13a3bf7be5e</field>

            8. <field name="checksumAlgorithm">MD5</field>

            9. <field name="sid">4</field>

            </element>

          </element>

        </element>

      5. <element name="bundle">

        1. <field name="name">THUMBNAIL</field>

        2. <element name="bitstreams">

          1. <element name="bitstream">

            1. <field name="name">6199-5608-1-PB.pdf.jpg</field>

            2. <field name="originalName">6199-5608-1-PB.pdf.jpg</field>

            3. <field name="description">Generated Thumbnail</field>

            4. <field name="format">image/jpeg</field>

            5. <field name="size">1595</field>

            6. <field name="url">http://ddfv.ufv.es/bitstream/10641/2327/5/6199-5608-1-PB.pdf.jpg</field>

            7. <field name="checksum">edb12135decccbd5135dbda40a8589ad</field>

            8. <field name="checksumAlgorithm">MD5</field>

            9. <field name="sid">5</field>

            </element>

          </element>

        </element>

      </element>

    3. <element name="others">

      1. <field name="handle">10641/2327</field>

      2. <field name="identifier">oai:ddfv.ufv.es:10641/2327</field>

      3. <field name="lastModifyDate">2022-01-27 09:59:54.429</field>

      </element>

    4. <element name="repository">

      1. <field name="name">DDFV</field>

      2. <field name="mail">dspace@ufv.es</field>

      </element>

    5. <element name="license">

      1. <field name="bin">LSBFbCByZXBvc2l0b3JpbyBpbnN0aXR1Y2lvbmFsIGRlIGxhIFVuaXZlcnNpZGFkIEZyYW5jaXNjbyBkZSBWaXRvcmlhIGRlIE1hZHJpZCAoRERGViksIHBvbmUgYSBkaXNwb3NpY2nDs24gZGUgbG9zIHVzdWFyaW9zIGxhIHBsYXRhZm9ybWEgZGlnaXRhbCBhYmllcnRhIHkgZGUgYWNjZXNvIGxpYnJlIGRlIGxhIHByb2R1Y2Npw7NuIGNpZW50w61maWNhIGRlIGxhIGluc3RpdHVjacOzbi4KCi0gQSB0YWxlcyBmaW5lcywgbG9zIGF1dG9yZXMgZGVjbGFyYW4gcXVlIHNvbiB0aXR1bGFyZXMgZGUgbG9zIGRlcmVjaG9zIGRlIHByb3BpZWRhZCBpbnRlbGVjdHVhbCBkZSBsYSBvYnJhIHkgcXVlIMOpc3RhIGVzIG9yaWdpbmFsLgoKLSBNZWRpYW50ZSBsYSBhY2VwdGFjacOzbiBkZSBlc3RhIGxpY2VuY2lhLCBlbCBhdXRvciwgY29tbyB0aXR1bGFyIGRlIGxvcyBkZXJlY2hvcyBkZSBhdXRvciwgYXV0b3JpemEgeSBjZWRlIGEgbGEgVW5pdmVyc2lkYWQgRnJhbmNpc2NvIGRlIFZpdG9yaWEsIGRlIGZvcm1hIGdyYXR1aXRhIHkgbm8gZXhjbHVzaXZhLCBwb3IgZWwgbcOheGltbyBwbGF6byBsZWdhbCB5IGNvbiDDoW1iaXRvIHVuaXZlcnNhbCwgbG9zIGRlcmVjaG9zIGRlIHJlcHJvZHVjY2nDs24sIGRpc3RyaWJ1Y2nDs24sIGNvbXVuaWNhY2nDs24gcMO6YmxpY2EsIGluY2x1aWRvIGVsIGRlcmVjaG8gZGUgcHVlc3RhIGEgZGlzcG9zaWNpw7NuIGVsZWN0csOzbmljYSwgeSBsYSB0cmFuc2Zvcm1hY2nDs24gZGUgZm9ybWF0byBzb2JyZSBsYSBvYnJhIGluZGljYWRhLCBzaSBmdWVyYSBlbCBjYXNvLgoKLSBFbiBlbCBjYXNvIGRlIGNlc2nDs24gZGUgZGVyZWNob3MgZGUgZXhwbG90YWNpw7NuIGEgdGVyY2Vyb3MsIGRlY2xhcmEgcXVlIGN1ZW50YSBjb24gbGEgYXV0b3JpemFjacOzbiBkZSBkaWNob3MgdGl0dWxhcmVzIHkgcXVlIGhhIG9idGVuaWRvIGVsIHBlcm1pc28gc2luIHJlc3RyaWNjaW9uZXMgZGVsIHByb3BpZXRhcmlvIGRlbCBjb3B5cmlnaHQgcGFyYSBvdG9yZ2FyIGEgbGEgaW5zdGl0dWNpw7NuIGxvcyBkZXJlY2hvcyByZXF1ZXJpZG9zIHBhcmEgZXN0YSBsaWNlbmNpYSB5IHF1ZSBkaWNobyBwcm9waWV0YXJpbyBjb25vY2UgZWwgdGV4dG8gbyBlbCBjb250ZW5pZG8gZGUgbGEgb2JyYS4KCi0gU2kgZnVlcmEgdW5hIG9icmEgcGF0cm9jaW5hZGEgcG9yIGFsZ3VuYSBpbnN0aXR1Y2nDs24gZGlzdGludGEgYSBsYSBVbml2ZXJzaWRhZCBGcmFuY2lzY28gZGUgVml0b3JpYSwgZGVjbGFyYSBxdWUgZW4gY2FzbyBuZWNlc2FyaW8sIGN1ZW50YSBjb24gbG9zIHBlcm1pc29zIHBlcnRpbmVudGVzLCBkZSBsYSBpbnN0aXR1Y2nDs24gbyBlbnRpZGFkLCBxdWUgbGUgcGVybWl0YW4gbGEgZGlmdXNpw7NuIGRlIGRpY2hhIG9icmEuCgotIExhIFVuaXZlcnNpZGFkIEZyYW5jaXNjbyBkZSBWaXRvcmlhIG5vIHRpZW5lIGxhIHRpdHVsYXJpZGFkIGRlIGxvcyBkZXJlY2hvcyBzb2JyZSBsYSBvYnJhLCBxd
WUgY29ycmVzcG9uZGVuIGFsIGF1dG9yLCBwZXJvIHNpbiBlbWJhcmdvIMOpc3RhIGxpY2VuY2lhIGRhIGRlcmVjaG8gYSByZXByb2R1Y2lybGEgZW4gdW4gc29wb3J0ZSBkaWdpdGFsLCBkaXN0cmlidWlyIGEgbG9zIHVzdWFyaW9zIGNvcGlhcyBlbGVjdHLDs25pY2FzIGRlIGxhIG9icmEgZW4gZm9ybWF0byBkaWdpdGFsLCBjb211bmljYWNpw7NuIHDDumJsaWNhIHkgc3UgcHVlc3RhIGEgZGlzcG9zaWNpw7NuIGEgdHJhdsOpcyBkZSB1biBhcmNoaXZvIGFiaWVydG8gaW5zdGl0dWNpb25hbC4KCi0gTGEgb2JyYSBzZSBwb25kcsOhIGEgZGlzcG9zaWNpw7NuIGRlIGxvcyB1c3VhcmlvcyBwYXJhIHF1ZSBoYWdhbiBkZSBlbGxhIHVuIHVzbyBqdXN0byB5IHJlc3BldHVvc28gY29uIGxvcyBkZXJlY2hvcyBkZSBhdXRvciwgc2VhIGNvbiBmaW5lcyBkZSBlc3R1ZGlvLCBpbnZlc3RpZ2FjacOzbiBvIGN1YWxxdWllciBvdHJvIGZpbiBsw61jaXRvLCB5IGRlIGFjdWVyZG8gYSBsYXMgY29uZGljaW9uZXMgZXN0YWJsZWNpZGFzIGVuIGxhIGxpY2VuY2lhIENyZWF0aXZlIENvbW1vbnMsIGRlIG1vZG8gcXVlIGxhcyBvYnJhcyBwdWVkYW4gc2VyIGRpc3RyaWJ1aWRhcywgY29waWFkYXMgeSBleGhpYmlkYXMgc2llbXByZSBxdWUgc2UgY2l0ZSBsYSBhdXRvcsOtYSB5IG5vIHNlIG9idGVuZ2EgYmVuZWZpY2lvIGNvbWVyY2lhbC4gUG9yIHRhbnRvLCBsYSBVbml2ZXJzaWRhZCBubyBhc3VtaXLDoSByZXNwb25zYWJpbGlkYWQgYWxndW5hIHBvciBsYSBmb3JtYSBlZmVjdGl2YSBlbiBxdWUgbG9zIHVzdWFyaW9zIHV0aWxpY2VuIGVsIG1hdGVyaWFsIHB1ZXN0byBhIHN1IGRpc3Bvc2ljacOzbi4KCi0gRWwgYXV0b3IgcG9kcsOhIHNvbGljaXRhciBsYSByZXRpcmFkYSBkZSBsYSBvYnJhIGRlbCByZXBvc2l0b3JpbyBwb3IgY2F1c2EganVzdGlmaWNhZGEuIAoK</field>

      </element>

    </metadata>
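
The `bin` field of the `license` element holds the repository's deposit license as Base64-encoded UTF-8 text. The sketch below shows one way to decode such a value, assuming any whitespace or line breaks inside it are rendering artifacts that can be discarded. `decode_license` is a hypothetical helper name, and `PREFIX` is only the first 40 characters of the field, used to keep the example short.

```python
import base64


def decode_license(bin_value: str) -> str:
    """Decode a Base64 license blob into text (hypothetical helper)."""
    # Drop whitespace and line breaks, which are not part of the Base64 payload.
    compact = "".join(bin_value.split())
    return base64.b64decode(compact).decode("utf-8")


# First 40 characters of the `bin` value above (a complete Base64 unit).
PREFIX = "LSBFbCByZXBvc2l0b3JpbyBpbnN0aXR1Y2lvbmFs"

if __name__ == "__main__":
    print(decode_license(PREFIX))  # - El repositorio institucional
```

Passing the full field value instead of `PREFIX` should yield the complete Spanish license text, provided the value was copied without loss.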

Hispana

Portal for access to digital heritage and the national aggregator of content for Europeana.


HISPANA
© Ministerio de Cultura