A light method for data generation: a combination of Markov Chains and Word Embeddings.
Resource identifiers
ISSN: 1135-5948
Handle: http://hdl.handle.net/10641/2327
DOI: 10.26342/2020-64-10
Origin
(Repositorio Institucional de la Universidad Francisco de Vitoria)


Title:
A light method for data generation: a combination of Markov Chains and Word Embeddings.
Un método ligero de generación de datos: combinación entre Cadenas de Markov y Word Embeddings.
Subject:
Generation
Hybrid
Markov Chains
Embeddings
Similarity
Description:
Most of the current state-of-the-art Natural Language Processing (NLP) techniques are highly data-dependent. A significant amount of data is required for their training, and in some scenarios data is scarce. We present a hybrid method to generate new sentences for augmenting the training data. Our approach takes advantage of the combination of Markov Chains and word embeddings to produce high-quality data similar to an initial dataset. In contrast to other neural-based generative methods, it does not need a high amount of training data. Results show how our approach can generate useful data for NLP tools. In particular, we validate our approach by building Transformer-based Language Models using data from three different domains in the context of enriching general purpose chatbots.
Version: post-print
Extent: 1.74 MB
Language:
English
Relation:
http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6199
Author/Producer:
Martínez García, Eva
Nogales Moyano, Alberto
Morales Escudero, Javier
García Tejedor, Álvaro José
Publisher:
Procesamiento del Lenguaje Natural
Rights:
Atribución-NoComercial-SinDerivadas 3.0 España
http://creativecommons.org/licenses/by-nc-nd/3.0/es/
openAccess
Date:
2021-06-16T08:19:29Z (accessioned and made available)
2020 (issued)
Resource type:
article
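The Description states the general idea: Markov chains propose new sentences, and word embeddings keep them similar to the initial dataset. It does not give the algorithm. The following is one plausible reading, offered as an illustration and not as the authors' method. A first-order, word-level Markov chain generates candidate sentences. A candidate is kept only if it is new and the cosine similarity between its averaged word vector and the averaged vector of at least one seed sentence reaches a threshold. The toy two-dimensional embeddings, the 0.8 threshold, and all function names are hypothetical.

```python
import math
import random
from collections import defaultdict

# Hypothetical sketch, NOT the authors' implementation: the record only gives the abstract.

SEED_SENTENCES = ["the cat sat on the mat", "the dog sat on the rug"]
# Toy word vectors standing in for real pretrained embeddings (assumption).
TOY_EMBEDDINGS = {
    "the": [1.0, 0.0], "on": [1.0, 0.0], "sat": [0.5, 0.5],
    "cat": [0.9, 0.1], "dog": [0.8, 0.2], "mat": [0.7, 0.3], "rug": [0.6, 0.4],
}


def build_chain(sentences):
    """First-order, word-level Markov chain: word -> list of observed next words."""
    chain = defaultdict(list)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for current, nxt in zip(tokens, tokens[1:]):
            chain[current].append(nxt)
    return chain


def generate(chain, rng, max_len=20):
    """Random walk from the start symbol until the end symbol or max_len words."""
    word, out = "<s>", []
    while len(out) < max_len:
        # Repeated successors in the list make frequent transitions more likely.
        word = rng.choice(chain[word])
        if word == "</s>":
            break
        out.append(word)
    return out


def sentence_vector(tokens, embeddings):
    """Mean of the vectors of in-vocabulary tokens, or None if none are known."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def augment(sentences, embeddings, n_candidates=200, threshold=0.8, seed=0):
    """Keep novel generated sentences that are close to at least one seed sentence."""
    rng = random.Random(seed)
    chain = build_chain(sentences)
    originals = {" ".join(s.lower().split()) for s in sentences}
    seed_vectors = [
        v for v in (sentence_vector(s.lower().split(), embeddings) for s in sentences)
        if v is not None
    ]
    if not seed_vectors:
        return []
    kept = set()
    for _ in range(n_candidates):
        tokens = generate(chain, rng)
        text = " ".join(tokens)
        vector = sentence_vector(tokens, embeddings)
        if text in originals or vector is None:
            continue
        if max(cosine(vector, s) for s in seed_vectors) >= threshold:
            kept.add(text)
    return sorted(kept)


if __name__ == "__main__":
    for sentence in augment(SEED_SENTENCES, TOY_EMBEDDINGS):
        print(sentence)
```

The similarity filter is what separates this from plain Markov generation. The chain alone readily produces recombinations such as "the cat sat on the rug". The embedding check discards candidates that drift too far from the seed data. A real setup would swap in pretrained word vectors and tune the threshold.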

oai_dc


    <?xml version="1.0" encoding="UTF-8" ?>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
      <dc:title>A light method for data generation: a combination of Markov Chains and Word Embeddings.</dc:title>
      <dc:title>Un método ligero de generación de datos: combinación entre Cadenas de Markov y Word Embeddings.</dc:title>
      <dc:creator>Martínez García, Eva</dc:creator>
      <dc:creator>Nogales Moyano, Alberto</dc:creator>
      <dc:creator>Morales Escudero, Javier</dc:creator>
      <dc:creator>García Tejedor, Álvaro José</dc:creator>
      <dc:subject>Generation</dc:subject>
      <dc:subject>Hybrid</dc:subject>
      <dc:subject>Markov Chains</dc:subject>
      <dc:subject>Embeddings</dc:subject>
      <dc:subject>Similarity</dc:subject>
      <dc:description>Most of the current state-of-the-art Natural Language Processing (NLP) techniques are highly data-dependent. A significant amount of data is required for their training, and in some scenarios data is scarce. We present a hybrid method to generate new sentences for augmenting the training data. Our approach takes advantage of the combination of Markov Chains and word embeddings to produce high-quality data similar to an initial dataset. In contrast to other neural-based generative methods, it does not need a high amount of training data. Results show how our approach can generate useful data for NLP tools. In particular, we validate our approach by building Transformer-based Language Models using data from three different domains in the context of enriching general purpose chatbots.</dc:description>
      <dc:description>post-print</dc:description>
      <dc:description>1,74 MB</dc:description>
      <dc:date>2021-06-16T08:19:29Z</dc:date>
      <dc:date>2021-06-16T08:19:29Z</dc:date>
      <dc:date>2020</dc:date>
      <dc:type>article</dc:type>
      <dc:identifier>1135-5948</dc:identifier>
      <dc:identifier>http://hdl.handle.net/10641/2327</dc:identifier>
      <dc:identifier>10.26342/2020-64-10</dc:identifier>
      <dc:language>eng</dc:language>
      <dc:relation>http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6199</dc:relation>
      <dc:rights>Atribución-NoComercial-SinDerivadas 3.0 España</dc:rights>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/3.0/es/</dc:rights>
      <dc:rights>openAccess</dc:rights>
      <dc:publisher>Procesamiento del Lenguaje Natural</dc:publisher>
    </oai_dc:dc>
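To consume the oai_dc record programmatically, you can read its Dublin Core children with Python's standard `xml.etree.ElementTree`, provided the oai_dc and dc namespace URIs are declared. This is a minimal sketch, assuming an abridged, namespace-declared copy of the record. `SAMPLE`, `dc_values` and `dc_fields` are illustrative names and not part of any HISPANA or DSpace API.

```python
import xml.etree.ElementTree as ET

# Namespace URIs of the oai_dc container and of the Dublin Core elements it holds.
NS = {
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Abridged copy of the record above (hypothetical trimming for brevity).
SAMPLE = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                       xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>A light method for data generation: a combination of Markov Chains and Word Embeddings.</dc:title>
  <dc:creator>Martínez García, Eva</dc:creator>
  <dc:creator>Nogales Moyano, Alberto</dc:creator>
  <dc:creator>Morales Escudero, Javier</dc:creator>
  <dc:creator>García Tejedor, Álvaro José</dc:creator>
  <dc:date>2020</dc:date>
  <dc:identifier>10.26342/2020-64-10</dc:identifier>
</oai_dc:dc>"""


def dc_values(xml_text, element):
    """Return the text of every <dc:ELEMENT> child, in document order."""
    root = ET.fromstring(xml_text)
    return [(e.text or "").strip() for e in root.findall(f"dc:{element}", NS)]


def dc_fields(xml_text):
    """Group all Dublin Core children by local name, e.g. {'creator': [...]}."""
    root = ET.fromstring(xml_text)
    fields = {}
    for child in root:
        local_name = child.tag.rsplit("}", 1)[-1]  # "{uri}creator" -> "creator"
        fields.setdefault(local_name, []).append((child.text or "").strip())
    return fields


if __name__ == "__main__":
    print(dc_values(SAMPLE, "creator"))
    print(sorted(dc_fields(SAMPLE)))
```

Repeated elements such as `dc:creator` and `dc:date` are kept as lists because Dublin Core allows each element to occur any number of times.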

didl


    <?xml version="1.0" encoding="UTF-8" ?>
    <d:DIDL xmlns:d="urn:mpeg:mpeg21:2002:02-DIDL-NS" xmlns:dii="urn:mpeg:mpeg21:2002:01-DII-NS" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:mpeg:mpeg21:2002:02-DIDL-NS http://standards.iso.org/ittf/PubliclyAvailableStandards/MPEG-21_schema_files/did/didl.xsd">
      <d:DIDLInfo>
        <dcterms:created xsi:schemaLocation="http://purl.org/dc/terms/ http://dublincore.org/schemas/xmls/qdc/dcterms.xsd">2021-06-16T08:19:29Z</dcterms:created>
      </d:DIDLInfo>
      <d:Item id="hdl_10641_2327">
        <d:Descriptor>
          <d:Statement mimeType="application/xml; charset=utf-8">
            <dii:Identifier xsi:schemaLocation="urn:mpeg:mpeg21:2002:01-DII-NS http://standards.iso.org/ittf/PubliclyAvailableStandards/MPEG-21_schema_files/dii/dii.xsd">urn:hdl:10641/2327</dii:Identifier>
          </d:Statement>
        </d:Descriptor>
        <d:Descriptor>
          <d:Statement mimeType="application/xml; charset=utf-8">
            <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
              <dc:title>A light method for data generation: a combination of Markov Chains and Word Embeddings.</dc:title>
              <dc:creator>Martínez García, Eva</dc:creator>
              <dc:creator>Nogales Moyano, Alberto</dc:creator>
              <dc:creator>Morales Escudero, Javier</dc:creator>
              <dc:creator>García Tejedor, Álvaro José</dc:creator>
              <dc:subject>Generation</dc:subject>
              <dc:subject>Hybrid</dc:subject>
              <dc:subject>Markov Chains</dc:subject>
              <dc:subject>Embeddings</dc:subject>
              <dc:subject>Similarity</dc:subject>
              <dc:description>Most of the current state-of-the-art Natural Language Processing (NLP) techniques are highly data-dependent. A significant amount of data is required for their training, and in some scenarios data is scarce. We present a hybrid method to generate new sentences for augmenting the training data. Our approach takes advantage of the combination of Markov Chains and word embeddings to produce high-quality data similar to an initial dataset. In contrast to other neural-based generative methods, it does not need a high amount of training data. Results show how our approach can generate useful data for NLP tools. In particular, we validate our approach by building Transformer-based Language Models using data from three different domains in the context of enriching general purpose chatbots.</dc:description>
              <dc:date>2021-06-16T08:19:29Z</dc:date>
              <dc:date>2021-06-16T08:19:29Z</dc:date>
              <dc:date>2020</dc:date>
              <dc:type>article</dc:type>
              <dc:identifier>1135-5948</dc:identifier>
              <dc:identifier>http://hdl.handle.net/10641/2327</dc:identifier>
              <dc:identifier>10.26342/2020-64-10</dc:identifier>
              <dc:language>eng</dc:language>
              <dc:relation>http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6199</dc:relation>
              <dc:rights>http://creativecommons.org/licenses/by-nc-nd/3.0/es/</dc:rights>
              <dc:rights>openAccess</dc:rights>
              <dc:rights>Atribución-NoComercial-SinDerivadas 3.0 España</dc:rights>
              <dc:publisher>Procesamiento del Lenguaje Natural</dc:publisher>
            </oai_dc:dc>
          </d:Statement>
        </d:Descriptor>
        <d:Component id="10641_2327_1">
          <d:Resource mimeType="application/pdf" ref="http://ddfv.ufv.es/bitstream/10641/2327/1/6199-5608-1-PB.pdf" />
        </d:Component>
      </d:Item>
    </d:DIDL>

dim


    <?xml version="1.0" encoding="UTF-8" ?>
    <dim:dim xmlns:dim="http://www.dspace.org/xmlns/dspace/dim" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.dspace.org/xmlns/dspace/dim http://www.dspace.org/schema/dim.xsd">
      <dim:field authority="e4a4adb9-fa58-4a27-b114-f07bbf623ff7" confidence="600" element="contributor" mdschema="dc" qualifier="author">Martínez García, Eva</dim:field>
      <dim:field authority="209" confidence="600" element="contributor" mdschema="dc" qualifier="author">Nogales Moyano, Alberto</dim:field>
      <dim:field authority="3c20c2c8-86d0-4953-8a3a-7290cdb9a0ba" confidence="600" element="contributor" mdschema="dc" qualifier="author">Morales Escudero, Javier</dim:field>
      <dim:field authority="75" confidence="600" element="contributor" mdschema="dc" qualifier="author">García Tejedor, Álvaro José</dim:field>
      <dim:field element="date" mdschema="dc" qualifier="accessioned">2021-06-16T08:19:29Z</dim:field>
      <dim:field element="date" mdschema="dc" qualifier="available">2021-06-16T08:19:29Z</dim:field>
      <dim:field element="date" mdschema="dc" qualifier="issued">2020</dim:field>
      <dim:field element="identifier" lang="spa" mdschema="dc" qualifier="issn">1135-5948</dim:field>
      <dim:field element="identifier" mdschema="dc" qualifier="uri">http://hdl.handle.net/10641/2327</dim:field>
      <dim:field element="identifier" lang="spa" mdschema="dc" qualifier="doi">10.26342/2020-64-10</dim:field>
      <dim:field element="description" lang="spa" mdschema="dc" qualifier="abstract">Most of the current state-of-the-art Natural Language Processing (NLP) techniques are highly data-dependent. A significant amount of data is required for their training, and in some scenarios data is scarce. We present a hybrid method to generate new sentences for augmenting the training data. Our approach takes advantage of the combination of Markov Chains and word embeddings to produce high-quality data similar to an initial dataset. In contrast to other neural-based generative methods, it does not need a high amount of training data. Results show how our approach can generate useful data for NLP tools. In particular, we validate our approach by building Transformer-based Language Models using data from three different domains in the context of enriching general purpose chatbots.</dim:field>
      <dim:field element="description" lang="spa" mdschema="dc" qualifier="version">post-print</dim:field>
      <dim:field element="description" lang="spa" mdschema="dc" qualifier="extent">1,74 MB</dim:field>
      <dim:field element="language" lang="spa" mdschema="dc" qualifier="iso">eng</dim:field>
      <dim:field element="publisher" lang="spa" mdschema="dc">Procesamiento del Lenguaje Natural</dim:field>
      <dim:field element="rights" lang="*" mdschema="dc">Atribución-NoComercial-SinDerivadas 3.0 España</dim:field>
      <dim:field element="rights" lang="*" mdschema="dc" qualifier="uri">http://creativecommons.org/licenses/by-nc-nd/3.0/es/</dim:field>
      <dim:field element="rights" lang="spa" mdschema="dc" qualifier="accessRights">openAccess</dim:field>
      <dim:field element="subject" lang="spa" mdschema="dc">Generation</dim:field>
      <dim:field element="subject" lang="spa" mdschema="dc">Hybrid</dim:field>
      <dim:field element="subject" lang="spa" mdschema="dc">Markov Chains</dim:field>
      <dim:field element="subject" lang="spa" mdschema="dc">Embeddings</dim:field>
      <dim:field element="subject" lang="spa" mdschema="dc">Similarity</dim:field>
      <dim:field element="title" lang="spa" mdschema="dc">A light method for data generation: a combination of Markov Chains and Word Embeddings.</dim:field>
      <dim:field element="title" lang="spa" mdschema="dc" qualifier="alternative">Un método ligero de generación de datos: combinación entre Cadenas de Markov y Word Embeddings.</dim:field>
      <dim:field element="type" lang="spa" mdschema="dc">article</dim:field>
      <dim:field element="relation" lang="spa" mdschema="dc" qualifier="publisherversion">http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6199</dim:field>
    </dim:dim>
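In the dim serialization, each value is a `dim:field` whose `mdschema`, `element` and `qualifier` attributes together name the field. DSpace conventionally writes these as dotted keys such as `dc.contributor.author` or `dc.title.alternative`. A small sketch that rebuilds those keys, assuming an abridged sample. `dim_to_dict` is a hypothetical helper, and the `lang`, `authority` and `confidence` attributes are ignored here for simplicity.

```python
import xml.etree.ElementTree as ET

DIM_NS = "http://www.dspace.org/xmlns/dspace/dim"

# Abridged sample of the dim record above (hypothetical trimming).
SAMPLE = f"""<dim:dim xmlns:dim="{DIM_NS}">
  <dim:field mdschema="dc" element="contributor" qualifier="author">Martínez García, Eva</dim:field>
  <dim:field mdschema="dc" element="date" qualifier="issued">2020</dim:field>
  <dim:field mdschema="dc" element="identifier" qualifier="doi">10.26342/2020-64-10</dim:field>
  <dim:field mdschema="dc" element="publisher">Procesamiento del Lenguaje Natural</dim:field>
</dim:dim>"""


def dim_to_dict(xml_text):
    """Map each dim:field to a 'schema.element[.qualifier]' key, keeping repeated values."""
    root = ET.fromstring(xml_text)
    result = {}
    # ElementTree names namespaced tags as "{namespace-uri}localname".
    for field in root.iter(f"{{{DIM_NS}}}field"):
        parts = (field.get("mdschema"), field.get("element"), field.get("qualifier"))
        key = ".".join(p for p in parts if p)  # unqualified fields get a two-part key
        result.setdefault(key, []).append(field.text or "")
    return result


if __name__ == "__main__":
    for key, values in dim_to_dict(SAMPLE).items():
        print(key, values)
```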

etdms


    <?xml version="1.0" encoding="UTF-8" ?>
    <thesis xmlns="http://www.ndltd.org/standards/metadata/etdms/1.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.ndltd.org/standards/metadata/etdms/1.0/ http://www.ndltd.org/standards/metadata/etdms/1.0/etdms.xsd">
      <title>A light method for data generation: a combination of Markov Chains and Word Embeddings.</title>
      <creator>Martínez García, Eva</creator>
      <creator>Nogales Moyano, Alberto</creator>
      <creator>Morales Escudero, Javier</creator>
      <creator>García Tejedor, Álvaro José</creator>
      <subject>Generation</subject>
      <subject>Hybrid</subject>
      <subject>Markov Chains</subject>
      <subject>Embeddings</subject>
      <subject>Similarity</subject>
      <description>Most of the current state-of-the-art Natural Language Processing (NLP) techniques are highly data-dependent. A significant amount of data is required for their training, and in some scenarios data is scarce. We present a hybrid method to generate new sentences for augmenting the training data. Our approach takes advantage of the combination of Markov Chains and word embeddings to produce high-quality data similar to an initial dataset. In contrast to other neural-based generative methods, it does not need a high amount of training data. Results show how our approach can generate useful data for NLP tools. In particular, we validate our approach by building Transformer-based Language Models using data from three different domains in the context of enriching general purpose chatbots.</description>
      <date>2021-06-16</date>
      <date>2021-06-16</date>
      <date>2020</date>
      <type>article</type>
      <identifier>1135-5948</identifier>
      <identifier>http://hdl.handle.net/10641/2327</identifier>
      <identifier>10.26342/2020-64-10</identifier>
      <language>eng</language>
      <relation>http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6199</relation>
      <rights>http://creativecommons.org/licenses/by-nc-nd/3.0/es/</rights>
      <rights>openAccess</rights>
      <rights>Atribución-NoComercial-SinDerivadas 3.0 España</rights>
      <publisher>Procesamiento del Lenguaje Natural</publisher>
    </thesis>

marc


    <?xml version="1.0" encoding="UTF-8" ?>
    <record xmlns="http://www.loc.gov/MARC21/slim" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd">
      <leader>00925njm 22002777a 4500</leader>
      <datafield ind1=" " ind2=" " tag="042">
        <subfield code="a">dc</subfield>
      </datafield>
      <datafield ind1=" " ind2=" " tag="720">
        <subfield code="a">Martínez García, Eva</subfield>
        <subfield code="e">author</subfield>
      </datafield>
      <datafield ind1=" " ind2=" " tag="720">
        <subfield code="a">Nogales Moyano, Alberto</subfield>
        <subfield code="e">author</subfield>
      </datafield>
      <datafield ind1=" " ind2=" " tag="720">
        <subfield code="a">Morales Escudero, Javier</subfield>
        <subfield code="e">author</subfield>
      </datafield>
      <datafield ind1=" " ind2=" " tag="720">
        <subfield code="a">García Tejedor, Álvaro José</subfield>
        <subfield code="e">author</subfield>
      </datafield>
      <datafield ind1=" " ind2=" " tag="260">
        <subfield code="c">2020</subfield>
      </datafield>
      <datafield ind1=" " ind2=" " tag="520">
        <subfield code="a">Most of the current state-of-the-art Natural Language Processing (NLP) techniques are highly data-dependent. A significant amount of data is required for their training, and in some scenarios data is scarce. We present a hybrid method to generate new sentences for augmenting the training data. Our approach takes advantage of the combination of Markov Chains and word embeddings to produce high-quality data similar to an initial dataset. In contrast to other neural-based generative methods, it does not need a high amount of training data. Results show how our approach can generate useful data for NLP tools. In particular, we validate our approach by building Transformer-based Language Models using data from three different domains in the context of enriching general purpose chatbots.</subfield>
      </datafield>
      <datafield ind1="8" ind2=" " tag="024">
        <subfield code="a">1135-5948</subfield>
      </datafield>
      <datafield ind1="8" ind2=" " tag="024">
        <subfield code="a">http://hdl.handle.net/10641/2327</subfield>
      </datafield>
      <datafield ind1="8" ind2=" " tag="024">
        <subfield code="a">10.26342/2020-64-10</subfield>
      </datafield>
      <datafield ind1=" " ind2=" " tag="653">
        <subfield code="a">Generation</subfield>
      </datafield>
      <datafield ind1=" " ind2=" " tag="653">
        <subfield code="a">Hybrid</subfield>
      </datafield>
      <datafield ind1=" " ind2=" " tag="653">
        <subfield code="a">Markov Chains</subfield>
      </datafield>
      <datafield ind1=" " ind2=" " tag="653">
        <subfield code="a">Embeddings</subfield>
      </datafield>
      <datafield ind1=" " ind2=" " tag="653">
        <subfield code="a">Similarity</subfield>
      </datafield>
      <datafield ind1="0" ind2="0" tag="245">
        <subfield code="a">A light method for data generation: a combination of Markov Chains and Word Embeddings.</subfield>
      </datafield>
    </record>
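The marc serialization spreads the same data over MARC 21 datafields:

- 720: uncontrolled added-entry name, with `$e` as relator term
- 653: uncontrolled index terms
- 520: summary
- 245: title statement
- 260 `$c`: date
- 024: other standard identifiers (first indicator 8 means an unspecified type)

ElementTree's limited XPath support, which includes `[@attr='value']` predicates, is enough to pull subfields out by tag and code. This is a sketch under that assumption. The abridged sample and the `subfield_values` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Abridged sample of the MARCXML record above (hypothetical trimming).
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield ind1=" " ind2=" " tag="720">
    <subfield code="a">Martínez García, Eva</subfield>
    <subfield code="e">author</subfield>
  </datafield>
  <datafield ind1=" " ind2=" " tag="720">
    <subfield code="a">Nogales Moyano, Alberto</subfield>
    <subfield code="e">author</subfield>
  </datafield>
  <datafield ind1=" " ind2=" " tag="653">
    <subfield code="a">Markov Chains</subfield>
  </datafield>
  <datafield ind1="0" ind2="0" tag="245">
    <subfield code="a">A light method for data generation: a combination of Markov Chains and Word Embeddings.</subfield>
  </datafield>
</record>"""


def subfield_values(xml_text, tag, code):
    """Collect subfield $code of every datafield with the given tag, in document order."""
    root = ET.fromstring(xml_text)
    # The "marc" prefix is resolved through the NS mapping passed to findall.
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    return [sf.text for sf in root.findall(path, NS)]


if __name__ == "__main__":
    print(subfield_values(SAMPLE, "720", "a"))  # author names
    print(subfield_values(SAMPLE, "653", "a"))  # keywords
```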

mets


    <?xml version="1.0" encoding="UTF-8" ?>
    <mets xmlns="http://www.loc.gov/METS/" xmlns:mods="http://www.loc.gov/mods/v3" xmlns:premis="http://www.loc.gov/standards/premis" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" ID=" DSpace_ITEM_10641-2327" OBJID=" hdl:10641/2327" PROFILE="DSpace METS SIP Profile 1.0" TYPE="DSpace ITEM" xsi:schemaLocation="http://www.loc.gov/METS/ http://www.loc.gov/standards/mets/mets.xsd">
      <metsHdr CREATEDATE="2022-09-20T09:27:37Z">
        <agent ROLE="CUSTODIAN" TYPE="ORGANIZATION">
          <name>DDFV</name>
        </agent>
      </metsHdr>
      <dmdSec ID="DMD_10641_2327">
        <mdWrap MDTYPE="MODS">
          <xmlData xsi:schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-1.xsd">
            <mods:mods xsi:schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-1.xsd">
              <mods:name>
                <mods:role>
                  <mods:roleTerm type="text">author</mods:roleTerm>
                </mods:role>
                <mods:namePart>Martínez García, Eva</mods:namePart>
              </mods:name>
              <mods:name>
                <mods:role>
                  <mods:roleTerm type="text">author</mods:roleTerm>
                </mods:role>
                <mods:namePart>Nogales Moyano, Alberto</mods:namePart>
              </mods:name>
              <mods:name>
                <mods:role>
                  <mods:roleTerm type="text">author</mods:roleTerm>
                </mods:role>
                <mods:namePart>Morales Escudero, Javier</mods:namePart>
              </mods:name>
              <mods:name>
                <mods:role>
                  <mods:roleTerm type="text">author</mods:roleTerm>
                </mods:role>
                <mods:namePart>García Tejedor, Álvaro José</mods:namePart>
              </mods:name>
              <mods:extension>
                <mods:dateAccessioned encoding="iso8601">2021-06-16T08:19:29Z</mods:dateAccessioned>
              </mods:extension>
              <mods:extension>
                <mods:dateAvailable encoding="iso8601">2021-06-16T08:19:29Z</mods:dateAvailable>
              </mods:extension>
              <mods:originInfo>
                <mods:dateIssued encoding="iso8601">2020</mods:dateIssued>
              </mods:originInfo>
              <mods:identifier type="issn">1135-5948</mods:identifier>
              <mods:identifier type="uri">http://hdl.handle.net/10641/2327</mods:identifier>
              <mods:identifier type="doi">10.26342/2020-64-10</mods:identifier>
              <mods:abstract>Most of the current state-of-the-art Natural Language Processing (NLP) techniques are highly data-dependent. A significant amount of data is required for their training, and in some scenarios data is scarce. We present a hybrid method to generate new sentences for augmenting the training data. Our approach takes advantage of the combination of Markov Chains and word embeddings to produce high-quality data similar to an initial dataset. In contrast to other neural-based generative methods, it does not need a high amount of training data. Results show how our approach can generate useful data for NLP tools. In particular, we validate our approach by building Transformer-based Language Models using data from three different domains in the context of enriching general purpose chatbots.</mods:abstract>
              <mods:language>
                <mods:languageTerm authority="rfc3066">eng</mods:languageTerm>
              </mods:language>
              <mods:accessCondition type="useAndReproduction">Atribución-NoComercial-SinDerivadas 3.0 España</mods:accessCondition>
              <mods:subject>
                <mods:topic>Generation</mods:topic>
              </mods:subject>
              <mods:subject>
                <mods:topic>Hybrid</mods:topic>
              </mods:subject>
              <mods:subject>
                <mods:topic>Markov Chains</mods:topic>
              </mods:subject>
              <mods:subject>
                <mods:topic>Embeddings</mods:topic>
              </mods:subject>
              <mods:subject>
                <mods:topic>Similarity</mods:topic>
              </mods:subject>
              <mods:titleInfo>
                <mods:title>A light method for data generation: a combination of Markov Chains and Word Embeddings.</mods:title>
              </mods:titleInfo>
              <mods:genre>article</mods:genre>
            </mods:mods>
          </xmlData>
        </mdWrap>
      </dmdSec>
      <amdSec ID="TMD_10641_2327">
        <rightsMD ID="RIG_10641_2327">
          <mdWrap MDTYPE="OTHER" MIMETYPE="text/plain" OTHERMDTYPE="DSpaceDepositLicense">

          1. <binData>LSBFbCByZXBvc2l0b3JpbyBpbnN0aXR1Y2lvbmFsIGRlIGxhIFVuaXZlcnNpZGFkIEZyYW5jaXNjbyBkZSBWaXRvcmlhIGRlIE1hZHJpZCAoRERGViksIHBvbmUgYSBkaXNwb3NpY2nDs24gZGUgbG9zIHVzdWFyaW9zIGxhIHBsYXRhZm9ybWEgZGlnaXRhbCBhYmllcnRhIHkgZGUgYWNjZXNvIGxpYnJlIGRlIGxhIHByb2R1Y2Npw7NuIGNpZW50w61maWNhIGRlIGxhIGluc3RpdHVjacOzbi4KCi0gQSB0YWxlcyBmaW5lcywgbG9zIGF1dG9yZXMgZGVjbGFyYW4gcXVlIHNvbiB0aXR1bGFyZXMgZGUgbG9zIGRlcmVjaG9zIGRlIHByb3BpZWRhZCBpbnRlbGVjdHVhbCBkZSBsYSBvYnJhIHkgcXVlIMOpc3RhIGVzIG9yaWdpbmFsLgoKLSBNZWRpYW50ZSBsYSBhY2VwdGFjacOzbiBkZSBlc3RhIGxpY2VuY2lhLCBlbCBhdXRvciwgY29tbyB0aXR1bGFyIGRlIGxvcyBkZXJlY2hvcyBkZSBhdXRvciwgYXV0b3JpemEgeSBjZWRlIGEgbGEgVW5pdmVyc2lkYWQgRnJhbmNpc2NvIGRlIFZpdG9yaWEsIGRlIGZvcm1hIGdyYXR1aXRhIHkgbm8gZXhjbHVzaXZhLCBwb3IgZWwgbcOheGltbyBwbGF6byBsZWdhbCB5IGNvbiDDoW1iaXRvIHVuaXZlcnNhbCwgbG9zIGRlcmVjaG9zIGRlIHJlcHJvZHVjY2nDs24sIGRpc3RyaWJ1Y2nDs24sIGNvbXVuaWNhY2nDs24gcMO6YmxpY2EsIGluY2x1aWRvIGVsIGRlcmVjaG8gZGUgcHVlc3RhIGEgZGlzcG9zaWNpw7NuIGVsZWN0csOzbmljYSwgeSBsYSB0cmFuc2Zvcm1hY2nDs24gZGUgZm9ybWF0byBzb2JyZSBsYSBvYnJhIGluZGljYWRhLCBzaSBmdWVyYSBlbCBjYXNvLgoKLSBFbiBlbCBjYXNvIGRlIGNlc2nDs24gZGUgZGVyZWNob3MgZGUgZXhwbG90YWNpw7NuIGEgdGVyY2Vyb3MsIGRlY2xhcmEgcXVlIGN1ZW50YSBjb24gbGEgYXV0b3JpemFjacOzbiBkZSBkaWNob3MgdGl0dWxhcmVzIHkgcXVlIGhhIG9idGVuaWRvIGVsIHBlcm1pc28gc2luIHJlc3RyaWNjaW9uZXMgZGVsIHByb3BpZXRhcmlvIGRlbCBjb3B5cmlnaHQgcGFyYSBvdG9yZ2FyIGEgbGEgaW5zdGl0dWNpw7NuIGxvcyBkZXJlY2hvcyByZXF1ZXJpZG9zIHBhcmEgZXN0YSBsaWNlbmNpYSB5IHF1ZSBkaWNobyBwcm9waWV0YXJpbyBjb25vY2UgZWwgdGV4dG8gbyBlbCBjb250ZW5pZG8gZGUgbGEgb2JyYS4KCi0gU2kgZnVlcmEgdW5hIG9icmEgcGF0cm9jaW5hZGEgcG9yIGFsZ3VuYSBpbnN0aXR1Y2nDs24gZGlzdGludGEgYSBsYSBVbml2ZXJzaWRhZCBGcmFuY2lzY28gZGUgVml0b3JpYSwgZGVjbGFyYSBxdWUgZW4gY2FzbyBuZWNlc2FyaW8sIGN1ZW50YSBjb24gbG9zIHBlcm1pc29zIHBlcnRpbmVudGVzLCBkZSBsYSBpbnN0aXR1Y2nDs24gbyBlbnRpZGFkLCBxdWUgbGUgcGVybWl0YW4gbGEgZGlmdXNpw7NuIGRlIGRpY2hhIG9icmEuCgotIExhIFVuaXZlcnNpZGFkIEZyYW5jaXNjbyBkZSBWaXRvcmlhIG5vIHRpZW5lIGxhIHRpdHVsYXJpZGFkIGRlIGxvcyBkZXJlY2hvcyBzb2JyZSBsYSBvYnJhLCBxdWUgY2
9ycmVzcG9uZGVuIGFsIGF1dG9yLCBwZXJvIHNpbiBlbWJhcmdvIMOpc3RhIGxpY2VuY2lhIGRhIGRlcmVjaG8gYSByZXByb2R1Y2lybGEgZW4gdW4gc29wb3J0ZSBkaWdpdGFsLCBkaXN0cmlidWlyIGEgbG9zIHVzdWFyaW9zIGNvcGlhcyBlbGVjdHLDs25pY2FzIGRlIGxhIG9icmEgZW4gZm9ybWF0byBkaWdpdGFsLCBjb211bmljYWNpw7NuIHDDumJsaWNhIHkgc3UgcHVlc3RhIGEgZGlzcG9zaWNpw7NuIGEgdHJhdsOpcyBkZSB1biBhcmNoaXZvIGFiaWVydG8gaW5zdGl0dWNpb25hbC4KCi0gTGEgb2JyYSBzZSBwb25kcsOhIGEgZGlzcG9zaWNpw7NuIGRlIGxvcyB1c3VhcmlvcyBwYXJhIHF1ZSBoYWdhbiBkZSBlbGxhIHVuIHVzbyBqdXN0byB5IHJlc3BldHVvc28gY29uIGxvcyBkZXJlY2hvcyBkZSBhdXRvciwgc2VhIGNvbiBmaW5lcyBkZSBlc3R1ZGlvLCBpbnZlc3RpZ2FjacOzbiBvIGN1YWxxdWllciBvdHJvIGZpbiBsw61jaXRvLCB5IGRlIGFjdWVyZG8gYSBsYXMgY29uZGljaW9uZXMgZXN0YWJsZWNpZGFzIGVuIGxhIGxpY2VuY2lhIENyZWF0aXZlIENvbW1vbnMsIGRlIG1vZG8gcXVlIGxhcyBvYnJhcyBwdWVkYW4gc2VyIGRpc3RyaWJ1aWRhcywgY29waWFkYXMgeSBleGhpYmlkYXMgc2llbXByZSBxdWUgc2UgY2l0ZSBsYSBhdXRvcsOtYSB5IG5vIHNlIG9idGVuZ2EgYmVuZWZpY2lvIGNvbWVyY2lhbC4gUG9yIHRhbnRvLCBsYSBVbml2ZXJzaWRhZCBubyBhc3VtaXLDoSByZXNwb25zYWJpbGlkYWQgYWxndW5hIHBvciBsYSBmb3JtYSBlZmVjdGl2YSBlbiBxdWUgbG9zIHVzdWFyaW9zIHV0aWxpY2VuIGVsIG1hdGVyaWFsIHB1ZXN0byBhIHN1IGRpc3Bvc2ljacOzbi4KCi0gRWwgYXV0b3IgcG9kcsOhIHNvbGljaXRhciBsYSByZXRpcmFkYSBkZSBsYSBvYnJhIGRlbCByZXBvc2l0b3JpbyBwb3IgY2F1c2EganVzdGlmaWNhZGEuIAoK</binData>

          </mdWrap>
        </rightsMD>
      </amdSec>
      <amdSec ID="FO_10641_2327_1">
        <techMD ID="TECH_O_10641_2327_1">
          <mdWrap MDTYPE="PREMIS">
            <xmlData xsi:schemaLocation="http://www.loc.gov/standards/premis http://www.loc.gov/standards/premis/PREMIS-v1-0.xsd">
              <premis:premis>
                <premis:object>
                  <premis:objectIdentifier>
                    <premis:objectIdentifierType>URL</premis:objectIdentifierType>
                    <premis:objectIdentifierValue>http://ddfv.ufv.es/bitstream/10641/2327/1/6199-5608-1-PB.pdf</premis:objectIdentifierValue>
                  </premis:objectIdentifier>
                  <premis:objectCategory>File</premis:objectCategory>
                  <premis:objectCharacteristics>
                    <premis:fixity>
                      <premis:messageDigestAlgorithm>MD5</premis:messageDigestAlgorithm>
                      <premis:messageDigest>81f55f83adefa95b0a46222d72223778</premis:messageDigest>
                    </premis:fixity>
                    <premis:size>1831204</premis:size>
                    <premis:format>
                      <premis:formatDesignation>
                        <premis:formatName>application/pdf</premis:formatName>
                      </premis:formatDesignation>
                    </premis:format>
                  </premis:objectCharacteristics>
                  <premis:originalName>6199-5608-1-PB.pdf</premis:originalName>
                </premis:object>
              </premis:premis>
            </xmlData>
          </mdWrap>
        </techMD>
      </amdSec>
      <amdSec ID="FT_10641_2327_4">
        <techMD ID="TECH_T_10641_2327_4">
          <mdWrap MDTYPE="PREMIS">
            <xmlData xsi:schemaLocation="http://www.loc.gov/standards/premis http://www.loc.gov/standards/premis/PREMIS-v1-0.xsd">
              <premis:premis>
                <premis:object>
                  <premis:objectIdentifier>
                    <premis:objectIdentifierType>URL</premis:objectIdentifierType>
                    <premis:objectIdentifierValue>http://ddfv.ufv.es/bitstream/10641/2327/4/6199-5608-1-PB.pdf.txt</premis:objectIdentifierValue>
                  </premis:objectIdentifier>
                  <premis:objectCategory>File</premis:objectCategory>
                  <premis:objectCharacteristics>
                    <premis:fixity>
                      <premis:messageDigestAlgorithm>MD5</premis:messageDigestAlgorithm>
                      <premis:messageDigest>47b47b4ab230e10b1abda13a3bf7be5e</premis:messageDigest>
                    </premis:fixity>
                    <premis:size>30680</premis:size>
                    <premis:format>
                      <premis:formatDesignation>
                        <premis:formatName>text/plain</premis:formatName>
                      </premis:formatDesignation>
                    </premis:format>
                  </premis:objectCharacteristics>
                  <premis:originalName>6199-5608-1-PB.pdf.txt</premis:originalName>
                </premis:object>
              </premis:premis>
            </xmlData>
          </mdWrap>
        </techMD>
      </amdSec>
      <fileSec>
        <fileGrp USE="ORIGINAL">
          <file ADMID="FO_10641_2327_1" CHECKSUM="81f55f83adefa95b0a46222d72223778" CHECKSUMTYPE="MD5" GROUPID="GROUP_BITSTREAM_10641_2327_1" ID="BITSTREAM_ORIGINAL_10641_2327_1" MIMETYPE="application/pdf" SEQ="1" SIZE="1831204">
            <FLocat LOCTYPE="URL" xlink:href="http://ddfv.ufv.es/bitstream/10641/2327/1/6199-5608-1-PB.pdf" xlink:type="simple" />
          </file>
        </fileGrp>
        <fileGrp USE="TEXT">
          <file ADMID="FT_10641_2327_4" CHECKSUM="47b47b4ab230e10b1abda13a3bf7be5e" CHECKSUMTYPE="MD5" GROUPID="GROUP_BITSTREAM_10641_2327_4" ID="BITSTREAM_TEXT_10641_2327_4" MIMETYPE="text/plain" SEQ="4" SIZE="30680">
            <FLocat LOCTYPE="URL" xlink:href="http://ddfv.ufv.es/bitstream/10641/2327/4/6199-5608-1-PB.pdf.txt" xlink:type="simple" />
          </file>
        </fileGrp>
      </fileSec>
      <structMap LABEL="DSpace Object" TYPE="LOGICAL">
        <div ADMID="DMD_10641_2327" TYPE="DSpace Object Contents">
          <div TYPE="DSpace BITSTREAM">
            <fptr FILEID="BITSTREAM_ORIGINAL_10641_2327_1" />
          </div>
        </div>
      </structMap>
    </mets>
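The METS package records two things you can check locally:

- **Fixity of the PDF.** The PREMIS fixity block and the METS `file` entry both give an MD5 checksum of 81f55f83adefa95b0a46222d72223778 and a size of 1831204 bytes.
- **The deposit license.** The `rightsMD` section carries it as Base64 text (`OTHERMDTYPE="DSpaceDepositLicense"`, `MIMETYPE="text/plain"`). Decoded, it is a Spanish-language license that opens with "- El repositorio institucional de la Universidad Francisco de Vitoria de Madrid (DDFV)".

Here is a minimal sketch for checking a downloaded copy against the recorded fixity and for decoding such a payload. It assumes the PDF has been saved locally. The helper names are hypothetical.

```python
import base64
import hashlib

# Values copied from the PREMIS fixity and METS <file> entry above.
EXPECTED_MD5 = "81f55f83adefa95b0a46222d72223778"
EXPECTED_SIZE = 1831204  # bytes


def md5_and_size(stream, chunk_size=65536):
    """Stream a binary file object, returning (hex MD5 digest, byte count)."""
    digest = hashlib.md5()
    size = 0
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
        size += len(chunk)
    return digest.hexdigest(), size


def verify_fixity(path, expected_md5=EXPECTED_MD5, expected_size=EXPECTED_SIZE):
    """True if the file at `path` matches the recorded MD5 checksum and size."""
    with open(path, "rb") as fh:
        md5, size = md5_and_size(fh)
    return md5 == expected_md5.lower() and size == expected_size


def decode_bindata(payload):
    """Decode a Base64 <binData> payload, ignoring line breaks and indentation."""
    return base64.b64decode("".join(payload.split())).decode("utf-8")
```

With a local copy saved under its original name, `verify_fixity("6199-5608-1-PB.pdf")` returns True only if both the digest and the byte count match the recorded values. MD5 is adequate for detecting accidental corruption but not for resisting deliberate tampering.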

mods


    <?xml version="1.0" encoding="UTF-8" ?>

<mods:mods schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-1.xsd">
  <mods:name>
    <mods:namePart>Martínez García, Eva</mods:namePart>
  </mods:name>
  <mods:name>
    <mods:namePart>Nogales Moyano, Alberto</mods:namePart>
  </mods:name>
  <mods:name>
    <mods:namePart>Morales Escudero, Javier</mods:namePart>
  </mods:name>
  <mods:name>
    <mods:namePart>García Tejedor, Álvaro José</mods:namePart>
  </mods:name>
  <mods:extension>
    <mods:dateAvailable encoding="iso8601">2021-06-16T08:19:29Z</mods:dateAvailable>
  </mods:extension>
  <mods:extension>
    <mods:dateAccessioned encoding="iso8601">2021-06-16T08:19:29Z</mods:dateAccessioned>
  </mods:extension>
  <mods:originInfo>
    <mods:dateIssued encoding="iso8601">2020</mods:dateIssued>
  </mods:originInfo>
  <mods:identifier type="issn">1135-5948</mods:identifier>
  <mods:identifier type="uri">http://hdl.handle.net/10641/2327</mods:identifier>
  <mods:identifier type="doi">10.26342/2020-64-10</mods:identifier>
  <mods:abstract>Most of the current state-of-the-art Natural Language Processing (NLP) techniques are highly data-dependent. A significant amount of data is required for their training, and in some scenarios data is scarce. We present a hybrid method to generate new sentences for augmenting the training data. Our approach takes advantage of the combination of Markov Chains and word embeddings to produce high-quality data similar to an initial dataset. In contrast to other neural-based generative methods, it does not need a high amount of training data. Results show how our approach can generate useful data for NLP tools. In particular, we validate our approach by building Transformer-based Language Models using data from three different domains in the context of enriching general purpose chatbots.</mods:abstract>
  <mods:language>
    <mods:languageTerm>eng</mods:languageTerm>
  </mods:language>
  <mods:accessCondition type="useAndReproduction">http://creativecommons.org/licenses/by-nc-nd/3.0/es/</mods:accessCondition>
  <mods:accessCondition type="useAndReproduction">openAccess</mods:accessCondition>
  <mods:accessCondition type="useAndReproduction">Atribución-NoComercial-SinDerivadas 3.0 España</mods:accessCondition>
  <mods:subject>
    <mods:topic>Generation</mods:topic>
  </mods:subject>
  <mods:subject>
    <mods:topic>Hybrid</mods:topic>
  </mods:subject>
  <mods:subject>
    <mods:topic>Markov Chains</mods:topic>
  </mods:subject>
  <mods:subject>
    <mods:topic>Embeddings</mods:topic>
  </mods:subject>
  <mods:subject>
    <mods:topic>Similarity</mods:topic>
  </mods:subject>
  <mods:titleInfo>
    <mods:title>A light method for data generation: a combination of Markov Chains and Word Embeddings.</mods:title>
  </mods:titleInfo>
  <mods:genre>article</mods:genre>
</mods:mods>
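The abstract in this record describes the method only at a high level: a Markov chain proposes new sentences, and word embeddings keep the ones that resemble the initial dataset. The sketch below illustrates that general idea. It is not the authors' implementation. It makes several assumptions: a first-order word-level chain, hand-written 2-dimensional toy vectors (hypothetical stand-ins for pretrained embeddings), averaged word vectors as sentence representations, and a hypothetical cosine-similarity `threshold`.

```python
import math
import random
from collections import defaultdict

START, END = "<s>", "</s>"


def build_chain(sentences):
    """First-order word-level Markov chain: word -> list of observed next words."""
    chain = defaultdict(list)
    for sentence in sentences:
        tokens = [START] + sentence.split() + [END]
        for current, nxt in zip(tokens, tokens[1:]):
            chain[current].append(nxt)  # repeated entries encode transition counts
    return chain


def generate(chain, rng, max_len=20):
    """Random walk from START until END is drawn or max_len words are emitted."""
    words, current = [], START
    while len(words) < max_len:
        current = rng.choice(chain[current])
        if current == END:
            break
        words.append(current)
    return " ".join(words)


def sentence_vector(sentence, embeddings):
    """Average the vectors of in-vocabulary words; None if no word is known."""
    vectors = [embeddings[w] for w in sentence.split() if w in embeddings]
    if not vectors:
        return None
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def augment(seed, embeddings, n_candidates=200, threshold=0.9, rng_seed=0):
    """Keep generated sentences that are new and close to at least one seed sentence."""
    rng = random.Random(rng_seed)
    chain = build_chain(seed)
    seed_vectors = [v for v in (sentence_vector(s, embeddings) for s in seed) if v is not None]
    if not seed_vectors:
        return []
    kept = set()
    for _ in range(n_candidates):
        candidate = generate(chain, rng)
        vector = sentence_vector(candidate, embeddings)
        if vector is None or candidate in seed:
            continue  # skip out-of-vocabulary output and verbatim copies
        if max(cosine(vector, s) for s in seed_vectors) >= threshold:
            kept.add(candidate)
    return sorted(kept)


# Hypothetical toy data: 2-d vectors, axis 0 ~ "restaurant", axis 1 ~ "travel".
SEED = ["book a table for two", "book a flight to madrid", "reserve a table for tonight"]
EMBEDDINGS = {
    "book": [0.5, 0.5], "reserve": [0.6, 0.4], "a": [0.5, 0.5], "for": [0.5, 0.5],
    "to": [0.5, 0.5], "table": [0.9, 0.1], "two": [0.7, 0.3], "tonight": [0.7, 0.3],
    "flight": [0.1, 0.9], "madrid": [0.2, 0.8],
}

if __name__ == "__main__":
    for sentence in augment(SEED, EMBEDDINGS):
        print(sentence)
```

The chain alone can only recombine observed transitions, so with this seed it can produce sentences such as "reserve a flight to madrid". The embedding filter decides which of these recombinations stay close enough to the seed data. This record does not describe the chain order, embedding model or similarity criterion the paper actually uses.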

ore

<?xml version="1.0" encoding="UTF-8" ?>
<atom:entry schemaLocation="http://www.w3.org/2005/Atom http://www.kbcafe.com/rss/atom.xsd.xml">
  <atom:id>http://hdl.handle.net/10641/2327/ore.xml</atom:id>
  <atom:link href="http://hdl.handle.net/10641/2327" rel="alternate" />
  <atom:link href="http://hdl.handle.net/10641/2327/ore.xml" rel="http://www.openarchives.org/ore/terms/describes" />
  <atom:link href="http://hdl.handle.net/10641/2327/ore.xml#atom" rel="self" type="application/atom+xml" />
  <atom:published>2021-06-16T08:19:29Z</atom:published>
  <atom:updated>2021-06-16T08:19:29Z</atom:updated>
  <atom:source>
    <atom:generator>DDFV</atom:generator>
  </atom:source>
  <atom:title>A light method for data generation: a combination of Markov Chains and Word Embeddings.</atom:title>
  <atom:author>
    <atom:name>Martínez García, Eva</atom:name>
  </atom:author>
  <atom:author>
    <atom:name>Nogales Moyano, Alberto</atom:name>
  </atom:author>
  <atom:author>
    <atom:name>Morales Escudero, Javier</atom:name>
  </atom:author>
  <atom:author>
    <atom:name>García Tejedor, Álvaro José</atom:name>
  </atom:author>
  <atom:category label="Aggregation" scheme="http://www.openarchives.org/ore/terms/" term="http://www.openarchives.org/ore/terms/Aggregation" />
  <atom:category scheme="http://www.openarchives.org/ore/atom/modified" term="2021-06-16T08:19:29Z" />
  <atom:category label="DSpace Item" scheme="http://www.dspace.org/objectModel/" term="DSpaceItem" />
  <atom:link href="http://ddfv.ufv.es/bitstream/10641/2327/1/6199-5608-1-PB.pdf" length="1831204" rel="http://www.openarchives.org/ore/terms/aggregates" title="6199-5608-1-PB.pdf" type="application/pdf" />
  <oreatom:triples>
    <rdf:Description about="http://hdl.handle.net/10641/2327/ore.xml#atom">
      <rdf:type resource="http://www.dspace.org/objectModel/DSpaceItem" />
      <dcterms:modified>2021-06-16T08:19:29Z</dcterms:modified>
    </rdf:Description>
    <rdf:Description about="http://ddfv.ufv.es/bitstream/10641/2327/1/6199-5608-1-PB.pdf">
      <rdf:type resource="http://www.dspace.org/objectModel/DSpaceBitstream" />
      <dcterms:description>ORIGINAL</dcterms:description>
    </rdf:Description>
    <rdf:Description about="http://ddfv.ufv.es/bitstream/10641/2327/2/license_rdf">
      <rdf:type resource="http://www.dspace.org/objectModel/DSpaceBitstream" />
      <dcterms:description>CC-LICENSE</dcterms:description>
    </rdf:Description>
    <rdf:Description about="http://ddfv.ufv.es/bitstream/10641/2327/3/license.txt">
      <rdf:type resource="http://www.dspace.org/objectModel/DSpaceBitstream" />
      <dcterms:description>LICENSE</dcterms:description>
    </rdf:Description>
    <rdf:Description about="http://ddfv.ufv.es/bitstream/10641/2327/4/6199-5608-1-PB.pdf.txt">
      <rdf:type resource="http://www.dspace.org/objectModel/DSpaceBitstream" />
      <dcterms:description>TEXT</dcterms:description>
    </rdf:Description>
    <rdf:Description about="http://ddfv.ufv.es/bitstream/10641/2327/5/6199-5608-1-PB.pdf.jpg">
      <rdf:type resource="http://www.dspace.org/objectModel/DSpaceBitstream" />
      <dcterms:description>THUMBNAIL</dcterms:description>
    </rdf:Description>
  </oreatom:triples>
</atom:entry>
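In the ORE entry above, each `atom:link` whose `rel` is `http://www.openarchives.org/ore/terms/aggregates` names one file in the aggregation, together with its MIME type and byte length. The sketch below lists those files with Python's `xml.etree.ElementTree`. It assumes the served document declares the standard Atom namespace (`xmlns:atom="http://www.w3.org/2005/Atom"`), which this rendered view omits. `SAMPLE` is an abridged copy of the entry with that declaration added, and `aggregated_resources` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
AGGREGATES = "http://www.openarchives.org/ore/terms/aggregates"

# Abridged copy of the entry above, with the Atom namespace declared
# (assumption: the rendered view drops the xmlns attributes).
SAMPLE = """<atom:entry xmlns:atom="http://www.w3.org/2005/Atom">
  <atom:id>http://hdl.handle.net/10641/2327/ore.xml</atom:id>
  <atom:link href="http://hdl.handle.net/10641/2327" rel="alternate" />
  <atom:link href="http://ddfv.ufv.es/bitstream/10641/2327/1/6199-5608-1-PB.pdf"
             length="1831204" rel="http://www.openarchives.org/ore/terms/aggregates"
             title="6199-5608-1-PB.pdf" type="application/pdf" />
</atom:entry>"""


def aggregated_resources(xml_text):
    """Return (href, MIME type, length in bytes) for each ore:aggregates link."""
    root = ET.fromstring(xml_text)
    resources = []
    for link in root.findall(f"{{{ATOM}}}link"):  # Clark notation: {namespace}local
        if link.get("rel") == AGGREGATES:
            length = link.get("length")
            resources.append((link.get("href"), link.get("type"), int(length) if length else None))
    return resources


if __name__ == "__main__":
    for href, mime, size in aggregated_resources(SAMPLE):
        print(href, mime, size)
```

Filtering on `rel` rather than on position keeps the helper independent of the order in which the repository emits its links.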

qdc

<?xml version="1.0" encoding="UTF-8" ?>
<qdc:qualifieddc schemaLocation="http://purl.org/dc/elements/1.1/ http://dublincore.org/schemas/xmls/qdc/2006/01/06/dc.xsd http://purl.org/dc/terms/ http://dublincore.org/schemas/xmls/qdc/2006/01/06/dcterms.xsd http://dspace.org/qualifieddc/ http://www.ukoln.ac.uk/metadata/dcmi/xmlschema/qualifieddc.xsd">
  <dc:title>A light method for data generation: a combination of Markov Chains and Word Embeddings.</dc:title>
  <dc:creator>Martínez García, Eva</dc:creator>
  <dc:creator>Nogales Moyano, Alberto</dc:creator>
  <dc:creator>Morales Escudero, Javier</dc:creator>
  <dc:creator>García Tejedor, Álvaro José</dc:creator>
  <dc:subject>Generation</dc:subject>
  <dc:subject>Hybrid</dc:subject>
  <dc:subject>Markov Chains</dc:subject>
  <dc:subject>Embeddings</dc:subject>
  <dc:subject>Similarity</dc:subject>
  <dcterms:abstract>Most of the current state-of-the-art Natural Language Processing (NLP) techniques are highly data-dependent. A significant amount of data is required for their training, and in some scenarios data is scarce. We present a hybrid method to generate new sentences for augmenting the training data. Our approach takes advantage of the combination of Markov Chains and word embeddings to produce high-quality data similar to an initial dataset. In contrast to other neural-based generative methods, it does not need a high amount of training data. Results show how our approach can generate useful data for NLP tools. In particular, we validate our approach by building Transformer-based Language Models using data from three different domains in the context of enriching general purpose chatbots.</dcterms:abstract>
  <dcterms:dateAccepted>2021-06-16T08:19:29Z</dcterms:dateAccepted>
  <dcterms:available>2021-06-16T08:19:29Z</dcterms:available>
  <dcterms:created>2021-06-16T08:19:29Z</dcterms:created>
  <dcterms:issued>2020</dcterms:issued>
  <dc:type>article</dc:type>
  <dc:identifier>1135-5948</dc:identifier>
  <dc:identifier>http://hdl.handle.net/10641/2327</dc:identifier>
  <dc:identifier>10.26342/2020-64-10</dc:identifier>
  <dc:language>eng</dc:language>
  <dc:relation>http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6199</dc:relation>
  <dc:rights>http://creativecommons.org/licenses/by-nc-nd/3.0/es/</dc:rights>
  <dc:rights>openAccess</dc:rights>
  <dc:rights>Atribución-NoComercial-SinDerivadas 3.0 España</dc:rights>
  <dc:publisher>Procesamiento del Lenguaje Natural</dc:publisher>
</qdc:qualifieddc>
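The MODS record types each identifier (`issn`, `uri`, `doi`). In this QDC record, the same three values appear as untyped `dc:identifier` elements. The sketch below recovers the type by pattern and validates the ISSN check digit. Under the ISSN scheme, the first seven digits are weighted 8 down to 2, the check digit is (11 − sum mod 11) mod 11, and 10 is written as X. The regular expressions are simplified heuristics chosen for this example, not complete grammars for DOIs or Handles, and the function names are hypothetical.

```python
import re

ISSN_RE = re.compile(r"^\d{4}-\d{3}[\dX]$")
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")          # simplified DOI pattern
HANDLE_RE = re.compile(r"^https?://hdl\.handle\.net/\S+$")


def issn_is_valid(issn):
    """Check the ISSN check digit: weights 8..2 on the first seven digits, mod 11."""
    digits = issn.replace("-", "")
    total = sum(int(d) * w for d, w in zip(digits[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return digits[7] == expected


def classify_identifier(value):
    """Guess the MODS-style type of an untyped dc:identifier value (heuristic)."""
    if ISSN_RE.match(value):
        return "issn" if issn_is_valid(value) else "issn (bad check digit)"
    if DOI_RE.match(value):
        return "doi"
    if HANDLE_RE.match(value):
        return "uri"
    return "unknown"


if __name__ == "__main__":
    for value in ["1135-5948", "http://hdl.handle.net/10641/2327", "10.26342/2020-64-10"]:
        print(value, "->", classify_identifier(value))
```

For 1135-5948 the weighted sum is 1·8 + 1·7 + 3·6 + 5·5 + 5·4 + 9·3 + 4·2 = 113. Since 113 mod 11 = 3, the check digit is 11 − 3 = 8, which matches the last digit.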

rdf

<?xml version="1.0" encoding="UTF-8" ?>
<rdf:RDF schemaLocation="http://www.openarchives.org/OAI/2.0/rdf/ http://www.openarchives.org/OAI/2.0/rdf.xsd">
  <ow:Publication about="oai:ddfv.ufv.es:10641/2327">
    <dc:title>A light method for data generation: a combination of Markov Chains and Word Embeddings.</dc:title>
    <dc:creator>Martínez García, Eva</dc:creator>
    <dc:creator>Nogales Moyano, Alberto</dc:creator>
    <dc:creator>Morales Escudero, Javier</dc:creator>
    <dc:creator>García Tejedor, Álvaro José</dc:creator>
    <dc:subject>Generation</dc:subject>
    <dc:subject>Hybrid</dc:subject>
    <dc:subject>Markov Chains</dc:subject>
    <dc:subject>Embeddings</dc:subject>
    <dc:subject>Similarity</dc:subject>
    <dc:description>Most of the current state-of-the-art Natural Language Processing (NLP) techniques are highly data-dependent. A significant amount of data is required for their training, and in some scenarios data is scarce. We present a hybrid method to generate new sentences for augmenting the training data. Our approach takes advantage of the combination of Markov Chains and word embeddings to produce high-quality data similar to an initial dataset. In contrast to other neural-based generative methods, it does not need a high amount of training data. Results show how our approach can generate useful data for NLP tools. In particular, we validate our approach by building Transformer-based Language Models using data from three different domains in the context of enriching general purpose chatbots.</dc:description>
    <dc:date>2021-06-16T08:19:29Z</dc:date>
    <dc:date>2021-06-16T08:19:29Z</dc:date>
    <dc:date>2020</dc:date>
    <dc:type>article</dc:type>
    <dc:identifier>1135-5948</dc:identifier>
    <dc:identifier>http://hdl.handle.net/10641/2327</dc:identifier>
    <dc:identifier>10.26342/2020-64-10</dc:identifier>
    <dc:language>eng</dc:language>
    <dc:relation>http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6199</dc:relation>
    <dc:rights>http://creativecommons.org/licenses/by-nc-nd/3.0/es/</dc:rights>
    <dc:rights>openAccess</dc:rights>
    <dc:rights>Atribución-NoComercial-SinDerivadas 3.0 España</dc:rights>
    <dc:publisher>Procesamiento del Lenguaje Natural</dc:publisher>
  </ow:Publication>
</rdf:RDF>
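The RDF record identifies the item as `oai:ddfv.ufv.es:10641/2327`, which is an OAI-PMH identifier. In OAI-PMH 2.0, a single record is requested in a given format with the `GetRecord` verb and two arguments, `identifier` and `metadataPrefix`. The sketch below builds such request URLs. The record does not give the repository's endpoint, so `BASE_URL` is hypothetical. The prefixes other than `oai_dc` are assumed from the format names shown on this page; `oai_dc` is the only format OAI-PMH requires every repository to support.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the record does not state the repository's OAI-PMH base URL.
BASE_URL = "https://repository.example.org/oai/request"
OAI_IDENTIFIER = "oai:ddfv.ufv.es:10641/2327"


def get_record_url(base_url, identifier, metadata_prefix):
    """Build an OAI-PMH 2.0 GetRecord request URL (verb, identifier, metadataPrefix)."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,          # ':' and '/' are percent-encoded
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


if __name__ == "__main__":
    # Prefixes assumed from this page's format labels; oai_dc is mandatory in OAI-PMH.
    for prefix in ["oai_dc", "mods", "ore", "qdc", "rdf", "xoai"]:
        print(get_record_url(BASE_URL, OAI_IDENTIFIER, prefix))
```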

xoai

<?xml version="1.0" encoding="UTF-8" ?>
<metadata schemaLocation="http://www.lyncode.com/xoai http://www.lyncode.com/xsd/xoai.xsd">
  <element name="dc">
    <element name="contributor">
      <element name="author">
        <element name="none">
          <field name="value">Martínez García, Eva</field>
          <field name="authority">e4a4adb9-fa58-4a27-b114-f07bbf623ff7</field>
          <field name="confidence">600</field>
          <field name="value">Nogales Moyano, Alberto</field>
          <field name="authority">209</field>
          <field name="confidence">600</field>
          <field name="value">Morales Escudero, Javier</field>
          <field name="authority">3c20c2c8-86d0-4953-8a3a-7290cdb9a0ba</field>
          <field name="confidence">600</field>
          <field name="value">García Tejedor, Álvaro José</field>
          <field name="authority">75</field>
          <field name="confidence">600</field>
        </element>
      </element>
    </element>
    <element name="date">
      <element name="accessioned">
        <element name="none">
          <field name="value">2021-06-16T08:19:29Z</field>
        </element>
      </element>
      <element name="available">
        <element name="none">
          <field name="value">2021-06-16T08:19:29Z</field>
        </element>
      </element>
      <element name="issued">
        <element name="none">
          <field name="value">2020</field>
        </element>
      </element>
    </element>
    <element name="identifier">
      <element name="issn">
        <element name="spa">
          <field name="value">1135-5948</field>
        </element>
      </element>
      <element name="uri">
        <element name="none">
          <field name="value">http://hdl.handle.net/10641/2327</field>
        </element>
      </element>
      <element name="doi">
        <element name="spa">
          <field name="value">10.26342/2020-64-10</field>
        </element>
      </element>
    </element>
    <element name="description">
      <element name="abstract">
        <element name="spa">
          <field name="value">Most of the current state-of-the-art Natural Language Processing (NLP) techniques are highly data-dependent. A significant amount of data is required for their training, and in some scenarios data is scarce. We present a hybrid method to generate new sentences for augmenting the training data. Our approach takes advantage of the combination of Markov Chains and word embeddings to produce high-quality data similar to an initial dataset. In contrast to other neural-based generative methods, it does not need a high amount of training data. Results show how our approach can generate useful data for NLP tools. In particular, we validate our approach by building Transformer-based Language Models using data from three different domains in the context of enriching general purpose chatbots.</field>
        </element>
      </element>
      <element name="version">
        <element name="spa">
          <field name="value">post-print</field>
        </element>
      </element>
      <element name="extent">
        <element name="spa">
          <field name="value">1,74 MB</field>
        </element>
      </element>
    </element>
    <element name="language">
      <element name="iso">
        <element name="spa">
          <field name="value">eng</field>
        </element>
      </element>
    </element>
    <element name="publisher">
      <element name="spa">
        <field name="value">Procesamiento del Lenguaje Natural</field>
      </element>
    </element>
    <element name="rights">
      <element name="*">
        <field name="value">Atribución-NoComercial-SinDerivadas 3.0 España</field>
      </element>
      <element name="uri">
        <element name="*">
          <field name="value">http://creativecommons.org/licenses/by-nc-nd/3.0/es/</field>
        </element>
      </element>
      <element name="accessRights">
        <element name="spa">
          <field name="value">openAccess</field>
        </element>
      </element>
    </element>
    <element name="subject">
      <element name="spa">
        <field name="value">Generation</field>
        <field name="value">Hybrid</field>
        <field name="value">Markov Chains</field>
        <field name="value">Embeddings</field>
        <field name="value">Similarity</field>
      </element>
    </element>
    <element name="title">
      <element name="spa">
        <field name="value">A light method for data generation: a combination of Markov Chains and Word Embeddings.</field>
      </element>
      <element name="alternative">
        <element name="spa">
          <field name="value">Un método ligero de generación de datos: combinación entre Cadenas de Markov y Word Embeddings.</field>
        </element>
      </element>
    </element>
    <element name="type">
      <element name="spa">
        <field name="value">article</field>
      </element>
    </element>
    <element name="relation">
      <element name="publisherversion">
        <element name="spa">
          <field name="value">http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6199</field>
        </element>
      </element>
    </element>
  </element>
  <element name="bundles">
    <element name="bundle">
      <field name="name">ORIGINAL</field>
      <element name="bitstreams">
        <element name="bitstream">
          <field name="name">6199-5608-1-PB.pdf</field>
          <field name="originalName">6199-5608-1-PB.pdf</field>
          <field name="description" />
          <field name="format">application/pdf</field>
          <field name="size">1831204</field>
          <field name="url">http://ddfv.ufv.es/bitstream/10641/2327/1/6199-5608-1-PB.pdf</field>
          <field name="checksum">81f55f83adefa95b0a46222d72223778</field>
          <field name="checksumAlgorithm">MD5</field>
          <field name="sid">1</field>
        </element>
      </element>
    </element>
    <element name="bundle">
      <field name="name">CC-LICENSE</field>
      <element name="bitstreams">
        <element name="bitstream">
          <field name="name">license_rdf</field>
          <field name="originalName">license_rdf</field>
          <field name="format">application/rdf+xml; charset=utf-8</field>
          <field name="size">811</field>
          <field name="url">http://ddfv.ufv.es/bitstream/10641/2327/2/license_rdf</field>
          <field name="checksum">4d01a8abc68801ab758ec8c2c04918c3</field>
          <field name="checksumAlgorithm">MD5</field>
          <field name="sid">2</field>
        </element>
      </element>
    </element>
    <element name="bundle">
      <field name="name">LICENSE</field>
      <element name="bitstreams">
        <element name="bitstream">
          <field name="name">license.txt</field>
          <field name="originalName">license.txt</field>
          <field name="format">text/plain; charset=utf-8</field>
          <field name="size">2418</field>
          <field name="url">http://ddfv.ufv.es/bitstream/10641/2327/3/license.txt</field>
          <field name="checksum">8b6e3a0bc6a1ca51936267b0e6e4740c</field>
          <field name="checksumAlgorithm">MD5</field>
          <field name="sid">3</field>
        </element>
      </element>
    </element>
    <element name="bundle">
      <field name="name">TEXT</field>
      <element name="bitstreams">
        <element name="bitstream">
          <field name="name">6199-5608-1-PB.pdf.txt</field>
          <field name="originalName">6199-5608-1-PB.pdf.txt</field>
          <field name="description">Extracted text</field>
          <field name="format">text/plain</field>
          <field name="size">30680</field>
          <field name="url">http://ddfv.ufv.es/bitstream/10641/2327/4/6199-5608-1-PB.pdf.txt</field>
          <field name="checksum">47b47b4ab230e10b1abda13a3bf7be5e</field>
          <field name="checksumAlgorithm">MD5</field>
          <field name="sid">4</field>
        </element>
      </element>
    </element>
    <element name="bundle">
      <field name="name">THUMBNAIL</field>
      <element name="bitstreams">
        <element name="bitstream">
          <field name="name">6199-5608-1-PB.pdf.jpg</field>
          <field name="originalName">6199-5608-1-PB.pdf.jpg</field>
          <field name="description">Generated Thumbnail</field>
          <field name="format">image/jpeg</field>
          <field name="size">1595</field>
          <field name="url">http://ddfv.ufv.es/bitstream/10641/2327/5/6199-5608-1-PB.pdf.jpg</field>
          <field name="checksum">edb12135decccbd5135dbda40a8589ad</field>
          <field name="checksumAlgorithm">MD5</field>
          <field name="sid">5</field>
        </element>
      </element>
    </element>
  </element>
  <element name="others">
    <field name="handle">10641/2327</field>
    <field name="identifier">oai:ddfv.ufv.es:10641/2327</field>
    <field name="lastModifyDate">2022-01-27 09:59:54.429</field>
  </element>
  <element name="repository">
    <field name="name">DDFV</field>
    <field name="mail">dspace@ufv.es</field>
  </element>
  <element name="license">
      1. <field name="bin">LSBFbCByZXBvc2l0b3JpbyBpbnN0aXR1Y2lvbmFsIGRlIGxhIFVuaXZlcnNpZGFkIEZyYW5jaXNjbyBkZSBWaXRvcmlhIGRlIE1hZHJpZCAoRERGViksIHBvbmUgYSBkaXNwb3NpY2nDs24gZGUgbG9zIHVzdWFyaW9zIGxhIHBsYXRhZm9ybWEgZGlnaXRhbCBhYmllcnRhIHkgZGUgYWNjZXNvIGxpYnJlIGRlIGxhIHByb2R1Y2Npw7NuIGNpZW50w61maWNhIGRlIGxhIGluc3RpdHVjacOzbi4KCi0gQSB0YWxlcyBmaW5lcywgbG9zIGF1dG9yZXMgZGVjbGFyYW4gcXVlIHNvbiB0aXR1bGFyZXMgZGUgbG9zIGRlcmVjaG9zIGRlIHByb3BpZWRhZCBpbnRlbGVjdHVhbCBkZSBsYSBvYnJhIHkgcXVlIMOpc3RhIGVzIG9yaWdpbmFsLgoKLSBNZWRpYW50ZSBsYSBhY2VwdGFjacOzbiBkZSBlc3RhIGxpY2VuY2lhLCBlbCBhdXRvciwgY29tbyB0aXR1bGFyIGRlIGxvcyBkZXJlY2hvcyBkZSBhdXRvciwgYXV0b3JpemEgeSBjZWRlIGEgbGEgVW5pdmVyc2lkYWQgRnJhbmNpc2NvIGRlIFZpdG9yaWEsIGRlIGZvcm1hIGdyYXR1aXRhIHkgbm8gZXhjbHVzaXZhLCBwb3IgZWwgbcOheGltbyBwbGF6byBsZWdhbCB5IGNvbiDDoW1iaXRvIHVuaXZlcnNhbCwgbG9zIGRlcmVjaG9zIGRlIHJlcHJvZHVjY2nDs24sIGRpc3RyaWJ1Y2nDs24sIGNvbXVuaWNhY2nDs24gcMO6YmxpY2EsIGluY2x1aWRvIGVsIGRlcmVjaG8gZGUgcHVlc3RhIGEgZGlzcG9zaWNpw7NuIGVsZWN0csOzbmljYSwgeSBsYSB0cmFuc2Zvcm1hY2nDs24gZGUgZm9ybWF0byBzb2JyZSBsYSBvYnJhIGluZGljYWRhLCBzaSBmdWVyYSBlbCBjYXNvLgoKLSBFbiBlbCBjYXNvIGRlIGNlc2nDs24gZGUgZGVyZWNob3MgZGUgZXhwbG90YWNpw7NuIGEgdGVyY2Vyb3MsIGRlY2xhcmEgcXVlIGN1ZW50YSBjb24gbGEgYXV0b3JpemFjacOzbiBkZSBkaWNob3MgdGl0dWxhcmVzIHkgcXVlIGhhIG9idGVuaWRvIGVsIHBlcm1pc28gc2luIHJlc3RyaWNjaW9uZXMgZGVsIHByb3BpZXRhcmlvIGRlbCBjb3B5cmlnaHQgcGFyYSBvdG9yZ2FyIGEgbGEgaW5zdGl0dWNpw7NuIGxvcyBkZXJlY2hvcyByZXF1ZXJpZG9zIHBhcmEgZXN0YSBsaWNlbmNpYSB5IHF1ZSBkaWNobyBwcm9waWV0YXJpbyBjb25vY2UgZWwgdGV4dG8gbyBlbCBjb250ZW5pZG8gZGUgbGEgb2JyYS4KCi0gU2kgZnVlcmEgdW5hIG9icmEgcGF0cm9jaW5hZGEgcG9yIGFsZ3VuYSBpbnN0aXR1Y2nDs24gZGlzdGludGEgYSBsYSBVbml2ZXJzaWRhZCBGcmFuY2lzY28gZGUgVml0b3JpYSwgZGVjbGFyYSBxdWUgZW4gY2FzbyBuZWNlc2FyaW8sIGN1ZW50YSBjb24gbG9zIHBlcm1pc29zIHBlcnRpbmVudGVzLCBkZSBsYSBpbnN0aXR1Y2nDs24gbyBlbnRpZGFkLCBxdWUgbGUgcGVybWl0YW4gbGEgZGlmdXNpw7NuIGRlIGRpY2hhIG9icmEuCgotIExhIFVuaXZlcnNpZGFkIEZyYW5jaXNjbyBkZSBWaXRvcmlhIG5vIHRpZW5lIGxhIHRpdHVsYXJpZGFkIGRlIGxvcyBkZXJlY2hvcyBzb2JyZSBsYSBvYnJhLCBxd
WUgY29ycmVzcG9uZGVuIGFsIGF1dG9yLCBwZXJvIHNpbiBlbWJhcmdvIMOpc3RhIGxpY2VuY2lhIGRhIGRlcmVjaG8gYSByZXByb2R1Y2lybGEgZW4gdW4gc29wb3J0ZSBkaWdpdGFsLCBkaXN0cmlidWlyIGEgbG9zIHVzdWFyaW9zIGNvcGlhcyBlbGVjdHLDs25pY2FzIGRlIGxhIG9icmEgZW4gZm9ybWF0byBkaWdpdGFsLCBjb211bmljYWNpw7NuIHDDumJsaWNhIHkgc3UgcHVlc3RhIGEgZGlzcG9zaWNpw7NuIGEgdHJhdsOpcyBkZSB1biBhcmNoaXZvIGFiaWVydG8gaW5zdGl0dWNpb25hbC4KCi0gTGEgb2JyYSBzZSBwb25kcsOhIGEgZGlzcG9zaWNpw7NuIGRlIGxvcyB1c3VhcmlvcyBwYXJhIHF1ZSBoYWdhbiBkZSBlbGxhIHVuIHVzbyBqdXN0byB5IHJlc3BldHVvc28gY29uIGxvcyBkZXJlY2hvcyBkZSBhdXRvciwgc2VhIGNvbiBmaW5lcyBkZSBlc3R1ZGlvLCBpbnZlc3RpZ2FjacOzbiBvIGN1YWxxdWllciBvdHJvIGZpbiBsw61jaXRvLCB5IGRlIGFjdWVyZG8gYSBsYXMgY29uZGljaW9uZXMgZXN0YWJsZWNpZGFzIGVuIGxhIGxpY2VuY2lhIENyZWF0aXZlIENvbW1vbnMsIGRlIG1vZG8gcXVlIGxhcyBvYnJhcyBwdWVkYW4gc2VyIGRpc3RyaWJ1aWRhcywgY29waWFkYXMgeSBleGhpYmlkYXMgc2llbXByZSBxdWUgc2UgY2l0ZSBsYSBhdXRvcsOtYSB5IG5vIHNlIG9idGVuZ2EgYmVuZWZpY2lvIGNvbWVyY2lhbC4gUG9yIHRhbnRvLCBsYSBVbml2ZXJzaWRhZCBubyBhc3VtaXLDoSByZXNwb25zYWJpbGlkYWQgYWxndW5hIHBvciBsYSBmb3JtYSBlZmVjdGl2YSBlbiBxdWUgbG9zIHVzdWFyaW9zIHV0aWxpY2VuIGVsIG1hdGVyaWFsIHB1ZXN0byBhIHN1IGRpc3Bvc2ljacOzbi4KCi0gRWwgYXV0b3IgcG9kcsOhIHNvbGljaXRhciBsYSByZXRpcmFkYSBkZSBsYSBvYnJhIGRlbCByZXBvc2l0b3JpbyBwb3IgY2F1c2EganVzdGlmaWNhZGEuIAoK</field>

  </element>
</metadata>
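For each bitstream, the XOAI bundles record a `size` in bytes and an MD5 `checksum`. The `license` element stores the deposit license as base64-encoded UTF-8 text in its `bin` field, which decodes to a Spanish text beginning "- El repositorio". The sketch below checks a downloaded copy against the recorded values and decodes the field. It uses only Python's `hashlib` and `base64`. The function names are hypothetical, the download step is not covered, and the sample string in the usage line is the first characters of the `bin` value, re-padded.

```python
import base64
import hashlib
import sys


def md5_matches(data: bytes, expected_hex: str) -> bool:
    """Compare the MD5 digest of downloaded bytes with the checksum in the record."""
    return hashlib.md5(data).hexdigest() == expected_hex.lower()


def verify_bitstream(data: bytes, expected_size: int, expected_md5: str) -> bool:
    """Treat a bitstream as intact only if both its size and its MD5 checksum match."""
    return len(data) == expected_size and md5_matches(data, expected_md5)


def decode_license(bin_field: str) -> str:
    """Decode the base64 'bin' field; line breaks inside the value are removed first."""
    compact = "".join(bin_field.split())
    return base64.b64decode(compact, validate=True).decode("utf-8")


if __name__ == "__main__":
    # Values for 6199-5608-1-PB.pdf as listed in the ORIGINAL bundle.
    EXPECTED_SIZE = 1831204
    EXPECTED_MD5 = "81f55f83adefa95b0a46222d72223778"
    print(decode_license("LSBFbCByZXBvc2l0b3Jpbw=="))  # prints "- El repositorio"
    if len(sys.argv) > 1:  # optional: path to a locally downloaded copy of the PDF
        with open(sys.argv[1], "rb") as fh:
            print("intact:", verify_bitstream(fh.read(), EXPECTED_SIZE, EXPECTED_MD5))
```

Whitespace is stripped before decoding because the `bin` value above is wrapped across lines. With `validate=True`, any other non-alphabet character raises an error instead of being silently ignored.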

Hispana

Access portal to digital heritage and national content aggregator for Europeana

HISPANA
© Ministerio de Cultura y Deporte