Hispana. Online access to Spanish Digital Cultural Heritage

A light method for data generation: a combination of Markov Chains and Word Embeddings.

Resource identifiers

1135-5948

http://hdl.handle.net/10641/2327

10.26342/2020-64-10

Origin

(Repositorio Institucional de la Universidad Francisco de Vitoria)

File

Title: A light method for data generation: a combination of Markov Chains and Word Embeddings.; Un método ligero de generación de datos: combinación entre Cadenas de Markov y Word Embeddings.
Subject: Generation; Hybrid; Markov Chains; Embeddings; Similarity
Description: Most of the current state-of-the-art Natural Language Processing (NLP) techniques are highly data-dependent. A significant amount of data is required for their training, and in some scenarios data is scarce. We present a hybrid method to generate new sentences for augmenting the training data. Our approach takes advantage of the combination of Markov Chains and word embeddings to produce high-quality data similar to an initial dataset. In contrast to other neural-based generative methods, it does not need a high amount of training data. Results show how our approach can generate useful data for NLP tools. In particular, we validate our approach by building Transformer-based Language Models using data from three different domains in the context of enriching general purpose chatbots.; post-print; 1,74 MB
Language: English
Relation: http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6199
Author/Producer: Martínez García, Eva; Nogales Moyano, Alberto; Morales Escudero, Javier; García Tejedor, Álvaro José
Publisher: Procesamiento del Lenguaje Natural
Rights: Atribución-NoComercial-SinDerivadas 3.0 España; http://creativecommons.org/licenses/by-nc-nd/3.0/es/; openAccess
Date: 2021-06-16T08:19:29Z; 2020
Resource type: article
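
The description above names the two ingredients of the method (Markov Chains and word embeddings) but this record does not say how the article combines them. The following is therefore only a toy sketch under stated assumptions, not the authors' implementation: a bigram Markov chain proposes candidate sentences, and a candidate is kept when the cosine similarity between its averaged word vector and that of some seed sentence reaches a threshold. The names `build_chain`, `generate`, `augment`, the seed sentences, the two-dimensional vectors, and the threshold are all hypothetical.

```python
import math
import random
from collections import defaultdict

START, END = "<s>", "</s>"

# Hypothetical seed data and toy 2-d "embeddings", for illustration only.
SEED = ["i want to book a flight", "i want to cancel a flight", "please book a hotel"]
EMB = {
    "book": [0.9, 0.1], "cancel": [0.8, 0.3], "want": [0.6, 0.4],
    "flight": [0.2, 0.9], "hotel": [0.3, 0.8], "please": [0.5, 0.5],
}

def build_chain(sentences):
    """Map each token to the list of tokens observed right after it."""
    chain = defaultdict(list)
    for sentence in sentences:
        tokens = [START] + sentence.split() + [END]
        for current, following in zip(tokens, tokens[1:]):
            chain[current].append(following)
    return chain

def generate(chain, rng, max_len=12):
    """Random-walk the chain from START until END or max_len words."""
    token, words = START, []
    while len(words) < max_len:
        token = rng.choice(chain[token])
        if token == END:
            break
        words.append(token)
    return " ".join(words)

def sentence_vector(sentence, embeddings):
    """Average the vectors of the known words; None if no word is known."""
    vectors = [embeddings[w] for w in sentence.split() if w in embeddings]
    if not vectors:
        return None
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def augment(seed, embeddings, n_candidates=50, threshold=0.8, rng_seed=0):
    """Keep new Markov-generated sentences that stay close to the seed set."""
    rng = random.Random(rng_seed)
    chain = build_chain(seed)
    seed_vectors = [sentence_vector(s, embeddings) for s in seed]
    seed_vectors = [v for v in seed_vectors if v is not None]
    kept = set()
    for _ in range(n_candidates):
        candidate = generate(chain, rng)
        vector = sentence_vector(candidate, embeddings)
        if candidate in seed or vector is None or not seed_vectors:
            continue
        if max(cosine(vector, s) for s in seed_vectors) >= threshold:
            kept.add(candidate)
    return sorted(kept)

if __name__ == "__main__":
    for sentence in augment(SEED, EMB):
        print(sentence)
```

Candidates identical to a seed sentence are discarded, so whatever is printed is new with respect to the seed set. A real setup would replace `EMB` with pretrained word vectors and tune the threshold on held-out data.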

oai_dc

<?xml version="1.0" encoding="UTF-8" ?>

<oai_dc:dc schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
  <dc:title>A light method for data generation: a combination of Markov Chains and Word Embeddings.</dc:title>
  <dc:title>Un método ligero de generación de datos: combinación entre Cadenas de Markov y Word Embeddings.</dc:title>
  <dc:creator>Martínez García, Eva</dc:creator>
  <dc:creator>Nogales Moyano, Alberto</dc:creator>
  <dc:creator>Morales Escudero, Javier</dc:creator>
  <dc:creator>García Tejedor, Álvaro José</dc:creator>
  <dc:subject>Generation</dc:subject>
  <dc:subject>Hybrid</dc:subject>
  <dc:subject>Markov Chains</dc:subject>
  <dc:subject>Embeddings</dc:subject>
  <dc:subject>Similarity</dc:subject>
  <dc:description>Most of the current state-of-the-art Natural Language Processing (NLP) techniques are highly data-dependent. A significant amount of data is required for their training, and in some scenarios data is scarce. We present a hybrid method to generate new sentences for augmenting the training data. Our approach takes advantage of the combination of Markov Chains and word embeddings to produce high-quality data similar to an initial dataset. In contrast to other neural-based generative methods, it does not need a high amount of training data. Results show how our approach can generate useful data for NLP tools. In particular, we validate our approach by building Transformer-based Language Models using data from three different domains in the context of enriching general purpose chatbots.</dc:description>
  <dc:description>post-print</dc:description>
  <dc:description>1,74 MB</dc:description>
  <dc:date>2021-06-16T08:19:29Z</dc:date>
  <dc:date>2021-06-16T08:19:29Z</dc:date>
  <dc:date>2020</dc:date>
  <dc:type>article</dc:type>
  <dc:identifier>1135-5948</dc:identifier>
  <dc:identifier>http://hdl.handle.net/10641/2327</dc:identifier>
  <dc:identifier>10.26342/2020-64-10</dc:identifier>
  <dc:language>eng</dc:language>
  <dc:relation>http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6199</dc:relation>
  <dc:rights>Atribución-NoComercial-SinDerivadas 3.0 España</dc:rights>
  <dc:rights>http://creativecommons.org/licenses/by-nc-nd/3.0/es/</dc:rights>
  <dc:rights>openAccess</dc:rights>
  <dc:publisher>Procesamiento del Lenguaje Natural</dc:publisher>
</oai_dc:dc>
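
The oai_dc view above repeats several elements (`dc:creator`, `dc:identifier`, `dc:date`), so a consumer usually wants each field as a list of values. As a minimal sketch, assuming Python's standard `xml.etree.ElementTree`, the record could be read as below. The page display omits the `xmlns` declarations, so the sample restores the standard `oai_dc` and Dublin Core namespace URIs; the function name `read_oai_dc` and the abridged `SAMPLE` are illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

DC_NS = "http://purl.org/dc/elements/1.1/"

# Abridged copy of the record, with the namespace declarations restored.
SAMPLE = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>A light method for data generation: a combination of Markov Chains and Word Embeddings.</dc:title>
  <dc:creator>Martínez García, Eva</dc:creator>
  <dc:creator>Nogales Moyano, Alberto</dc:creator>
  <dc:identifier>http://hdl.handle.net/10641/2327</dc:identifier>
  <dc:identifier>10.26342/2020-64-10</dc:identifier>
  <dc:language>eng</dc:language>
</oai_dc:dc>"""

def read_oai_dc(xml_text):
    """Collect every Dublin Core child element into {local_name: [values]}."""
    root = ET.fromstring(xml_text)
    prefix = "{" + DC_NS + "}"
    fields = defaultdict(list)
    for child in root:
        # ElementTree exposes namespaced tags as "{uri}local".
        if child.tag.startswith(prefix) and child.text:
            fields[child.tag[len(prefix):]].append(child.text.strip())
    return dict(fields)

record = read_oai_dc(SAMPLE)
for name, values in record.items():
    print(name, values)
```

Keeping every field as a list avoids silently dropping repeated values such as the second and later creators.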

didl

<?xml version="1.0" encoding="UTF-8" ?>

<d:DIDL schemaLocation="urn:mpeg:mpeg21:2002:02-DIDL-NS http://standards.iso.org/ittf/PubliclyAvailableStandards/MPEG-21_schema_files/did/didl.xsd">
  <d:DIDLInfo>
    <dcterms:created schemaLocation="http://purl.org/dc/terms/ http://dublincore.org/schemas/xmls/qdc/dcterms.xsd">2021-06-16T08:19:29Z</dcterms:created>
  </d:DIDLInfo>
  <d:Item id="hdl_10641_2327">
    <d:Descriptor>
      <d:Statement mimeType="application/xml; charset=utf-8">
        <dii:Identifier schemaLocation="urn:mpeg:mpeg21:2002:01-DII-NS http://standards.iso.org/ittf/PubliclyAvailableStandards/MPEG-21_schema_files/dii/dii.xsd">urn:hdl:10641/2327</dii:Identifier>
      </d:Statement>
    </d:Descriptor>
    <d:Descriptor>
      <d:Statement mimeType="application/xml; charset=utf-8">
        <oai_dc:dc schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
          <dc:title>A light method for data generation: a combination of Markov Chains and Word Embeddings.</dc:title>
          <dc:creator>Martínez García, Eva</dc:creator>
          <dc:creator>Nogales Moyano, Alberto</dc:creator>
          <dc:creator>Morales Escudero, Javier</dc:creator>
          <dc:creator>García Tejedor, Álvaro José</dc:creator>
          <dc:subject>Generation</dc:subject>
          <dc:subject>Hybrid</dc:subject>
          <dc:subject>Markov Chains</dc:subject>
          <dc:subject>Embeddings</dc:subject>
          <dc:subject>Similarity</dc:subject>
          <dc:description>Most of the current state-of-the-art Natural Language Processing (NLP) techniques are highly data-dependent. A significant amount of data is required for their training, and in some scenarios data is scarce. We present a hybrid method to generate new sentences for augmenting the training data. Our approach takes advantage of the combination of Markov Chains and word embeddings to produce high-quality data similar to an initial dataset. In contrast to other neural-based generative methods, it does not need a high amount of training data. Results show how our approach can generate useful data for NLP tools. In particular, we validate our approach by building Transformer-based Language Models using data from three different domains in the context of enriching general purpose chatbots.</dc:description>
          <dc:date>2021-06-16T08:19:29Z</dc:date>
          <dc:date>2021-06-16T08:19:29Z</dc:date>
          <dc:date>2020</dc:date>
          <dc:type>article</dc:type>
          <dc:identifier>1135-5948</dc:identifier>
          <dc:identifier>http://hdl.handle.net/10641/2327</dc:identifier>
          <dc:identifier>10.26342/2020-64-10</dc:identifier>
          <dc:language>eng</dc:language>
          <dc:relation>http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6199</dc:relation>
          <dc:rights>http://creativecommons.org/licenses/by-nc-nd/3.0/es/</dc:rights>
          <dc:rights>openAccess</dc:rights>
          <dc:rights>Atribución-NoComercial-SinDerivadas 3.0 España</dc:rights>
          <dc:publisher>Procesamiento del Lenguaje Natural</dc:publisher>
        </oai_dc:dc>
      </d:Statement>
    </d:Descriptor>
    <d:Component id="10641_2327_1">
      <d:Resource mimeType="application/pdf" ref="http://ddfv.ufv.es/bitstream/10641/2327/1/6199-5608-1-PB.pdf" />
    </d:Component>
  </d:Item>
</d:DIDL>

dim

<?xml version="1.0" encoding="UTF-8" ?>

<dim:dim schemaLocation="http://www.dspace.org/xmlns/dspace/dim http://www.dspace.org/schema/dim.xsd">
  <dim:field authority="e4a4adb9-fa58-4a27-b114-f07bbf623ff7" confidence="600" element="contributor" mdschema="dc" qualifier="author">Martínez García, Eva</dim:field>
  <dim:field authority="209" confidence="600" element="contributor" mdschema="dc" qualifier="author">Nogales Moyano, Alberto</dim:field>
  <dim:field authority="3c20c2c8-86d0-4953-8a3a-7290cdb9a0ba" confidence="600" element="contributor" mdschema="dc" qualifier="author">Morales Escudero, Javier</dim:field>
  <dim:field authority="75" confidence="600" element="contributor" mdschema="dc" qualifier="author">García Tejedor, Álvaro José</dim:field>
  <dim:field element="date" mdschema="dc" qualifier="accessioned">2021-06-16T08:19:29Z</dim:field>
  <dim:field element="date" mdschema="dc" qualifier="available">2021-06-16T08:19:29Z</dim:field>
  <dim:field element="date" mdschema="dc" qualifier="issued">2020</dim:field>
  <dim:field element="identifier" lang="spa" mdschema="dc" qualifier="issn">1135-5948</dim:field>
  <dim:field element="identifier" mdschema="dc" qualifier="uri">http://hdl.handle.net/10641/2327</dim:field>
  <dim:field element="identifier" lang="spa" mdschema="dc" qualifier="doi">10.26342/2020-64-10</dim:field>
  <dim:field element="description" lang="spa" mdschema="dc" qualifier="abstract">Most of the current state-of-the-art Natural Language Processing (NLP) techniques are highly data-dependent. A significant amount of data is required for their training, and in some scenarios data is scarce. We present a hybrid method to generate new sentences for augmenting the training data. Our approach takes advantage of the combination of Markov Chains and word embeddings to produce high-quality data similar to an initial dataset. In contrast to other neural-based generative methods, it does not need a high amount of training data. Results show how our approach can generate useful data for NLP tools. In particular, we validate our approach by building Transformer-based Language Models using data from three different domains in the context of enriching general purpose chatbots.</dim:field>
  <dim:field element="description" lang="spa" mdschema="dc" qualifier="version">post-print</dim:field>
  <dim:field element="description" lang="spa" mdschema="dc" qualifier="extent">1,74 MB</dim:field>
  <dim:field element="language" lang="spa" mdschema="dc" qualifier="iso">eng</dim:field>
  <dim:field element="publisher" lang="spa" mdschema="dc">Procesamiento del Lenguaje Natural</dim:field>
  <dim:field element="rights" lang="*" mdschema="dc">Atribución-NoComercial-SinDerivadas 3.0 España</dim:field>
  <dim:field element="rights" lang="*" mdschema="dc" qualifier="uri">http://creativecommons.org/licenses/by-nc-nd/3.0/es/</dim:field>
  <dim:field element="rights" lang="spa" mdschema="dc" qualifier="accessRights">openAccess</dim:field>
  <dim:field element="subject" lang="spa" mdschema="dc">Generation</dim:field>
  <dim:field element="subject" lang="spa" mdschema="dc">Hybrid</dim:field>
  <dim:field element="subject" lang="spa" mdschema="dc">Markov Chains</dim:field>
  <dim:field element="subject" lang="spa" mdschema="dc">Embeddings</dim:field>
  <dim:field element="subject" lang="spa" mdschema="dc">Similarity</dim:field>
  <dim:field element="title" lang="spa" mdschema="dc">A light method for data generation: a combination of Markov Chains and Word Embeddings.</dim:field>
  <dim:field element="title" lang="spa" mdschema="dc" qualifier="alternative">Un método ligero de generación de datos: combinación entre Cadenas de Markov y Word Embeddings.</dim:field>
  <dim:field element="type" lang="spa" mdschema="dc">article</dim:field>
  <dim:field element="relation" lang="spa" mdschema="dc" qualifier="publisherversion">http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6199</dim:field>
</dim:dim>

etdms

<?xml version="1.0" encoding="UTF-8" ?>

<thesis schemaLocation="http://www.ndltd.org/standards/metadata/etdms/1.0/ http://www.ndltd.org/standards/metadata/etdms/1.0/etdms.xsd">
  <title>A light method for data generation: a combination of Markov Chains and Word Embeddings.</title>
  <creator>Martínez García, Eva</creator>
  <creator>Nogales Moyano, Alberto</creator>
  <creator>Morales Escudero, Javier</creator>
  <creator>García Tejedor, Álvaro José</creator>
  <subject>Generation</subject>
  <subject>Hybrid</subject>
  <subject>Markov Chains</subject>
  <subject>Embeddings</subject>
  <subject>Similarity</subject>
  <description>Most of the current state-of-the-art Natural Language Processing (NLP) techniques are highly data-dependent. A significant amount of data is required for their training, and in some scenarios data is scarce. We present a hybrid method to generate new sentences for augmenting the training data. Our approach takes advantage of the combination of Markov Chains and word embeddings to produce high-quality data similar to an initial dataset. In contrast to other neural-based generative methods, it does not need a high amount of training data. Results show how our approach can generate useful data for NLP tools. In particular, we validate our approach by building Transformer-based Language Models using data from three different domains in the context of enriching general purpose chatbots.</description>
  <date>2021-06-16</date>
  <date>2021-06-16</date>
  <date>2020</date>
  <type>article</type>
  <identifier>1135-5948</identifier>
  <identifier>http://hdl.handle.net/10641/2327</identifier>
  <identifier>10.26342/2020-64-10</identifier>
  <language>eng</language>
  <relation>http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6199</relation>
  <rights>http://creativecommons.org/licenses/by-nc-nd/3.0/es/</rights>
  <rights>openAccess</rights>
  <rights>Atribución-NoComercial-SinDerivadas 3.0 España</rights>
  <publisher>Procesamiento del Lenguaje Natural</publisher>
</thesis>

marc

<?xml version="1.0" encoding="UTF-8" ?>

<record schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd">
  <leader>00925njm 22002777a 4500</leader>
  <datafield ind1=" " ind2=" " tag="042">
    <subfield code="a">dc</subfield>
  </datafield>
  <datafield ind1=" " ind2=" " tag="720">
    <subfield code="a">Martínez García, Eva</subfield>
    <subfield code="e">author</subfield>
  </datafield>
  <datafield ind1=" " ind2=" " tag="720">
    <subfield code="a">Nogales Moyano, Alberto</subfield>
    <subfield code="e">author</subfield>
  </datafield>
  <datafield ind1=" " ind2=" " tag="720">
    <subfield code="a">Morales Escudero, Javier</subfield>
    <subfield code="e">author</subfield>
  </datafield>
  <datafield ind1=" " ind2=" " tag="720">
    <subfield code="a">García Tejedor, Álvaro José</subfield>
    <subfield code="e">author</subfield>
  </datafield>
  <datafield ind1=" " ind2=" " tag="260">
    <subfield code="c">2020</subfield>
  </datafield>
  <datafield ind1=" " ind2=" " tag="520">
    <subfield code="a">Most of the current state-of-the-art Natural Language Processing (NLP) techniques are highly data-dependent. A significant amount of data is required for their training, and in some scenarios data is scarce. We present a hybrid method to generate new sentences for augmenting the training data. Our approach takes advantage of the combination of Markov Chains and word embeddings to produce high-quality data similar to an initial dataset. In contrast to other neural-based generative methods, it does not need a high amount of training data. Results show how our approach can generate useful data for NLP tools. In particular, we validate our approach by building Transformer-based Language Models using data from three different domains in the context of enriching general purpose chatbots.</subfield>
  </datafield>
  <datafield ind1="8" ind2=" " tag="024">
    <subfield code="a">1135-5948</subfield>
  </datafield>
  <datafield ind1="8" ind2=" " tag="024">
    <subfield code="a">http://hdl.handle.net/10641/2327</subfield>
  </datafield>
  <datafield ind1="8" ind2=" " tag="024">
    <subfield code="a">10.26342/2020-64-10</subfield>
  </datafield>
  <datafield ind1=" " ind2=" " tag="653">
    <subfield code="a">Generation</subfield>
  </datafield>
  <datafield ind1=" " ind2=" " tag="653">
    <subfield code="a">Hybrid</subfield>
  </datafield>
  <datafield ind1=" " ind2=" " tag="653">
    <subfield code="a">Markov Chains</subfield>
  </datafield>
  <datafield ind1=" " ind2=" " tag="653">
    <subfield code="a">Embeddings</subfield>
  </datafield>
  <datafield ind1=" " ind2=" " tag="653">
    <subfield code="a">Similarity</subfield>
  </datafield>
  <datafield ind1="0" ind2="0" tag="245">
    <subfield code="a">A light method for data generation: a combination of Markov Chains and Word Embeddings.</subfield>
  </datafield>
</record>

mets

<?xml version="1.0" encoding="UTF-8" ?>

<mets ID=" DSpace_ITEM_10641-2327" OBJID=" hdl:10641/2327" PROFILE="DSpace METS SIP Profile 1.0" TYPE="DSpace ITEM" schemaLocation="http://www.loc.gov/METS/ http://www.loc.gov/standards/mets/mets.xsd">
  <metsHdr CREATEDATE="2022-09-20T09:27:37Z">
    <agent ROLE="CUSTODIAN" TYPE="ORGANIZATION">
      <name>DDFV</name>
    </agent>
  </metsHdr>
  <dmdSec ID="DMD_10641_2327">
    <mdWrap MDTYPE="MODS">
      <xmlData schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-1.xsd">
        <mods:mods schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-1.xsd">
          <mods:name>
            <mods:role>
              <mods:roleTerm type="text">author</mods:roleTerm>
            </mods:role>
            <mods:namePart>Martínez García, Eva</mods:namePart>
          </mods:name>
          <mods:name>
            <mods:role>
              <mods:roleTerm type="text">author</mods:roleTerm>
            </mods:role>
            <mods:namePart>Nogales Moyano, Alberto</mods:namePart>
          </mods:name>
          <mods:name>
            <mods:role>
              <mods:roleTerm type="text">author</mods:roleTerm>
            </mods:role>
            <mods:namePart>Morales Escudero, Javier</mods:namePart>
          </mods:name>
          <mods:name>
            <mods:role>
              <mods:roleTerm type="text">author</mods:roleTerm>
            </mods:role>
            <mods:namePart>García Tejedor, Álvaro José</mods:namePart>
          </mods:name>
          <mods:extension>
            <mods:dateAccessioned encoding="iso8601">2021-06-16T08:19:29Z</mods:dateAccessioned>
          </mods:extension>
          <mods:extension>
            <mods:dateAvailable encoding="iso8601">2021-06-16T08:19:29Z</mods:dateAvailable>
          </mods:extension>
          <mods:originInfo>
            <mods:dateIssued encoding="iso8601">2020</mods:dateIssued>
          </mods:originInfo>
          <mods:identifier type="issn">1135-5948</mods:identifier>
          <mods:identifier type="uri">http://hdl.handle.net/10641/2327</mods:identifier>
          <mods:identifier type="doi">10.26342/2020-64-10</mods:identifier>
          <mods:abstract>Most of the current state-of-the-art Natural Language Processing (NLP) techniques are highly data-dependent. A significant amount of data is required for their training, and in some scenarios data is scarce. We present a hybrid method to generate new sentences for augmenting the training data. Our approach takes advantage of the combination of Markov Chains and word embeddings to produce high-quality data similar to an initial dataset. In contrast to other neural-based generative methods, it does not need a high amount of training data. Results show how our approach can generate useful data for NLP tools. In particular, we validate our approach by building Transformer-based Language Models using data from three different domains in the context of enriching general purpose chatbots.</mods:abstract>
          <mods:language>
            <mods:languageTerm authority="rfc3066">eng</mods:languageTerm>
          </mods:language>
          <mods:accessCondition type="useAndReproduction">Atribución-NoComercial-SinDerivadas 3.0 España</mods:accessCondition>
          <mods:subject>
            <mods:topic>Generation</mods:topic>
          </mods:subject>
          <mods:subject>
            <mods:topic>Hybrid</mods:topic>
          </mods:subject>
          <mods:subject>
            <mods:topic>Markov Chains</mods:topic>
          </mods:subject>
          <mods:subject>
            <mods:topic>Embeddings</mods:topic>
          </mods:subject>
          <mods:subject>
            <mods:topic>Similarity</mods:topic>
          </mods:subject>
          <mods:titleInfo>
            <mods:title>A light method for data generation: a combination of Markov Chains and Word Embeddings.</mods:title>
          </mods:titleInfo>
          <mods:genre>article</mods:genre>
        </mods:mods>
      </xmlData>
    </mdWrap>
  </dmdSec>
  <amdSec ID="TMD_10641_2327">
    <rightsMD ID="RIG_10641_2327">
      <mdWrap MDTYPE="OTHER" MIMETYPE="text/plain" OTHERMDTYPE="DSpaceDepositLicense">
      1. <binData>LSBFbCByZXBvc2l0b3JpbyBpbnN0aXR1Y2lvbmFsIGRlIGxhIFVuaXZlcnNpZGFkIEZyYW5jaXNjbyBkZSBWaXRvcmlhIGRlIE1hZHJpZCAoRERGViksIHBvbmUgYSBkaXNwb3NpY2nDs24gZGUgbG9zIHVzdWFyaW9zIGxhIHBsYXRhZm9ybWEgZGlnaXRhbCBhYmllcnRhIHkgZGUgYWNjZXNvIGxpYnJlIGRlIGxhIHByb2R1Y2Npw7NuIGNpZW50w61maWNhIGRlIGxhIGluc3RpdHVjacOzbi4KCi0gQSB0YWxlcyBmaW5lcywgbG9zIGF1dG9yZXMgZGVjbGFyYW4gcXVlIHNvbiB0aXR1bGFyZXMgZGUgbG9zIGRlcmVjaG9zIGRlIHByb3BpZWRhZCBpbnRlbGVjdHVhbCBkZSBsYSBvYnJhIHkgcXVlIMOpc3RhIGVzIG9yaWdpbmFsLgoKLSBNZWRpYW50ZSBsYSBhY2VwdGFjacOzbiBkZSBlc3RhIGxpY2VuY2lhLCBlbCBhdXRvciwgY29tbyB0aXR1bGFyIGRlIGxvcyBkZXJlY2hvcyBkZSBhdXRvciwgYXV0b3JpemEgeSBjZWRlIGEgbGEgVW5pdmVyc2lkYWQgRnJhbmNpc2NvIGRlIFZpdG9yaWEsIGRlIGZvcm1hIGdyYXR1aXRhIHkgbm8gZXhjbHVzaXZhLCBwb3IgZWwgbcOheGltbyBwbGF6byBsZWdhbCB5IGNvbiDDoW1iaXRvIHVuaXZlcnNhbCwgbG9zIGRlcmVjaG9zIGRlIHJlcHJvZHVjY2nDs24sIGRpc3RyaWJ1Y2nDs24sIGNvbXVuaWNhY2nDs24gcMO6YmxpY2EsIGluY2x1aWRvIGVsIGRlcmVjaG8gZGUgcHVlc3RhIGEgZGlzcG9zaWNpw7NuIGVsZWN0csOzbmljYSwgeSBsYSB0cmFuc2Zvcm1hY2nDs24gZGUgZm9ybWF0byBzb2JyZSBsYSBvYnJhIGluZGljYWRhLCBzaSBmdWVyYSBlbCBjYXNvLgoKLSBFbiBlbCBjYXNvIGRlIGNlc2nDs24gZGUgZGVyZWNob3MgZGUgZXhwbG90YWNpw7NuIGEgdGVyY2Vyb3MsIGRlY2xhcmEgcXVlIGN1ZW50YSBjb24gbGEgYXV0b3JpemFjacOzbiBkZSBkaWNob3MgdGl0dWxhcmVzIHkgcXVlIGhhIG9idGVuaWRvIGVsIHBlcm1pc28gc2luIHJlc3RyaWNjaW9uZXMgZGVsIHByb3BpZXRhcmlvIGRlbCBjb3B5cmlnaHQgcGFyYSBvdG9yZ2FyIGEgbGEgaW5zdGl0dWNpw7NuIGxvcyBkZXJlY2hvcyByZXF1ZXJpZG9zIHBhcmEgZXN0YSBsaWNlbmNpYSB5IHF1ZSBkaWNobyBwcm9waWV0YXJpbyBjb25vY2UgZWwgdGV4dG8gbyBlbCBjb250ZW5pZG8gZGUgbGEgb2JyYS4KCi0gU2kgZnVlcmEgdW5hIG9icmEgcGF0cm9jaW5hZGEgcG9yIGFsZ3VuYSBpbnN0aXR1Y2nDs24gZGlzdGludGEgYSBsYSBVbml2ZXJzaWRhZCBGcmFuY2lzY28gZGUgVml0b3JpYSwgZGVjbGFyYSBxdWUgZW4gY2FzbyBuZWNlc2FyaW8sIGN1ZW50YSBjb24gbG9zIHBlcm1pc29zIHBlcnRpbmVudGVzLCBkZSBsYSBpbnN0aXR1Y2nDs24gbyBlbnRpZGFkLCBxdWUgbGUgcGVybWl0YW4gbGEgZGlmdXNpw7NuIGRlIGRpY2hhIG9icmEuCgotIExhIFVuaXZlcnNpZGFkIEZyYW5jaXNjbyBkZSBWaXRvcmlhIG5vIHRpZW5lIGxhIHRpdHVsYXJpZGFkIGRlIGxvcyBkZXJlY2hvcyBzb2JyZSBsYSBvYnJhLCBxdWUgY29ycm
VzcG9uZGVuIGFsIGF1dG9yLCBwZXJvIHNpbiBlbWJhcmdvIMOpc3RhIGxpY2VuY2lhIGRhIGRlcmVjaG8gYSByZXByb2R1Y2lybGEgZW4gdW4gc29wb3J0ZSBkaWdpdGFsLCBkaXN0cmlidWlyIGEgbG9zIHVzdWFyaW9zIGNvcGlhcyBlbGVjdHLDs25pY2FzIGRlIGxhIG9icmEgZW4gZm9ybWF0byBkaWdpdGFsLCBjb211bmljYWNpw7NuIHDDumJsaWNhIHkgc3UgcHVlc3RhIGEgZGlzcG9zaWNpw7NuIGEgdHJhdsOpcyBkZSB1biBhcmNoaXZvIGFiaWVydG8gaW5zdGl0dWNpb25hbC4KCi0gTGEgb2JyYSBzZSBwb25kcsOhIGEgZGlzcG9zaWNpw7NuIGRlIGxvcyB1c3VhcmlvcyBwYXJhIHF1ZSBoYWdhbiBkZSBlbGxhIHVuIHVzbyBqdXN0byB5IHJlc3BldHVvc28gY29uIGxvcyBkZXJlY2hvcyBkZSBhdXRvciwgc2VhIGNvbiBmaW5lcyBkZSBlc3R1ZGlvLCBpbnZlc3RpZ2FjacOzbiBvIGN1YWxxdWllciBvdHJvIGZpbiBsw61jaXRvLCB5IGRlIGFjdWVyZG8gYSBsYXMgY29uZGljaW9uZXMgZXN0YWJsZWNpZGFzIGVuIGxhIGxpY2VuY2lhIENyZWF0aXZlIENvbW1vbnMsIGRlIG1vZG8gcXVlIGxhcyBvYnJhcyBwdWVkYW4gc2VyIGRpc3RyaWJ1aWRhcywgY29waWFkYXMgeSBleGhpYmlkYXMgc2llbXByZSBxdWUgc2UgY2l0ZSBsYSBhdXRvcsOtYSB5IG5vIHNlIG9idGVuZ2EgYmVuZWZpY2lvIGNvbWVyY2lhbC4gUG9yIHRhbnRvLCBsYSBVbml2ZXJzaWRhZCBubyBhc3VtaXLDoSByZXNwb25zYWJpbGlkYWQgYWxndW5hIHBvciBsYSBmb3JtYSBlZmVjdGl2YSBlbiBxdWUgbG9zIHVzdWFyaW9zIHV0aWxpY2VuIGVsIG1hdGVyaWFsIHB1ZXN0byBhIHN1IGRpc3Bvc2ljacOzbi4KCi0gRWwgYXV0b3IgcG9kcsOhIHNvbGljaXRhciBsYSByZXRpcmFkYSBkZSBsYSBvYnJhIGRlbCByZXBvc2l0b3JpbyBwb3IgY2F1c2EganVzdGlmaWNhZGEuIAoK</binData>
      </mdWrap>
    </rightsMD>
  </amdSec>
  <amdSec ID="FO_10641_2327_1">
    <techMD ID="TECH_O_10641_2327_1">
      <mdWrap MDTYPE="PREMIS">
        <xmlData schemaLocation="http://www.loc.gov/standards/premis http://www.loc.gov/standards/premis/PREMIS-v1-0.xsd">
          <premis:premis>
            <premis:object>
              <premis:objectIdentifier>
                <premis:objectIdentifierType>URL</premis:objectIdentifierType>
                <premis:objectIdentifierValue>http://ddfv.ufv.es/bitstream/10641/2327/1/6199-5608-1-PB.pdf</premis:objectIdentifierValue>
              </premis:objectIdentifier>
              <premis:objectCategory>File</premis:objectCategory>
              <premis:objectCharacteristics>
                <premis:fixity>
                  <premis:messageDigestAlgorithm>MD5</premis:messageDigestAlgorithm>
                  <premis:messageDigest>81f55f83adefa95b0a46222d72223778</premis:messageDigest>
                </premis:fixity>
                <premis:size>1831204</premis:size>
                <premis:format>
                  <premis:formatDesignation>
                    <premis:formatName>application/pdf</premis:formatName>
                  </premis:formatDesignation>
                </premis:format>
              </premis:objectCharacteristics>
              <premis:originalName>6199-5608-1-PB.pdf</premis:originalName>
            </premis:object>
          </premis:premis>
        </xmlData>
      </mdWrap>
    </techMD>
  </amdSec>
  <amdSec ID="FT_10641_2327_4">
    <techMD ID="TECH_T_10641_2327_4">
      <mdWrap MDTYPE="PREMIS">
        <xmlData schemaLocation="http://www.loc.gov/standards/premis http://www.loc.gov/standards/premis/PREMIS-v1-0.xsd">
          <premis:premis>
            <premis:object>
              <premis:objectIdentifier>
                <premis:objectIdentifierType>URL</premis:objectIdentifierType>
                <premis:objectIdentifierValue>http://ddfv.ufv.es/bitstream/10641/2327/4/6199-5608-1-PB.pdf.txt</premis:objectIdentifierValue>
              </premis:objectIdentifier>
              <premis:objectCategory>File</premis:objectCategory>
              <premis:objectCharacteristics>
                <premis:fixity>
                  <premis:messageDigestAlgorithm>MD5</premis:messageDigestAlgorithm>
                  <premis:messageDigest>47b47b4ab230e10b1abda13a3bf7be5e</premis:messageDigest>
                </premis:fixity>
                <premis:size>30680</premis:size>
                <premis:format>
                  <premis:formatDesignation>
                    <premis:formatName>text/plain</premis:formatName>
                  </premis:formatDesignation>
                </premis:format>
              </premis:objectCharacteristics>
              <premis:originalName>6199-5608-1-PB.pdf.txt</premis:originalName>
            </premis:object>
          </premis:premis>
        </xmlData>
      </mdWrap>
    </techMD>
  </amdSec>
  <fileSec>
    <fileGrp USE="ORIGINAL">
      <file ADMID="FO_10641_2327_1" CHECKSUM="81f55f83adefa95b0a46222d72223778" CHECKSUMTYPE="MD5" GROUPID="GROUP_BITSTREAM_10641_2327_1" ID="BITSTREAM_ORIGINAL_10641_2327_1" MIMETYPE="application/pdf" SEQ="1" SIZE="1831204">
        <FLocat LOCTYPE="URL" href="http://ddfv.ufv.es/bitstream/10641/2327/1/6199-5608-1-PB.pdf" type="simple" />
      </file>
    </fileGrp>
    <fileGrp USE="TEXT">
      <file ADMID="FT_10641_2327_4" CHECKSUM="47b47b4ab230e10b1abda13a3bf7be5e" CHECKSUMTYPE="MD5" GROUPID="GROUP_BITSTREAM_10641_2327_4" ID="BITSTREAM_TEXT_10641_2327_4" MIMETYPE="text/plain" SEQ="4" SIZE="30680">
        <FLocat LOCTYPE="URL" href="http://ddfv.ufv.es/bitstream/10641/2327/4/6199-5608-1-PB.pdf.txt" type="simple" />
      </file>
    </fileGrp>
  </fileSec>
  <structMap LABEL="DSpace Object" TYPE="LOGICAL">
    <div ADMID="DMD_10641_2327" TYPE="DSpace Object Contents">
      <div TYPE="DSpace BITSTREAM">
        <fptr FILEID="BITSTREAM_ORIGINAL_10641_2327_1" />
      </div>
    </div>
  </structMap>
</mets>
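
The mets view above gives each bitstream a `CHECKSUM`, a `CHECKSUMTYPE` of `MD5`, and a `SIZE` in bytes, which is enough to check that a downloaded copy matches the deposited file. A minimal sketch of such a fixity check, assuming Python's standard `hashlib`, follows. The function name `verify_bitstream` is hypothetical, and the demonstration runs on in-memory bytes rather than the actual PDF, so no download is performed.

```python
import hashlib

def verify_bitstream(payload: bytes, expected_md5: str, expected_size: int) -> bool:
    """Return True when both the byte count and the MD5 digest match."""
    if len(payload) != expected_size:
        return False
    return hashlib.md5(payload).hexdigest() == expected_md5.strip().lower()

# Demonstration on in-memory bytes; a real check would read the downloaded
# file and pass the CHECKSUM and SIZE attribute values from the record.
demo = b"example bitstream"
demo_md5 = hashlib.md5(demo).hexdigest()

print(verify_bitstream(demo, demo_md5, len(demo)))             # unaltered copy
print(verify_bitstream(demo + b"!", demo_md5, len(demo) + 1))  # altered copy
```

MD5 is adequate for spotting accidental corruption in transit or storage, but it is not collision-resistant, so it should not be treated as proof against deliberate tampering.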

mods

<?xml version="1.0" encoding="UTF-8" ?>

<mods:mods schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-1.xsd">
  <mods:name>
    <mods:namePart>Martínez García, Eva</mods:namePart>
  </mods:name>
  <mods:name>
    <mods:namePart>Nogales Moyano, Alberto</mods:namePart>
  </mods:name>
  <mods:name>
    <mods:namePart>Morales Escudero, Javier</mods:namePart>
  </mods:name>
  <mods:name>
    <mods:namePart>García Tejedor, Álvaro José</mods:namePart>
  </mods:name>
  <mods:extension>
    <mods:dateAvailable encoding="iso8601">2021-06-16T08:19:29Z</mods:dateAvailable>
  </mods:extension>
  <mods:extension>
    <mods:dateAccessioned encoding="iso8601">2021-06-16T08:19:29Z</mods:dateAccessioned>
  </mods:extension>
  <mods:originInfo>
    <mods:dateIssued encoding="iso8601">2020</mods:dateIssued>
  </mods:originInfo>
  <mods:identifier type="issn">1135-5948</mods:identifier>
  <mods:identifier type="uri">http://hdl.handle.net/10641/2327</mods:identifier>
  <mods:identifier type="doi">10.26342/2020-64-10</mods:identifier>
  <mods:abstract>Most of the current state-of-the-art Natural Language Processing (NLP) techniques are highly data-dependent. A significant amount of data is required for their training, and in some scenarios data is scarce. We present a hybrid method to generate new sentences for augmenting the training data. Our approach takes advantage of the combination of Markov Chains and word embeddings to produce high-quality data similar to an initial dataset. In contrast to other neural-based generative methods, it does not need a high amount of training data. Results show how our approach can generate useful data for NLP tools. In particular, we validate our approach by building Transformer-based Language Models using data from three different domains in the context of enriching general purpose chatbots.</mods:abstract>
  <mods:language>
    <mods:languageTerm>eng</mods:languageTerm>
  </mods:language>
  <mods:accessCondition type="useAndReproduction">http://creativecommons.org/licenses/by-nc-nd/3.0/es/</mods:accessCondition>
  <mods:accessCondition type="useAndReproduction">openAccess</mods:accessCondition>
  <mods:accessCondition type="useAndReproduction">Atribución-NoComercial-SinDerivadas 3.0 España</mods:accessCondition>
  <mods:subject>
    <mods:topic>Generation</mods:topic>
  </mods:subject>
  <mods:subject>
    <mods:topic>Hybrid</mods:topic>
  </mods:subject>
  <mods:subject>
    <mods:topic>Markov Chains</mods:topic>
  </mods:subject>
  <mods:subject>
    <mods:topic>Embeddings</mods:topic>
  </mods:subject>
  <mods:subject>
    <mods:topic>Similarity</mods:topic>
  </mods:subject>
  <mods:titleInfo>
    <mods:title>A light method for data generation: a combination of Markov Chains and Word Embeddings.</mods:title>
  </mods:titleInfo>
  <mods:genre>article</mods:genre>
</mods:mods>

ore

<?xml version="1.0" encoding="UTF-8" ?>

<atom:entry schemaLocation="http://www.w3.org/2005/Atom http://www.kbcafe.com/rss/atom.xsd.xml">
  <atom:id>http://hdl.handle.net/10641/2327/ore.xml</atom:id>
  <atom:link href="http://hdl.handle.net/10641/2327" rel="alternate" />
  <atom:link href="http://hdl.handle.net/10641/2327/ore.xml" rel="http://www.openarchives.org/ore/terms/describes" />
  <atom:link href="http://hdl.handle.net/10641/2327/ore.xml#atom" rel="self" type="application/atom+xml" />
  <atom:published>2021-06-16T08:19:29Z</atom:published>
  <atom:updated>2021-06-16T08:19:29Z</atom:updated>
  <atom:source>
    <atom:generator>DDFV</atom:generator>
  </atom:source>
  <atom:title>A light method for data generation: a combination of Markov Chains and Word Embeddings.</atom:title>
  <atom:author>
    <atom:name>Martínez García, Eva</atom:name>
  </atom:author>
  <atom:author>
    <atom:name>Nogales Moyano, Alberto</atom:name>
  </atom:author>
  <atom:author>
    <atom:name>Morales Escudero, Javier</atom:name>
  </atom:author>
  <atom:author>
    <atom:name>García Tejedor, Álvaro José</atom:name>
  </atom:author>
  <atom:category label="Aggregation" scheme="http://www.openarchives.org/ore/terms/" term="http://www.openarchives.org/ore/terms/Aggregation" />
  <atom:category scheme="http://www.openarchives.org/ore/atom/modified" term="2021-06-16T08:19:29Z" />
  <atom:category label="DSpace Item" scheme="http://www.dspace.org/objectModel/" term="DSpaceItem" />
  <atom:link href="http://ddfv.ufv.es/bitstream/10641/2327/1/6199-5608-1-PB.pdf" length="1831204" rel="http://www.openarchives.org/ore/terms/aggregates" title="6199-5608-1-PB.pdf" type="application/pdf" />
  <oreatom:triples>
    <rdf:Description about="http://hdl.handle.net/10641/2327/ore.xml#atom">
      <rdf:type resource="http://www.dspace.org/objectModel/DSpaceItem" />
      <dcterms:modified>2021-06-16T08:19:29Z</dcterms:modified>
    </rdf:Description>
    <rdf:Description about="http://ddfv.ufv.es/bitstream/10641/2327/1/6199-5608-1-PB.pdf">
      <rdf:type resource="http://www.dspace.org/objectModel/DSpaceBitstream" />
      <dcterms:description>ORIGINAL</dcterms:description>
    </rdf:Description>
    <rdf:Description about="http://ddfv.ufv.es/bitstream/10641/2327/2/license_rdf">
      <rdf:type resource="http://www.dspace.org/objectModel/DSpaceBitstream" />
      <dcterms:description>CC-LICENSE</dcterms:description>
    </rdf:Description>
    <rdf:Description about="http://ddfv.ufv.es/bitstream/10641/2327/3/license.txt">
      <rdf:type resource="http://www.dspace.org/objectModel/DSpaceBitstream" />
      <dcterms:description>LICENSE</dcterms:description>
    </rdf:Description>
    <rdf:Description about="http://ddfv.ufv.es/bitstream/10641/2327/4/6199-5608-1-PB.pdf.txt">
      <rdf:type resource="http://www.dspace.org/objectModel/DSpaceBitstream" />
      <dcterms:description>TEXT</dcterms:description>
    </rdf:Description>
    <rdf:Description about="http://ddfv.ufv.es/bitstream/10641/2327/5/6199-5608-1-PB.pdf.jpg">
    1. <rdf:type resource="http://www.dspace.org/objectModel/DSpaceBitstream" />
    2. <dcterms:description>THUMBNAIL</dcterms:description>
    </rdf:Description>
  </oreatom:triples>
</atom:entry>

qdc

<?xml version="1.0" encoding="UTF-8" ?>

<qdc:qualifieddc schemaLocation="http://purl.org/dc/elements/1.1/ http://dublincore.org/schemas/xmls/qdc/2006/01/06/dc.xsd http://purl.org/dc/terms/ http://dublincore.org/schemas/xmls/qdc/2006/01/06/dcterms.xsd http://dspace.org/qualifieddc/ http://www.ukoln.ac.uk/metadata/dcmi/xmlschema/qualifieddc.xsd">
1. <dc:title>A light method for data generation: a combination of Markov Chains and Word Embeddings.</dc:title>
2. <dc:creator>Martínez García, Eva</dc:creator>
3. <dc:creator>Nogales Moyano, Alberto</dc:creator>
4. <dc:creator>Morales Escudero, Javier</dc:creator>
5. <dc:creator>García Tejedor, Álvaro José</dc:creator>
6. <dc:subject>Generation</dc:subject>
7. <dc:subject>Hybrid</dc:subject>
8. <dc:subject>Markov Chains</dc:subject>
9. <dc:subject>Embeddings</dc:subject>
10. <dc:subject>Similarity</dc:subject>
11. <dcterms:abstract>Most of the current state-of-the-art Natural Language Processing (NLP) techniques are highly data-dependent. A significant amount of data is required for their training, and in some scenarios data is scarce. We present a hybrid method to generate new sentences for augmenting the training data. Our approach takes advantage of the combination of Markov Chains and word embeddings to produce high-quality data similar to an initial dataset. In contrast to other neural-based generative methods, it does not need a high amount of training data. Results show how our approach can generate useful data for NLP tools. In particular, we validate our approach by building Transformer-based Language Models using data from three different domains in the context of enriching general purpose chatbots.</dcterms:abstract>
12. <dcterms:dateAccepted>2021-06-16T08:19:29Z</dcterms:dateAccepted>
13. <dcterms:available>2021-06-16T08:19:29Z</dcterms:available>
14. <dcterms:created>2021-06-16T08:19:29Z</dcterms:created>
15. <dcterms:issued>2020</dcterms:issued>
16. <dc:type>article</dc:type>
17. <dc:identifier>1135-5948</dc:identifier>
18. <dc:identifier>http://hdl.handle.net/10641/2327</dc:identifier>
19. <dc:identifier>10.26342/2020-64-10</dc:identifier>
20. <dc:language>eng</dc:language>
21. <dc:relation>http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6199</dc:relation>
22. <dc:rights>http://creativecommons.org/licenses/by-nc-nd/3.0/es/</dc:rights>
23. <dc:rights>openAccess</dc:rights>
24. <dc:rights>Atribución-NoComercial-SinDerivadas 3.0 España</dc:rights>
25. <dc:publisher>Procesamiento del Lenguaje Natural</dc:publisher>
</qdc:qualifieddc>

rdf

<?xml version="1.0" encoding="UTF-8" ?>

<rdf:RDF schemaLocation="http://www.openarchives.org/OAI/2.0/rdf/ http://www.openarchives.org/OAI/2.0/rdf.xsd">
1. <ow:Publication about="oai:ddfv.ufv.es:10641/2327">
  1. <dc:title>A light method for data generation: a combination of Markov Chains and Word Embeddings.</dc:title>
  2. <dc:creator>Martínez García, Eva</dc:creator>
  3. <dc:creator>Nogales Moyano, Alberto</dc:creator>
  4. <dc:creator>Morales Escudero, Javier</dc:creator>
  5. <dc:creator>García Tejedor, Álvaro José</dc:creator>
  6. <dc:subject>Generation</dc:subject>
  7. <dc:subject>Hybrid</dc:subject>
  8. <dc:subject>Markov Chains</dc:subject>
  9. <dc:subject>Embeddings</dc:subject>
  10. <dc:subject>Similarity</dc:subject>
  11. <dc:description>Most of the current state-of-the-art Natural Language Processing (NLP) techniques are highly data-dependent. A significant amount of data is required for their training, and in some scenarios data is scarce. We present a hybrid method to generate new sentences for augmenting the training data. Our approach takes advantage of the combination of Markov Chains and word embeddings to produce high-quality data similar to an initial dataset. In contrast to other neural-based generative methods, it does not need a high amount of training data. Results show how our approach can generate useful data for NLP tools. In particular, we validate our approach by building Transformer-based Language Models using data from three different domains in the context of enriching general purpose chatbots.</dc:description>
  12. <dc:date>2021-06-16T08:19:29Z</dc:date>
  13. <dc:date>2021-06-16T08:19:29Z</dc:date>
  14. <dc:date>2020</dc:date>
  15. <dc:type>article</dc:type>
  16. <dc:identifier>1135-5948</dc:identifier>
  17. <dc:identifier>http://hdl.handle.net/10641/2327</dc:identifier>
  18. <dc:identifier>10.26342/2020-64-10</dc:identifier>
  19. <dc:language>eng</dc:language>
  20. <dc:relation>http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6199</dc:relation>
  21. <dc:rights>http://creativecommons.org/licenses/by-nc-nd/3.0/es/</dc:rights>
  22. <dc:rights>openAccess</dc:rights>
  23. <dc:rights>Atribución-NoComercial-SinDerivadas 3.0 España</dc:rights>
  24. <dc:publisher>Procesamiento del Lenguaje Natural</dc:publisher>
  </ow:Publication>
</rdf:RDF>

xoai

<?xml version="1.0" encoding="UTF-8" ?>

<metadata schemaLocation="http://www.lyncode.com/xoai http://www.lyncode.com/xsd/xoai.xsd">
1. <element name="dc">
  1. <element name="contributor">
    1. <element name="author">
      1. <element name="none">
        <field name="value">Martínez García, Eva</field>
        <field name="authority">e4a4adb9-fa58-4a27-b114-f07bbf623ff7</field>
        <field name="confidence">600</field>
        <field name="value">Nogales Moyano, Alberto</field>
        <field name="authority">209</field>
        <field name="confidence">600</field>
        <field name="value">Morales Escudero, Javier</field>
        <field name="authority">3c20c2c8-86d0-4953-8a3a-7290cdb9a0ba</field>
        <field name="confidence">600</field>
        <field name="value">García Tejedor, Álvaro José</field>
        <field name="authority">75</field>
        <field name="confidence">600</field>
        </element>
      </element>
    </element>
  2. <element name="date">
    1. <element name="accessioned">
      1. <element name="none">
        <field name="value">2021-06-16T08:19:29Z</field>
        </element>
      </element>
    2. <element name="available">
      1. <element name="none">
        <field name="value">2021-06-16T08:19:29Z</field>
        </element>
      </element>
    3. <element name="issued">
      1. <element name="none">
        <field name="value">2020</field>
        </element>
      </element>
    </element>
  3. <element name="identifier">
    1. <element name="issn">
      1. <element name="spa">
        <field name="value">1135-5948</field>
        </element>
      </element>
    2. <element name="uri">
      1. <element name="none">
        <field name="value">http://hdl.handle.net/10641/2327</field>
        </element>
      </element>
    3. <element name="doi">
      1. <element name="spa">
        <field name="value">10.26342/2020-64-10</field>
        </element>
      </element>
    </element>
  4. <element name="description">
    1. <element name="abstract">
      1. <element name="spa">
        <field name="value">Most of the current state-of-the-art Natural Language Processing (NLP) techniques are highly data-dependent. A significant amount of data is required for their training, and in some scenarios data is scarce. We present a hybrid method to generate new sentences for augmenting the training data. Our approach takes advantage of the combination of Markov Chains and word embeddings to produce high-quality data similar to an initial dataset. In contrast to other neural-based generative methods, it does not need a high amount of training data. Results show how our approach can generate useful data for NLP tools. In particular, we validate our approach by building Transformer-based Language Models using data from three different domains in the context of enriching general purpose chatbots.</field>
        </element>
      </element>
    2. <element name="version">
      1. <element name="spa">
        <field name="value">post-print</field>
        </element>
      </element>
    3. <element name="extent">
      1. <element name="spa">
        <field name="value">1,74 MB</field>
        </element>
      </element>
    </element>
  5. <element name="language">
    1. <element name="iso">
      1. <element name="spa">
        <field name="value">eng</field>
        </element>
      </element>
    </element>
  6. <element name="publisher">
    1. <element name="spa">
      1. <field name="value">Procesamiento del Lenguaje Natural</field>
      </element>
    </element>
  7. <element name="rights">
    1. <element name="*">
      1. <field name="value">Atribución-NoComercial-SinDerivadas 3.0 España</field>
      </element>
    2. <element name="uri">
      1. <element name="*">
        <field name="value">http://creativecommons.org/licenses/by-nc-nd/3.0/es/</field>
        </element>
      </element>
    3. <element name="accessRights">
      1. <element name="spa">
        <field name="value">openAccess</field>
        </element>
      </element>
    </element>
  8. <element name="subject">
    1. <element name="spa">
      1. <field name="value">Generation</field>
      2. <field name="value">Hybrid</field>
      3. <field name="value">Markov Chains</field>
      4. <field name="value">Embeddings</field>
      5. <field name="value">Similarity</field>
      </element>
    </element>
  9. <element name="title">
    1. <element name="spa">
      1. <field name="value">A light method for data generation: a combination of Markov Chains and Word Embeddings.</field>
      </element>
    2. <element name="alternative">
      1. <element name="spa">
        <field name="value">Un método ligero de generación de datos: combinación entre Cadenas de Markov y Word Embeddings.</field>
        </element>
      </element>
    </element>
  10. <element name="type">
    1. <element name="spa">
      1. <field name="value">article</field>
      </element>
    </element>
  11. <element name="relation">
    1. <element name="publisherversion">
      1. <element name="spa">
        <field name="value">http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6199</field>
        </element>
      </element>
    </element>
  </element>
2. <element name="bundles">
  1. <element name="bundle">
    1. <field name="name">ORIGINAL</field>
    2. <element name="bitstreams">
      1. <element name="bitstream">
        <field name="name">6199-5608-1-PB.pdf</field>
        <field name="originalName">6199-5608-1-PB.pdf</field>
        <field name="description" />
        <field name="format">application/pdf</field>
        <field name="size">1831204</field>
        <field name="url">http://ddfv.ufv.es/bitstream/10641/2327/1/6199-5608-1-PB.pdf</field>
        <field name="checksum">81f55f83adefa95b0a46222d72223778</field>
        <field name="checksumAlgorithm">MD5</field>
        <field name="sid">1</field>
        </element>
      </element>
    </element>
  2. <element name="bundle">
    1. <field name="name">CC-LICENSE</field>
    2. <element name="bitstreams">
      1. <element name="bitstream">
        <field name="name">license_rdf</field>
        <field name="originalName">license_rdf</field>
        <field name="format">application/rdf+xml; charset=utf-8</field>
        <field name="size">811</field>
        <field name="url">http://ddfv.ufv.es/bitstream/10641/2327/2/license_rdf</field>
        <field name="checksum">4d01a8abc68801ab758ec8c2c04918c3</field>
        <field name="checksumAlgorithm">MD5</field>
        <field name="sid">2</field>
        </element>
      </element>
    </element>
  3. <element name="bundle">
    1. <field name="name">LICENSE</field>
    2. <element name="bitstreams">
      1. <element name="bitstream">
        <field name="name">license.txt</field>
        <field name="originalName">license.txt</field>
        <field name="format">text/plain; charset=utf-8</field>
        <field name="size">2418</field>
        <field name="url">http://ddfv.ufv.es/bitstream/10641/2327/3/license.txt</field>
        <field name="checksum">8b6e3a0bc6a1ca51936267b0e6e4740c</field>
        <field name="checksumAlgorithm">MD5</field>
        <field name="sid">3</field>
        </element>
      </element>
    </element>
  4. <element name="bundle">
    1. <field name="name">TEXT</field>
    2. <element name="bitstreams">
      1. <element name="bitstream">
        <field name="name">6199-5608-1-PB.pdf.txt</field>
        <field name="originalName">6199-5608-1-PB.pdf.txt</field>
        <field name="description">Extracted text</field>
        <field name="format">text/plain</field>
        <field name="size">30680</field>
        <field name="url">http://ddfv.ufv.es/bitstream/10641/2327/4/6199-5608-1-PB.pdf.txt</field>
        <field name="checksum">47b47b4ab230e10b1abda13a3bf7be5e</field>
        <field name="checksumAlgorithm">MD5</field>
        <field name="sid">4</field>
        </element>
      </element>
    </element>
  5. <element name="bundle">
    1. <field name="name">THUMBNAIL</field>
    2. <element name="bitstreams">
      1. <element name="bitstream">
        <field name="name">6199-5608-1-PB.pdf.jpg</field>
        <field name="originalName">6199-5608-1-PB.pdf.jpg</field>
        <field name="description">Generated Thumbnail</field>
        <field name="format">image/jpeg</field>
        <field name="size">1595</field>
        <field name="url">http://ddfv.ufv.es/bitstream/10641/2327/5/6199-5608-1-PB.pdf.jpg</field>
        <field name="checksum">edb12135decccbd5135dbda40a8589ad</field>
        <field name="checksumAlgorithm">MD5</field>
        <field name="sid">5</field>
        </element>
      </element>
    </element>
  </element>
3. <element name="others">
  1. <field name="handle">10641/2327</field>
  2. <field name="identifier">oai:ddfv.ufv.es:10641/2327</field>
  3. <field name="lastModifyDate">2022-01-27 09:59:54.429</field>
  </element>
4. <element name="repository">
  1. <field name="name">DDFV</field>
  2. <field name="mail">dspace@ufv.es</field>
  </element>
5. <element name="license">
  1. <field name="bin">LSBFbCByZXBvc2l0b3JpbyBpbnN0aXR1Y2lvbmFsIGRlIGxhIFVuaXZlcnNpZGFkIEZyYW5jaXNjbyBkZSBWaXRvcmlhIGRlIE1hZHJpZCAoRERGViksIHBvbmUgYSBkaXNwb3NpY2nDs24gZGUgbG9zIHVzdWFyaW9zIGxhIHBsYXRhZm9ybWEgZGlnaXRhbCBhYmllcnRhIHkgZGUgYWNjZXNvIGxpYnJlIGRlIGxhIHByb2R1Y2Npw7NuIGNpZW50w61maWNhIGRlIGxhIGluc3RpdHVjacOzbi4KCi0gQSB0YWxlcyBmaW5lcywgbG9zIGF1dG9yZXMgZGVjbGFyYW4gcXVlIHNvbiB0aXR1bGFyZXMgZGUgbG9zIGRlcmVjaG9zIGRlIHByb3BpZWRhZCBpbnRlbGVjdHVhbCBkZSBsYSBvYnJhIHkgcXVlIMOpc3RhIGVzIG9yaWdpbmFsLgoKLSBNZWRpYW50ZSBsYSBhY2VwdGFjacOzbiBkZSBlc3RhIGxpY2VuY2lhLCBlbCBhdXRvciwgY29tbyB0aXR1bGFyIGRlIGxvcyBkZXJlY2hvcyBkZSBhdXRvciwgYXV0b3JpemEgeSBjZWRlIGEgbGEgVW5pdmVyc2lkYWQgRnJhbmNpc2NvIGRlIFZpdG9yaWEsIGRlIGZvcm1hIGdyYXR1aXRhIHkgbm8gZXhjbHVzaXZhLCBwb3IgZWwgbcOheGltbyBwbGF6byBsZWdhbCB5IGNvbiDDoW1iaXRvIHVuaXZlcnNhbCwgbG9zIGRlcmVjaG9zIGRlIHJlcHJvZHVjY2nDs24sIGRpc3RyaWJ1Y2nDs24sIGNvbXVuaWNhY2nDs24gcMO6YmxpY2EsIGluY2x1aWRvIGVsIGRlcmVjaG8gZGUgcHVlc3RhIGEgZGlzcG9zaWNpw7NuIGVsZWN0csOzbmljYSwgeSBsYSB0cmFuc2Zvcm1hY2nDs24gZGUgZm9ybWF0byBzb2JyZSBsYSBvYnJhIGluZGljYWRhLCBzaSBmdWVyYSBlbCBjYXNvLgoKLSBFbiBlbCBjYXNvIGRlIGNlc2nDs24gZGUgZGVyZWNob3MgZGUgZXhwbG90YWNpw7NuIGEgdGVyY2Vyb3MsIGRlY2xhcmEgcXVlIGN1ZW50YSBjb24gbGEgYXV0b3JpemFjacOzbiBkZSBkaWNob3MgdGl0dWxhcmVzIHkgcXVlIGhhIG9idGVuaWRvIGVsIHBlcm1pc28gc2luIHJlc3RyaWNjaW9uZXMgZGVsIHByb3BpZXRhcmlvIGRlbCBjb3B5cmlnaHQgcGFyYSBvdG9yZ2FyIGEgbGEgaW5zdGl0dWNpw7NuIGxvcyBkZXJlY2hvcyByZXF1ZXJpZG9zIHBhcmEgZXN0YSBsaWNlbmNpYSB5IHF1ZSBkaWNobyBwcm9waWV0YXJpbyBjb25vY2UgZWwgdGV4dG8gbyBlbCBjb250ZW5pZG8gZGUgbGEgb2JyYS4KCi0gU2kgZnVlcmEgdW5hIG9icmEgcGF0cm9jaW5hZGEgcG9yIGFsZ3VuYSBpbnN0aXR1Y2nDs24gZGlzdGludGEgYSBsYSBVbml2ZXJzaWRhZCBGcmFuY2lzY28gZGUgVml0b3JpYSwgZGVjbGFyYSBxdWUgZW4gY2FzbyBuZWNlc2FyaW8sIGN1ZW50YSBjb24gbG9zIHBlcm1pc29zIHBlcnRpbmVudGVzLCBkZSBsYSBpbnN0aXR1Y2nDs24gbyBlbnRpZGFkLCBxdWUgbGUgcGVybWl0YW4gbGEgZGlmdXNpw7NuIGRlIGRpY2hhIG9icmEuCgotIExhIFVuaXZlcnNpZGFkIEZyYW5jaXNjbyBkZSBWaXRvcmlhIG5vIHRpZW5lIGxhIHRpdHVsYXJpZGFkIGRlIGxvcyBkZXJlY2hvcyBzb2JyZSBsYSBvYnJhLCBxdWUgY
29ycmVzcG9uZGVuIGFsIGF1dG9yLCBwZXJvIHNpbiBlbWJhcmdvIMOpc3RhIGxpY2VuY2lhIGRhIGRlcmVjaG8gYSByZXByb2R1Y2lybGEgZW4gdW4gc29wb3J0ZSBkaWdpdGFsLCBkaXN0cmlidWlyIGEgbG9zIHVzdWFyaW9zIGNvcGlhcyBlbGVjdHLDs25pY2FzIGRlIGxhIG9icmEgZW4gZm9ybWF0byBkaWdpdGFsLCBjb211bmljYWNpw7NuIHDDumJsaWNhIHkgc3UgcHVlc3RhIGEgZGlzcG9zaWNpw7NuIGEgdHJhdsOpcyBkZSB1biBhcmNoaXZvIGFiaWVydG8gaW5zdGl0dWNpb25hbC4KCi0gTGEgb2JyYSBzZSBwb25kcsOhIGEgZGlzcG9zaWNpw7NuIGRlIGxvcyB1c3VhcmlvcyBwYXJhIHF1ZSBoYWdhbiBkZSBlbGxhIHVuIHVzbyBqdXN0byB5IHJlc3BldHVvc28gY29uIGxvcyBkZXJlY2hvcyBkZSBhdXRvciwgc2VhIGNvbiBmaW5lcyBkZSBlc3R1ZGlvLCBpbnZlc3RpZ2FjacOzbiBvIGN1YWxxdWllciBvdHJvIGZpbiBsw61jaXRvLCB5IGRlIGFjdWVyZG8gYSBsYXMgY29uZGljaW9uZXMgZXN0YWJsZWNpZGFzIGVuIGxhIGxpY2VuY2lhIENyZWF0aXZlIENvbW1vbnMsIGRlIG1vZG8gcXVlIGxhcyBvYnJhcyBwdWVkYW4gc2VyIGRpc3RyaWJ1aWRhcywgY29waWFkYXMgeSBleGhpYmlkYXMgc2llbXByZSBxdWUgc2UgY2l0ZSBsYSBhdXRvcsOtYSB5IG5vIHNlIG9idGVuZ2EgYmVuZWZpY2lvIGNvbWVyY2lhbC4gUG9yIHRhbnRvLCBsYSBVbml2ZXJzaWRhZCBubyBhc3VtaXLDoSByZXNwb25zYWJpbGlkYWQgYWxndW5hIHBvciBsYSBmb3JtYSBlZmVjdGl2YSBlbiBxdWUgbG9zIHVzdWFyaW9zIHV0aWxpY2VuIGVsIG1hdGVyaWFsIHB1ZXN0byBhIHN1IGRpc3Bvc2ljacOzbi4KCi0gRWwgYXV0b3IgcG9kcsOhIHNvbGljaXRhciBsYSByZXRpcmFkYSBkZSBsYSBvYnJhIGRlbCByZXBvc2l0b3JpbyBwb3IgY2F1c2EganVzdGlmaWNhZGEuIAoK</field>
  </element>
</metadata>