



Extraction of Structure and Content from the Edgar Database: A Template-Based Approach

Yu Cong (Assistant Professor, Towson University), Miklos Vasarhelyi (Professor, Rutgers University), Alexander Kogan (Professor, Rutgers University)

This paper was accepted by Associate Editor Rajendra Srivastava. The authors are appreciative of the many useful comments of visiting editor Rajendra Srivastava and two anonymous reviewers. The EDGAR data can be obtained at http[…]

Abstract: This paper presents a template-based approach to extract data from the EDGAR database. A set of heuristic-based templates is used to configure the train[…]


[…]ability and flexibility to this system. The template-based approach also enables the system to extract both structural information and content from the filings in the EDGAR database. The ability to extract structural information from a section or a complete filing makes it possible to collect data from real-world documents for users of financial data in both academia and industry. We use the income statement section of 10-K filings to illustrate the system and the utilization of the template-based approach.

Keywords: EDGAR, document structure, knowledge engineering

INTRODUCTION

Motivation

Advance[…]


[…]s the preparation, dissemination and use of accounting information. Document structure determines the understandability, accessibility and retrieval precision of a digital document (Fisher 2004). In the accounting domain, table-like text bodies (mostly financial statements) located in financial reports are the core vehicles of accounting information, and their structures are critical to the effective delivery of accounting information (e.g. Maines and McDaniel, 2000). However, without a thorough examination of the diversified structures of financial statements used in the real world, the required […]


[…] invites investigation into the structure of financial statements. For instance, Bovee et al. (2002) show that the rigid structure adopted by the first version of XBRL Taxonomy: Financial Reporting for Commercial and Industrial Companies - US GAAP (XBRL 2000) cannot accommodate the diversified structures of financial statements, particularly the income statement. Therefore, a thorough examination of the structures of financial statements used in the real world can help the accounting profession to gain insight into the diversity of the structures of financial state[…]


[…] of financial statements and reports in a digital format is extremely challenging. This fact in turn demonstrates the importance of a profound understanding of the structure of financial reports and the usefulness of such understanding to practitioners.

Research projects in accounting such as FRAANK aim at the extraction of accounting numbers but not their organization (structure) that includes grouping, sub-totals, etc. The extractions of structures in accounting research were typically carried out manually on small collections of financial statements (e.g. Bovee et al. 2002). Analyses on the st[…]


[…]ng literature. However, such large-scale analyses are infeasible if not aided by computer programs that can automatically or semi-automatically extract the structural information from financial statements. In this paper, we contend that such computer-aided extraction of the structure of financial reports is attainable, and attempt to design a system that accomplishes the extraction tasks by employing a template-based approach.

The Tasks and Challenges

The technical difficulties of applying computer-aided analysis of the structure of financial reports are primarily posed by […]


[…]n, 2003a, 2003b) is maintained by the Securities and Exchange Commission (SEC). EDGAR is essentially the only free comprehensive source of electronic financial reports; virtually all the analyses that require a large number of financial reports[6] use the electronic filings from the EDGAR database or value-added tools based on EDGAR. Even though EDGAR has become the dominant source of financial reports to the general public, most of these financial reports are virtually unstructured free-form texts, a format that is extremely challenging for computer programs to parse and understand.

The extracti[…]


[…] Such locating requires an understanding and extraction of the structure of the EDGAR filings. Only when the target block is located in the EDGAR filing, and the completeness and integrity of this block are preserved, does the extraction of structural information from the block become feasible. When the structure of the table-like text block is extracted, the structural details such as the relationships between its line items and the sub-lists or sub-tables nested in the block must be captured.

[5] Gerdes (2003) provides a thorough review of the EDGAR database.
[6] EDGAR extraction provides “[…]TML format, but most filings are still in the free-form text format.

In addition, the extraction of content such as financial numbers becomes much easier when the structure of a table-like text block is extracted and understood. Therefore, two critical tasks in the structural extraction must be accomplished:

1) at the document level, to understand the structure of an EDGAR filing, locate the target table-like text block and extract the complete block with its integrity preserved,
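
As a rough illustration of the document-level task in item 1, the sketch below locates a table-like block in a free-form text filing and then reads off its line items with their indentation, which hints at nesting. It is not the authors' system. The `BlockTemplate` class, the `TABLE_LINE` heuristic, the `locate_block` and `line_items` helpers, and the sample filing text are all assumptions made for this example.

```python
import re
from dataclasses import dataclass

# Heuristic assumed for this sketch: a line is "table-like" when some text
# is followed by a gap of two or more spaces and the line ends in a number.
TABLE_LINE = re.compile(r"\S.*?\s{2,}\$?\s*\(?[\d,]*\d(?:\.\d+)?\)?\s*$")


@dataclass
class BlockTemplate:
    """Hypothetical template describing how to find one table-like block."""
    start_patterns: tuple  # regexes for headings that open the block
    max_gap: int = 2       # non-table lines tolerated in a row inside the block


def locate_block(lines, template):
    """Return (start, end) line indices of the target block, or None."""
    starts = [re.compile(p, re.IGNORECASE) for p in template.start_patterns]
    start = next((i for i, ln in enumerate(lines)
                  if any(p.search(ln) for p in starts)), None)
    if start is None:
        return None
    end, gap = start, 0
    for i in range(start + 1, len(lines)):
        if TABLE_LINE.search(lines[i]):
            end, gap = i, 0      # extend the block to this table line
        else:
            gap += 1
            if gap > template.max_gap:
                break            # too many non-table lines: the block has ended
    return start, end


def line_items(block):
    """Yield (indent, label, values) for table lines that carry a text label.

    Parentheses and currency signs are ignored, so negative amounts are
    not interpreted in this sketch.
    """
    for ln in block:
        if not TABLE_LINE.search(ln):
            continue
        m = re.match(r"(\s*)(.*?)\s{2,}(.*)$", ln)
        indent, label, tail = len(m.group(1)), m.group(2), m.group(3)
        if re.search(r"[A-Za-z]", label):   # skip header rows such as years
            yield indent, label, re.findall(r"[\d,]*\d(?:\.\d+)?", tail)


# Invented sample text standing in for a free-form filing.
filing = """\
ITEM 8. FINANCIAL STATEMENTS

CONSOLIDATED STATEMENTS OF INCOME
(in thousands)
                              2002      2001
Net sales                 $  1,200  $  1,100
  Cost of goods sold           700       650
Net income                $    500  $    450

The accompanying notes are an integral part
of these statements.
Management discussion continues here.
More prose.
""".splitlines()

template = BlockTemplate(start_patterns=(r"statements?\s+of\s+income",))
span = locate_block(filing, template)
block = filing[span[0]:span[1] + 1]
items = list(line_items(block))
```

Keeping the heading patterns and the gap tolerance in a template object, rather than hard-coding them, means a different section of a filing could be targeted by swapping the template. The indentation recorded for each line item is only a crude proxy for the sub-list relationships described above.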

