Mechanical_Design_Criteria-Generic-Section_2A-6Arev.9-27-12
Extraction of Structure and Content from the Edgar Database: A Template-Based Approach [1][2][3]

Yu Cong, Miklos Vasarhelyi, Alexander Kogan [4]

[1] This paper was accepted by Associate Editor Rajendra Srivastava.
[2] The authors are appreciative of the many useful comments of visiting editor Rajendra Srivastava and two anonymous reviewers.
[3] The Edgar data can be obtained at http://edgar.sec.gov.
[4] Respectively, Assistant Professor, Towson University; Professor, Rutgers University; and Professor, Rutgers University.

Abstract: This paper presents a template-based approach to extract data from the EDGAR database. A set of heuristic-based templates is used to configure the train[...]ability and flexibility to this system. The template-based approach also enables the system to extract both structural information and content from the filings in the EDGAR database. The ability to extract structural information from a section or a complete filing makes it possible to collect data from real-world documents for users of financial data in both academia and industry. We use the income statement section of 10-K filings to illustrate
the system and the utilization of the template-based approach.

Keywords: EDGAR, document structure, knowledge engineering

INTRODUCTION

Motivation

Advance[...]s the preparation, dissemination and use of accounting information. Document structure determines the understandability, accessibility and retrieval precision of a digital document (Fisher 2004). In the accounting domain, table-like text bodies (mostly financial statements) located in financial reports are the core vehicles of accounting information, and their structures are critical to the effective delivery of accounting information (e.g. Maines and McDaniel, 2000). However, without a thorough examination of the diversified structures of financial statements used in the real world, the required[...] invites investigation into the structure of financial statements. For instance, Bovee et al. (2002) show that the rigid structure adopted by the first version of XBRL Taxonomy: Financial Reporting for Commercial and Industrial Companies - US GAAP (XBRL 2000) cannot accommodate the diversified structures of financial statements, particularly the income statement. Therefore, a thorough examination of the structures of financial statements used
in the real world can help the accounting profession to gain insight into the diversity of the structures of financial state[...]of financial statements and reports in a digital format is extremely challenging. This fact in turn demonstrates the importance of a profound understanding of the structure of financial reports and the usefulness of such understanding to practitioners.

Research projects in accounting such as FRAANK aim at the extraction of accounting numbers but not their organization (structure), which includes grouping, sub-totals, etc. The extractions of structures in accounting research were typically carried out manually on small collections of financial statements (e.g. Bovee et al. 2002). Analyses on the st[...]ng literature. However, such large-scale analyses are infeasible if not aided by computer programs that can automatically or semi-automatically extract the structural information from financial statements. In this paper, we contend that such computer-aided extraction of the structure of financial reports is attainable, and attempt to design a system that accomplishes the extraction tasks by employing a template-based approach.

The Tasks and Challenges

The technical difficulties of applying computer-aided analysis of the structure of financial reports are primarily posed by[...]n, 2003a 2003b) is maintained by the Securities and Exchange Commission (SEC). EDGAR is essentially the only free comprehensive source of electronic financial reports; virtually all the analyses that require a large number of financial reports [6] use the electronic filings from the EDGAR database or value-added tools based on EDGAR. Even though EDGAR has become the dominant source of financial reports to the general public, most of these financial reports are virtually unstructured free-form texts, a format that is extremely challenging for computer programs to parse and understand.

[5] Gerdes (2003) provides a thorough review of the EDGAR database.
[6] EDGAR extraction provides "[...]TML format, but most filings are still in the free-form text format.

The extracti[...]Such locating requires an understanding and extraction of the structure of the EDGAR filings. Only when the target block is located in the EDGAR filing, and the completeness and integrity of this block are preserved, does the extraction of structural information from the block become feasible. When the structure of the table-like text block is extracted, the structural details such as the relationships between its line items and the sub-lists or sub-tables nested in the block must be captured. In addition, the extraction of content such as financial numbers becomes much easier when the structure of a table-like text block is extracted and understood. Therefore, two critical tasks in the structural extraction must be accomplished:

1) at the document level, to understand the structure of an EDGAR filing, locate the target table-like text block and extract the complete block with its integrity preserved,