Extracting file content with Tika (written last year; recorded here for reference)

Shaka ⋅ 5 months ago ⋅ 348 views
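// Needs Apache Tika on the classpath (tika-core, plus tika-parsers for format support).
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.ContentHandler;

public static String getBodyText(File f) {
    // Collects the body text; the argument caps the output at 10 * 1024 * 1024 characters.
    // If the cap is hit, parse() throws and the text collected so far is still returned.
    ContentHandler handler = new BodyContentHandler(1024 * 1024 * 10);
    // try-with-resources closes the stream whether or not parsing succeeds.
    try (InputStream is = new FileInputStream(f)) {
        // 1. Create a parser. AutoDetectParser picks the concrete parser from the detected file type.
        Metadata metadata = new Metadata();
        Parser parser = new AutoDetectParser();
        ParseContext context = new ParseContext();
        // Registering the parser in the context lets embedded documents be parsed too.
        context.set(Parser.class, parser);

        // 2. Call the parser's parse() method.
        parser.parse(is, handler, metadata, context);
    } catch (Exception e) {
        // logger is a field of the enclosing class, which the original post does not show.
        logger.error("Failed to parse " + f, e);
    }
    // Returns whatever was extracted, possibly an empty string if parsing failed early.
    return handler.toString();
}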
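
Whatever happens during parsing, the input stream opened in getBodyText has to be closed. As a small illustration of that guarantee, here is a sketch that uses only the Java standard library and no Tika at all. The names `TrackedStream`, `CloseDemo`, `readThenFail` and `closedAfterFailure` are made up for this example and are not part of the original post or of Tika. It shows that try-with-resources closes a stream even when the body throws, which is what a failing parse() call would do.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical stream wrapper for this sketch: it records whether close() was called.
class TrackedStream extends ByteArrayInputStream {
    boolean closed = false;

    TrackedStream(byte[] data) {
        super(data);
    }

    @Override
    public void close() throws IOException {
        closed = true;
        super.close();
    }
}

class CloseDemo {
    // Reads one byte, then fails on purpose to stand in for a parser error.
    static void readThenFail(InputStream in) throws IOException {
        in.read();
        throw new IOException("simulated parse failure");
    }

    // Returns true if the stream was closed even though the body threw.
    static boolean closedAfterFailure() {
        TrackedStream tracked = new TrackedStream(new byte[] {1, 2, 3});
        try (InputStream in = tracked) {
            readThenFail(in);
        } catch (IOException e) {
            // The resource has already been closed by the time this block runs.
        }
        return tracked.closed;
    }

    public static void main(String[] args) {
        // prints "closed after failure: true"
        System.out.println("closed after failure: " + closedAfterFailure());
    }
}
```

The same shape applies when the resource is a FileInputStream handed to a parser: no explicit finally block or null check is needed to release the file handle.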

Note: This article belongs to the author and may not be reproduced without the author's permission.
