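Classifying descriptions into categories
Analyze the original detailed description and assign it to a lower-level category, then reclassify that lower-level category into an upper-level category.
This is the reverse of a two-level breakdown.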
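The rules are written as dictionaries. Each key is a category name, and each value is a list of strings.
If any string in that list appears in the original detailed description, the description is judged to belong to that category.
If the description matches more than one lower-level category, the category whose matching string appears earliest in the original detailed description is chosen.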
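To make the tie-break rule concrete, here is a minimal sketch that lists where each category's strings first appear in a description. The helper name `match_positions` and the two-entry rule dictionary are hypothetical, chosen only for illustration; they are not part of the original source.

```python
def match_positions(description, rules):
    """Map each matching category to the earliest index of any of its keywords."""
    positions = {}
    for category, keywords in rules.items():
        # Keep only keywords that actually occur, then record where they start
        hits = [description.find(k) for k in keywords if k in description]
        if hits:
            positions[category] = min(hits)
    return positions


# Hypothetical rules: category name -> list of keywords
rules = {"A1": ["A11", "A12"], "B2": ["B21", "B22"]}

positions = match_positions("AA123B22", rules)
print(positions)  # {'A1': 1, 'B2': 5}

# The category with the smallest position wins; "None" when nothing matched
winner = min(positions, key=positions.get) if positions else "None"
print(winner)  # A1
```

Here "A12" starts at index 1 and "B22" at index 5, so A1 is selected even though both categories match.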
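Each result line has three space-separated fields: the original detailed description, the lower-level category, and the upper-level category. None means that no rule matched.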
Results
AA123 A1 A
AA123B22 A1 A
B22AA123 B2 B
asdf None None
asC12df C1 None
Source
def classify_category(source_description, rule_dic):
    """Return the key of rule_dic whose keyword appears earliest in source_description.

    Returns the string "None" when no keyword matches.
    """
    found_key = "None"
    # Start from the description length so that any real match position is smaller.
    # (A fixed limit such as 100 would miss matches in longer descriptions.)
    find_pos = len(source_description)
    for category_key, category_values in rule_dic.items():
        for category_value in category_values:
            tmp_pos = source_description.find(category_value)  # -1 if absent
            if 0 <= tmp_pos < find_pos:
                found_key = category_key
                find_pos = tmp_pos
    return found_key


if __name__ == '__main__':
    # Upper-level rules: the keywords are lower-level category names
    top_category = {
        "A": ["A1", "A2"],
        "B": ["B1", "B2"],
    }
    # Lower-level rules: the keywords are searched for in the original description
    second_category = {
        "A1": ["A11", "A12"],
        "A2": ["A21", "A22"],
        "B1": ["B11", "B12"],
        "B2": ["B21", "B22"],
        "C1": ["C11", "C12"],  # C1 has no upper-level category
    }

    for source_description in ["AA123", "AA123B22", "B22AA123", "asdf", "asC12df"]:
        second_category_description = classify_category(source_description, second_category)
        top_category_description = classify_category(second_category_description, top_category)
        print(source_description, second_category_description, top_category_description)
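The two chained calls to classify_category could, in principle, be generalized to any number of levels. The following is only a sketch under assumptions: `classify_chain` is a hypothetical helper that does not appear in the source, and the compact matcher inside the block is bounded by the description length rather than a fixed limit.

```python
def classify_category(source_description, rule_dic):
    """Return the category whose keyword appears earliest, or "None"."""
    found_key = "None"
    find_pos = len(source_description)
    for key, values in rule_dic.items():
        for value in values:
            pos = source_description.find(value)  # -1 if absent
            if 0 <= pos < find_pos:
                found_key, find_pos = key, pos
    return found_key


def classify_chain(source_description, rule_levels):
    """Hypothetical helper: apply each rule dictionary in turn, lowest level first."""
    results = []
    current = source_description
    for rules in rule_levels:
        # The result of one level becomes the input of the next level
        current = classify_category(current, rules)
        results.append(current)
    return results


second_category = {"A1": ["A11", "A12"], "B2": ["B21", "B22"], "C1": ["C11", "C12"]}
top_category = {"A": ["A1", "A2"], "B": ["B1", "B2"]}
levels = [second_category, top_category]

print(classify_chain("B22AA123", levels))  # ['B2', 'B']
print(classify_chain("asC12df", levels))   # ['C1', 'None']
```

A third level would only require appending another rule dictionary to `levels`, whose keywords are the upper-level category names.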