When working with financial data in Python 3, converting strings to float type with astype(float) occasionally fails because of the minus sign in the data:
ValueError: could not convert string to float: '−12'
This happens because more than one character can represent a minus sign. This post explains how to deal with it.
There are several symbols that represent the minus sign.
The minus sign typed on an ordinary keyboard is the character "-".
Its Unicode name is HYPHEN-MINUS (U+002D).
import unicodedata

# HYPHEN-MINUS
print(unicodedata.name("-"))    # HYPHEN-MINUS
print("-".encode('utf-8'))      # b'-'
print(b'-'.decode('utf-8'))     # -
However, there are other characters that look like a minus sign.
The following one is called MINUS SIGN (U+2212).
import unicodedata

# MINUS SIGN
print(unicodedata.name("−"))                   # MINUS SIGN
print("−".encode('utf-8'))                     # b'\xe2\x88\x92'
print(b'\xe2\x88\x92'.decode('utf-8'))         # −
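To illustrate that these two are not the only look-alikes, here is a small sketch that prints the Unicode names of a few characters that can be mistaken for a minus sign. The selection of characters is my own assumption and is not exhaustive; which ones actually show up depends on where the data comes from.

```python
import unicodedata

# A few characters that can look like a minus sign.
# This selection is an assumption for illustration, not a complete list.
candidates = ["\u002d", "\u2212", "\u2010", "\u2013", "\uff0d"]

# Map each code point (e.g. "U+2212") to its official Unicode name.
names = {f"U+{ord(ch):04X}": unicodedata.name(ch) for ch in candidates}

for code_point, name in names.items():
    print(code_point, name)
```

Printing the code point of a suspicious character this way is a quick check for which variant a data source is using.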
If you try to convert a string containing MINUS SIGN (U+2212) to float type with astype(float), you get the error "ValueError: could not convert string to float".
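The same failure can be reproduced without pandas. The following is a minimal sketch that uses only the built-in float(); the variable names are just for illustration.

```python
# "−12" written with MINUS SIGN (U+2212) instead of HYPHEN-MINUS (U+002D)
raw = "\u221212"

try:
    float(raw)
    failed = False
except ValueError as exc:
    # float() does not accept MINUS SIGN as a sign character
    failed = True
    print(exc)

# After replacing MINUS SIGN with HYPHEN-MINUS the conversion succeeds
fixed = float(raw.replace("\u2212", "-"))
print(fixed)
```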
Sample code to convert a number with MINUS SIGN (U+2212) to float type
This sample code replaces the MINUS SIGN in the sample data with HYPHEN-MINUS and then converts the data to float type.
Financial data often contains percentages, so the sample data includes percent signs as well.
import pandas as pd

# Sample data with MINUS SIGN
sample = [["−12%","10%","0%"],["−8%","−4%","5%"]]
df = pd.DataFrame(sample)
print(df)
"""
      0    1   2
0  −12%  10%  0%
1   −8%  −4%  5%
"""

# Loop through column names in order
# Replace % and MINUS SIGN, and convert to float type
for i in df.columns:
    df[i] = df[i].str.replace('%','').str.replace('−','-').astype(float)
print(df)
"""
      0     1    2
0 -12.0  10.0  0.0
1  -8.0  -4.0  5.0
"""
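If the data may contain more than one minus variant, the replacements can be collected in a single translation table. This is only a sketch using plain Python: the helper name to_float and the set of characters it handles are my own assumptions, so adjust them to the data at hand.

```python
# Translation table: map minus look-alikes to HYPHEN-MINUS and drop "%".
# The characters listed here are an assumption, not a complete list.
MINUS_TABLE = str.maketrans({
    "\u2212": "-",   # MINUS SIGN
    "\uff0d": "-",   # FULLWIDTH HYPHEN-MINUS
    "%": None,       # remove the percent sign
})

def to_float(text):
    """Hypothetical helper: normalize the sign, strip '%', convert to float."""
    return float(text.translate(MINUS_TABLE))

values = [to_float(s) for s in ["\u221212%", "10%", "\u22124%"]]
print(values)
```

With pandas, a function like this can be applied to a column with df[i].map(to_float) in place of the chained str.replace calls.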