Nazar Kanaev
2a4d974965
go fmt
2024-10-07 12:20:45 +01:00
Karol Kosek
b9b3d2350c
atom: Stop unescaping special HTML characters
...
The HTML data in Atom is escaped because the data needs to be put as a
string into an XML file. If we access it by reading the string value,
then it is already unescaped, as opposed to getting the raw XML data.
XHTML data doesn't need to be unescaped either, since the elements are
already encoded as-is in the tree. :)
Closes #198
2024-06-16 11:35:32 +01:00
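A minimal sketch of the behaviour commit b9b3d2350c describes, assuming Go's standard `encoding/xml` decoder: reading an element's character data already yields the unescaped string, so a second unescaping pass would alter the content. The `atomText` type and `decodeText` helper are hypothetical names for illustration, not necessarily those used in the repository.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"html"
)

// atomText is a hypothetical stand-in for an Atom text construct.
type atomText struct {
	Type string `xml:"type,attr"`
	Data string `xml:",chardata"`
}

// decodeText parses a single Atom text element and returns its string value.
// The XML decoder resolves entities such as &lt; and &amp; while reading.
func decodeText(doc string) (string, error) {
	var t atomText
	if err := xml.Unmarshal([]byte(doc), &t); err != nil {
		return "", err
	}
	return t.Data, nil
}

func main() {
	doc := `<content type="html">&lt;p&gt;Tom &amp;amp; Jerry&lt;/p&gt;</content>`
	s, err := decodeText(doc)
	if err != nil {
		panic(err)
	}
	// Already valid HTML: the entity inside the paragraph is intact.
	fmt.Println(s)
	// A second pass rewrites the HTML entity as well, changing the markup.
	fmt.Println(html.UnescapeString(s))
}
```

The first line printed is the HTML as the feed author wrote it; the second shows what an extra `html.UnescapeString` call would do to it.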
Will Harding
3adcddc70c
Pull atom xhtml title from nested elements
...
The Atom spec says that any title marked with a type of "xhtml" should be
contained in a div element[1] so we need to use the full XML text when
extracting the text.
[1] https://www.rfc-editor.org/rfc/rfc4287#section-3.1
2023-09-23 21:08:22 +01:00
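The idea in commit 3adcddc70c can be sketched as follows, assuming Go's `encoding/xml`: for an "xhtml" title the direct character data is empty because the text sits inside the nested div, so the inner XML has to be captured and reduced to text. The `atomTitle` type and the `plainText` and `titleText` helpers are hypothetical, and collecting character-data tokens is only one possible way to flatten the markup.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// atomTitle is a hypothetical Atom title holding both views of the element.
type atomTitle struct {
	Type string `xml:"type,attr"`
	Data string `xml:",chardata"` // direct character data only
	XML  string `xml:",innerxml"` // raw nested markup
}

// plainText concatenates all character data found in an XML fragment.
func plainText(fragment string) (string, error) {
	dec := xml.NewDecoder(strings.NewReader(fragment))
	var sb strings.Builder
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			break
		}
		if err != nil {
			return "", err
		}
		if cd, ok := tok.(xml.CharData); ok {
			sb.Write(cd)
		}
	}
	return strings.TrimSpace(sb.String()), nil
}

// titleText returns the readable title, using the inner XML for xhtml titles.
func titleText(doc string) (string, error) {
	var t atomTitle
	if err := xml.Unmarshal([]byte(doc), &t); err != nil {
		return "", err
	}
	if t.Type == "xhtml" {
		return plainText(t.XML)
	}
	return strings.TrimSpace(t.Data), nil
}

func main() {
	doc := `<title type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml">Hello <em>world</em></div></title>`
	s, err := titleText(doc)
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```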
Nazar Kanaev
850ce195a0
fix atom links
2023-09-07 18:19:17 +01:00
Nazar Kanaev
bc18557820
handle isPermalink in rss feeds
2023-05-20 23:26:22 +01:00
Pierre Prinetti
c1bcc0c517
Run go fmt
...
This patch is the result of running `go fmt ./...` with Go v1.16.15.
2022-07-04 15:20:49 +01:00
Nazar Kanaev
ee2a825cf0
get rss link when atom link is present
...
found in: https://rss.nytimes.com/services/xml/rss/nyt/Arts.xml
when both rss and atom link elements are present, xml parser returns
empty string. provide default namespace to capture rss link properly.
2022-05-03 15:35:57 +01:00
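A sketch of the approach commit ee2a825cf0 describes, assuming Go's `encoding/xml`: setting `Decoder.DefaultSpace` gives unprefixed elements a namespace, and a namespaced struct tag then matches only the plain RSS `link`, not `atom:link`. The namespace name "rss", the struct types and the `parseLink` helper are illustrative assumptions.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// rssChannel matches only elements in the default ("rss") namespace.
type rssChannel struct {
	Title string `xml:"rss title"`
	Link  string `xml:"rss link"`
}

type rssFeed struct {
	Channel rssChannel `xml:"rss channel"`
}

// parseLink returns the channel link of an RSS document.
func parseLink(doc string) (string, error) {
	dec := xml.NewDecoder(strings.NewReader(doc))
	// Unprefixed elements are treated as if they were in namespace "rss".
	dec.DefaultSpace = "rss"
	var f rssFeed
	if err := dec.Decode(&f); err != nil {
		return "", err
	}
	return f.Channel.Link, nil
}

func main() {
	doc := `<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Example</title>
    <link>https://example.com/</link>
    <atom:link href="https://example.com/feed.xml" rel="self" type="application/rss+xml"/>
  </channel>
</rss>`
	link, err := parseLink(doc)
	if err != nil {
		panic(err)
	}
	fmt.Println(link)
}
```

Without the namespace in the tag, a field tagged plain `link` matches both elements, and the empty `atom:link` that comes later overwrites the value.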
Nazar Kanaev
be7af0ccaf
handle invalid chars in non-utf8 xml
2022-02-14 15:23:55 +00:00
Nazar Kanaev
18221ef12d
use bytes.Buffer instead
2022-02-14 11:05:38 +00:00
Nazar Kanaev
d7253a60b8
strip out invalid xml characters
2022-02-12 23:42:44 +00:00
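One way to strip invalid XML characters, sketched under the assumption that "invalid" means outside the `Char` production of XML 1.0. The `isValidXMLChar` and `stripInvalid` names are hypothetical, and the commit itself may filter at the reader level rather than on strings.

```go
package main

import (
	"fmt"
	"strings"
)

// isValidXMLChar reports whether r is allowed by the XML 1.0 Char production.
func isValidXMLChar(r rune) bool {
	return r == 0x09 || r == 0x0A || r == 0x0D ||
		(r >= 0x20 && r <= 0xD7FF) ||
		(r >= 0xE000 && r <= 0xFFFD) ||
		(r >= 0x10000 && r <= 0x10FFFF)
}

// stripInvalid drops every rune that an XML 1.0 parser would reject.
func stripInvalid(s string) string {
	return strings.Map(func(r rune) rune {
		if isValidXMLChar(r) {
			return r
		}
		return -1 // a negative value tells strings.Map to drop the rune
	}, s)
}

func main() {
	fmt.Printf("%q\n", stripInvalid("a\x00b\x08c\n"))
}
```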
Nazar Kanaev
2de3ddff08
fix test
2022-02-12 23:41:01 +00:00
nkanaev
52cc8ecbbd
fix encoding
2022-01-24 16:47:32 +00:00
nkanaev
bff7476b58
refactoring
2022-01-24 12:50:52 +00:00
nkanaev
26b87dee98
remove html tags from titles
2021-11-10 10:54:12 +00:00
Karol Kosek
19ecfcd0bc
ParseRSS: accept any file with audio/ media type as podcast
...
There are some podcasts that use audio/opus files (mostly as an alternative,
but still), which results in the audio attachment not being displayed.
Instead of growing the list of allowed formats (because audio/mp3 would be
quite useful on the list too), I guess it'd be better to give any audio/ media
type to the user agent and let it worry about it. :^)
2021-07-28 09:31:27 +01:00
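The check described in commit 19ecfcd0bc could look roughly like this: a prefix test on the enclosure's media type instead of a fixed allow-list. The `isAudio` helper is a hypothetical name, and the case and whitespace normalisation is an added assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// isAudio reports whether a media type belongs to the audio/ family.
func isAudio(mediaType string) bool {
	mt := strings.ToLower(strings.TrimSpace(mediaType))
	return strings.HasPrefix(mt, "audio/")
}

func main() {
	for _, mt := range []string{"audio/mpeg", "audio/opus", "image/png"} {
		fmt.Println(mt, isAudio(mt))
	}
}
```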
Nazar Kanaev
d203d38de6
fix empty feed parsing
2021-07-01 14:10:22 +01:00
Nazar Kanaev
e54df07a40
use rdf description
2021-04-15 10:29:35 +01:00
Nazar Kanaev
f8455236dc
rdf date & content
2021-04-15 10:27:50 +01:00
Nazar Kanaev
fbb0dfed47
remove bom
2021-04-07 10:25:30 +01:00
Nazar Kanaev
144fc1606a
remove feed hacks from storage
2021-04-05 20:59:15 +01:00
Nazar Kanaev
fa2fad0ff6
cleanup
2021-04-05 10:01:20 +01:00
Nazar Kanaev
63ad971890
unset audio/image if present in the content
2021-04-04 21:31:25 +01:00
Nazar Kanaev
0828d6782e
extract date parser to a new file
2021-04-04 20:45:13 +01:00
Nazar Kanaev
cf5856bdf7
set missing times
2021-04-04 20:42:52 +01:00
Nazar Kanaev
e50c7e1a51
handle html type atom text
2021-04-02 22:26:45 +01:00
Nazar Kanaev
0a0db68905
feedburner
2021-04-02 22:26:45 +01:00
Nazar Kanaev
36bc84d99a
increase lookup length
2021-04-02 22:26:44 +01:00
Nazar Kanaev
7dbfecdba1
extract thumbnails from vimeo feeds
2021-04-02 22:26:44 +01:00
Nazar Kanaev
fafa6286d4
parser fixes
2021-04-02 22:26:44 +01:00
Nazar Kanaev
cc51fe01c2
give priority to content:encoded
2021-04-02 22:26:44 +01:00
Nazar Kanaev
51cbdea31f
podcasts
2021-04-02 22:26:44 +01:00
Nazar Kanaev
6685bce51c
extract data from media elements
2021-04-02 22:26:44 +01:00
Nazar Kanaev
e0e6166cdf
fix feed sniff reader
2021-04-02 22:26:44 +01:00
Nazar Kanaev
c469749eaa
rename packages
2021-04-02 22:26:44 +01:00